Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Binary file added docs/_static/metadata_prior_example.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
139 changes: 138 additions & 1 deletion docs/user_guide/03_cropmodels.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,10 +18,147 @@ While that approach is certainly valid, there are a few key benefits to using Cr
- **Simpler and Extendable**: CropModels decouple detection and classification workflows, allowing separate handling of challenges like class imbalance and incomplete labels, without reducing the quality of the detections. Two-stage object detection models can be finicky with similar classes and often require expertise in managing learning rates.
- **New Data and Multi-sensor Learning**: In many applications, the data needed for detection and classification may differ. The CropModel concept provides an extendable piece that allows for advanced pipelines.

(spatial-temporal-metadata)=
## Spatial-Temporal Metadata

In biodiversity monitoring, species distributions vary by location and season. A bird common in Florida may be rare in Alaska, and migratory species shift seasonally. The CropModel supports an optional spatial-temporal metadata embedding that provides location and date context alongside image features to improve classification.

The metadata signal is intentionally "gentle" — it contributes only ~1.5% of the feature vector (32 dimensions vs. 2048 image features). This means the model still classifies primarily from visual appearance but can use location/season as a soft prior. When metadata is not provided at inference time, the model gracefully degrades to image-only classification.

### How It Works

When `use_metadata=True`, the CropModel:

1. Encodes `(lat, lon, day_of_year)` using sinusoidal features (smooth, periodic representation)
2. Projects the 6 sinusoidal features through a small MLP to a 32-dim embedding
3. Concatenates this with the 2048-dim ResNet image features
4. Classifies from the combined 2080-dim vector

### Inference with Metadata

Pass a `metadata` dict to `predict_tile`:

```python
from deepforest import main
from deepforest.model import CropModel

m = main.deepforest()
m.create_trainer()

crop_model = CropModel(config_args={"use_metadata": True})
crop_model.load_from_disk(train_dir="path/to/train", val_dir="path/to/val",
metadata_csv="metadata.csv")
crop_model.create_trainer(max_epochs=10)
crop_model.trainer.fit(crop_model)

result = m.predict_tile(
path="image.tif",
crop_model=crop_model,
metadata={"lat": 35.2, "lon": -120.4, "date": "2024-06-15"}
)
```

All detected crops in the tile share the same metadata. If `metadata` is omitted, the model falls back to image-only classification.

### Training with Metadata

Training requires a CSV sidecar file that maps each crop image filename to its spatial-temporal metadata:

```text
filename,lat,lon,date
bird_001.png,35.2,-120.4,2024-06-15
bird_002.png,35.2,-120.4,2024-06-15
mammal_001.png,40.1,-105.3,2024-07-20
```

- `filename` matches the image basename inside the ImageFolder class directories
- `date` is an ISO format string, converted to day-of-year internally
- One CSV covers both train and val sets (filenames are unique)

The existing ImageFolder directory structure is unchanged:

```
train/
Bird/
bird_001.png
bird_002.png
Mammal/
mammal_001.png
```

Pass the CSV when loading data:

```python
from deepforest.model import CropModel

crop_model = CropModel(config_args={"use_metadata": True})
crop_model.load_from_disk(
train_dir="path/to/train",
val_dir="path/to/val",
metadata_csv="metadata.csv"
)
crop_model.create_trainer(max_epochs=10)
crop_model.trainer.fit(crop_model)
```

### Configuration

The metadata embedding is controlled by three config parameters:

```python
crop_model = CropModel(config_args={
"use_metadata": True, # Enable metadata fusion (default: False)
"metadata_dim": 32, # Embedding dimension (default: 32)
"metadata_dropout": 0.5, # Dropout on metadata path (default: 0.5)
})
```

Or in `config.yaml`:

```yaml
cropmodel:
use_metadata: True
metadata_dim: 32
metadata_dropout: 0.5
```

### Visualizing Metadata Priors

After training a metadata-enabled CropModel, it can be useful to inspect the
spatial-temporal branch by itself. The
{download}`metadata prior visualization script <examples/visualize_metadata_priors.py>`
loads a checkpoint, evaluates a lat/lon grid for one or more dates, and writes:

- A CSV with metadata-only logits, probabilities, and relative scores
- PNG maps for selected species and dates
- GeoTIFF rasters for GIS workflows

For example:

```bash
uv run python docs/user_guide/examples/visualize_metadata_priors.py \
--checkpoint path/to/metadata_cropmodel.ckpt \
--species "Morus bassanus" \
--dates 2024-04-15 \
--bounds -98 18 -55 48 \
--cell-degrees 1.0 \
--output-dir outputs/metadata_prior_maps
```

The map below shows a relative metadata prior for Northern Gannet
(`Morus bassanus`) on April 15, 2024. It reflects the learned metadata branch,
not image evidence. Basemap tiles are optional; install `contextily` to include
them or pass `--no-basemap` to plot only the score raster.

```{image} ../_static/metadata_prior_example.png
:alt: Metadata prior map for Morus bassanus over the western Atlantic
:width: 650px
```

## Considerations

- **Efficiency**: Using a CropModel will be slower, as for each detection, the sensor data needs to be cropped and passed to the detector. This is less efficient than using a combined classification/detection system like multi-class detection models. While modern GPUs mitigate this to some extent, it is still something to be mindful of.
- **Lack of Spatial Awareness**: The model knows only about the pixels inside the crop and cannot use features outside the bounding box. This lack of spatial awareness can be a major limitation. It is possible, but untested, that multi-class detection models might perform better in such tasks. A box attention mechanism, like in [this paper](https://arxiv.org/abs/2111.13087), could be a better approach.
- **Lack of Spatial Awareness**: The model knows only about the pixels inside the crop and cannot use features outside the bounding box. This lack of spatial awareness can be a major limitation. It is possible, but untested, that multi-class detection models might perform better in such tasks. A box attention mechanism, like in [this paper](https://arxiv.org/abs/2111.13087), could be a better approach. See the {ref}`spatial-temporal-metadata` section for an optional way to incorporate location and season information.

## Single Crop Model

Expand Down
12 changes: 12 additions & 0 deletions docs/user_guide/09_configuration_file.md
Original file line number Diff line number Diff line change
Expand Up @@ -319,3 +319,15 @@ crop_model = CropModel()
# Or use custom resize dimensions
crop_model = CropModel(config_args={"resize": [300, 300]})
```

### use_metadata

Boolean flag to enable spatial-temporal metadata fusion. When `True`, the model accepts `(lat, lon, date)` alongside image crops and learns a small embedding that is concatenated with image features. Default is `False`. See {ref}`spatial-temporal-metadata` for usage details.

### metadata_dim

Dimension of the metadata embedding vector. A smaller value makes the metadata signal more gentle relative to the 2048-dim image features. Default is `32`.

### metadata_dropout

Dropout rate applied to the metadata embedding path. Higher values reduce the model's reliance on location/date information. Default is `0.5`.
Loading
Loading