| 1 | +## Training Pipeline |
| 2 | + |
| 3 | +This guide explains how to train PVNet using the open-data-pvnet pipeline. It covers: |
| 4 | +- Configuration structure and best practices |
| 5 | +- The two supported training workflows (with clear recommendations) |
| 6 | +- Common pitfalls and how to resolve them |
| 7 | + |
| 8 | +--- |
| 9 | + |
| 10 | +## 1. Configuration Basics |
| 11 | + |
| 12 | +All configuration files live in a structured directory: |
| 13 | + |
| 14 | +``` |
| 15 | +open-data-pvnet/ |
| 16 | +└── src/ |
| 17 | +    └── open_data_pvnet/ |
| 18 | +        └── configs/ |
| 19 | +            └── PVNet_configs/ |
| 20 | +``` |
| 21 | + |
| 22 | +The main configuration file is `config.yaml`. It selects which sub-configurations (such as trainer, model, and datamodule) are loaded, so it acts as the master control panel that ties everything together. |
| 23 | + |
| 24 | +--- |
| 25 | + |
| 26 | +## 2. Understanding the Key Configuration Parts |
| 27 | + |
| 28 | +### Trainer Configuration |
| 29 | +Controls how your model trains: |
| 30 | +- GPU/CPU usage |
| 31 | +- Training duration |
| 32 | +- Precision settings |
| 33 | + |
| 34 | +### Model Configuration |
| 35 | +Defines your model's architecture: |
| 36 | +- Which encoders to use (GFS, satellite, etc.) |
| 37 | +- Forecast horizon |
| 38 | +- Optimizer settings |
| 39 | + |
| 40 | +### Data Configuration (Most Important!) |
| 41 | +This determines how your data is loaded. PVNet supports two approaches: |
| 42 | +- **Streamed batches**: Directly from Zarr files |
| 43 | +- **Premade batches**: From pre-generated samples |
| 44 | + |
| 45 | +You must choose only one approach at a time; mixing settings from both will cause errors. |
| 46 | + |
| 47 | +--- |
| 48 | + |
| 49 | +## 3. Two Ways to Train PVNet |
| 50 | + |
| 51 | +### Method 1: Streamed Batches (Direct Zarr Loading) |
| 52 | + |
| 53 | +This approach loads data directly from Zarr files during training. |
| 54 | + |
| 55 | +**When to use it:** |
| 56 | +- You have sufficient disk space and bandwidth |
| 57 | +- You don't want to pre-generate samples |
| 58 | + |
| 59 | +**How to set it up:** |
| 60 | +In your `config.yaml`, ensure you have: |
| 61 | +```yaml |
| 62 | +- datamodule: streamed_batches.yaml |
| 63 | +``` |
| 64 | + |
| 65 | +**Important settings:** |
| 66 | +```yaml |
| 67 | +_target_: pvnet.data.DataModule |
| 68 | +configuration: /ABSOLUTE/PATH/example_configuration.yaml |
| 69 | +batch_size: 8 |
| 70 | +num_workers: 2 |
| 71 | +prefetch_factor: 2 |
| 72 | +train_period: |
| 73 | + - null |
| 74 | + - "2023-06-30" |
| 75 | +val_period: |
| 76 | + - "2023-07-01" |
| 77 | + - "2023-12-31" |
| 78 | +``` |
| 79 | + |
| 80 | +**What to avoid:** |
| 81 | +Don't include `sample_output_dir`, `num_train_samples`, or `num_val_samples` in your streamed configuration, as these will cause errors. |
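
The rule above can be checked before launching a run. The following is a minimal sketch, assuming the streamed datamodule configuration has already been parsed into a plain dictionary; the helper name `find_premade_keys` and the `streamed_cfg` values are hypothetical and not part of PVNet.

```python
# Keys the guide says must not appear in a streamed datamodule configuration.
PREMADE_ONLY_KEYS = {"sample_output_dir", "num_train_samples", "num_val_samples"}


def find_premade_keys(datamodule_cfg: dict) -> list[str]:
    """Return premade-only keys found in a streamed datamodule config, sorted by name."""
    return sorted(PREMADE_ONLY_KEYS.intersection(datamodule_cfg))


# Hypothetical config mirroring the streamed settings shown above.
streamed_cfg = {
    "_target_": "pvnet.data.DataModule",
    "configuration": "/ABSOLUTE/PATH/example_configuration.yaml",
    "batch_size": 8,
    "num_workers": 2,
    "prefetch_factor": 2,
}

offending = find_premade_keys(streamed_cfg)
if offending:
    raise ValueError(f"Remove these keys from the streamed config: {offending}")
```

An empty result means the configuration contains none of the three premade-only keys.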
| 82 | + |
| 83 | +**To start training:** |
| 84 | +```bash |
| 85 | +python run.py experiment=example_simple |
| 86 | +``` |
| 87 | + |
| 88 | +### Method 2: Premade Batches (Recommended for Beginners) |
| 89 | + |
| 90 | +This approach uses pre-generated samples, making training more stable and reproducible. |
| 91 | + |
| 92 | +**When to use it:** |
| 93 | +- You want consistent, reproducible results |
| 94 | +- You want faster iteration during development |
| 95 | +- You've encountered issues with Zarr or storage |
| 96 | + |
| 97 | +**Step 1: Generate samples** |
| 98 | +Navigate to `open-data-pvnet/src/open_data_pvnet/scripts` and run: |
| 99 | +```bash |
| 100 | +python save_samples.py \ |
| 101 | + +datamodule.sample_output_dir="GFS_samples" \ |
| 102 | + +datamodule.num_train_samples=10 \ |
| 103 | + +datamodule.num_val_samples=2 \ |
| 104 | + datamodule.num_workers=2 |
| 105 | +``` |
| 106 | + |
| 107 | +This creates a directory with your samples: |
| 108 | +``` |
| 109 | +scripts/ |
| 110 | +└── GFS_samples/ |
| 111 | +    ├── data_configuration.yaml |
| 112 | +    ├── train/ |
| 113 | +    └── val/ |
| 114 | +``` |
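
Before switching the datamodule, it can help to confirm that the sample directory has the layout shown above. This is a small sketch using only the standard library; the function name `missing_sample_entries` is hypothetical, and the check covers only the three entries listed in this guide, not the contents of `train/` or `val/`.

```python
from pathlib import Path


def missing_sample_entries(sample_dir: str) -> list[str]:
    """Return the expected entries that are absent from a generated sample directory."""
    root = Path(sample_dir)
    # Each expected entry is paired with the check it must pass.
    expected = {
        "data_configuration.yaml": Path.is_file,
        "train": Path.is_dir,
        "val": Path.is_dir,
    }
    return [name for name, exists in expected.items() if not exists(root / name)]


if __name__ == "__main__":
    # Run from the scripts directory, where GFS_samples was created.
    print(missing_sample_entries("GFS_samples"))
```

An empty list means all three entries are present.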
| 115 | + |
| 116 | +**Step 2: Switch to premade batches** |
| 117 | +In your `config.yaml`, change to: |
| 118 | +```yaml |
| 119 | +- datamodule: premade_batches.yaml |
| 120 | +``` |
| 121 | + |
| 122 | +**Step 3: Configure the premade batches** |
| 123 | +```yaml |
| 124 | +_target_: pvnet.data.DataModule |
| 125 | +sample_output_dir: /ABSOLUTE/PATH/TO/GFS_samples |
| 126 | +batch_size: 8 |
| 127 | +num_workers: 2 |
| 128 | +prefetch_factor: 2 |
| 129 | +``` |
| 130 | + |
| 131 | +**Important:** Always use absolute paths! Hydra changes the working directory at runtime, so relative paths will break. |
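
One way to obtain the absolute path is to resolve it once, from the directory where the samples were generated, and paste the result into the YAML file. The helper below is an illustrative sketch using only the standard library; `to_config_path` is a hypothetical name, not a PVNet or Hydra function.

```python
from pathlib import Path


def to_config_path(path: str) -> str:
    """Return an absolute, forward-slash version of `path` for pasting into a YAML config."""
    return Path(path).expanduser().resolve().as_posix()


# Run this from the directory that contains GFS_samples.
sample_dir = to_config_path("GFS_samples")
print(f"sample_output_dir: {sample_dir}")
```

Forward slashes are used so the same value works in YAML on Windows as well as on Linux and macOS.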
| 132 | + |
| 133 | +**Step 4: Train** |
| 134 | +```bash |
| 135 | +python run.py experiment=example_simple |
| 136 | +``` |
| 137 | + |
| 138 | +--- |
| 139 | + |
| 140 | +## 4. Data Configuration Best Practices |
| 141 | + |
| 142 | +When setting up your data configuration (`example_configuration.yaml`), we strongly recommend: |
| 143 | + |
| 144 | +1. Download data locally |
| 145 | +2. Point to local Zarr paths |
| 146 | +3. Set `public: false` for local data |
| 147 | + |
| 148 | +Example configuration: |
| 149 | +```yaml |
| 150 | +gsp: |
| 151 | +  zarr_path: C:/data/gsp/combined_2023_gsp.zarr |
| 152 | +  public: false |
| 153 | + |
| 154 | +nwp: |
| 155 | +  gfs: |
| 156 | +    zarr_path: C:/data/nwp/nwp_gfs.zarr |
| 157 | +    public: false |
| 158 | +``` |
| 159 | + |
| 160 | +### Important Note About Local Data |
| 161 | + |
| 162 | +If you're using local data, make sure to set `public: false` in your configuration. When `public: true` is set with a local path, you'll encounter this error: |
| 163 | + |
| 164 | +``` |
| 165 | +ValueError: storage_options passed with non-fsspec path |
| 166 | +``` |
| 167 | + |
| 168 | +For example, if your configuration looks like this: |
| 169 | +```yaml |
| 170 | +gsp: |
| 171 | + zarr_path: "s3://ocf-open-data-pvnet/data/uk/pvlive/v2/combined_2023_gsp.zarr" |
| 172 | + # ... other settings ... |
| 173 | + public: True |
| 174 | +``` |
| 175 | + |
| 176 | +And you're trying to use local data, change it to: |
| 177 | +```yaml |
| 178 | +gsp: |
| 179 | + zarr_path: "/path/to/your/local/data/combined_2023_gsp.zarr" |
| 180 | + # ... other settings ... |
| 181 | + public: False |
| 182 | +``` |
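
The mismatch described above can be caught with a quick pre-flight check. This is a sketch under a stated assumption: a path is treated as remote only if it contains a URL-style scheme separator (`://`), which is a heuristic chosen for illustration and not necessarily how PVNet or its data loader decides. Both function names are hypothetical.

```python
def is_remote_path(zarr_path: str) -> bool:
    """Treat paths with a URL-style scheme (e.g. "s3://", "gs://") as remote."""
    # Assumption: "://" marks a remote path; "C:/..." and "/path/..." count as local.
    return "://" in zarr_path


def check_public_flag(zarr_path: str, public: bool) -> None:
    """Raise if `public: true` is combined with a local path."""
    if public and not is_remote_path(zarr_path):
        raise ValueError(
            f"`public: true` is set for local path {zarr_path!r}; use `public: false`."
        )


# Values taken from the examples in this section.
check_public_flag("s3://ocf-open-data-pvnet/data/uk/pvlive/v2/combined_2023_gsp.zarr", True)
check_public_flag("/path/to/your/local/data/combined_2023_gsp.zarr", False)
```

Both calls pass silently; a local path combined with `public=True` raises a `ValueError` before training starts.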
| 183 | + |
| 184 | +--- |
| 185 | + |
| 186 | +## 5. Common Hydra Issues |
| 187 | + |
| 188 | +### Hydra Override Error |
| 189 | + |
| 190 | +If you hit an override error when running `experiment=example_simple`, conflicting override declarations are a likely cause. The `example_simple` experiment configuration includes these overrides: |
| 191 | + |
| 192 | +```yaml |
| 193 | +defaults: |
| 194 | + - override /trainer: default.yaml |
| 195 | + - override /model: multimodal.yaml |
| 196 | + - override /datamodule: premade_samples.yaml |
| 197 | + - override /callbacks: default.yaml |
| 198 | + - override /logger: wandb.yaml |
| 199 | + - override /hydra: default.yaml |
| 200 | +``` |
| 201 | + |
| 202 | +If you see errors like "Could not override 'hydra'", you may need to temporarily comment out the hydra override in your config file: |
| 203 | + |
| 204 | +```yaml |
| 205 | +# - override /hydra: default.yaml |
| 206 | +``` |
| 207 | + |
| 208 | +### Working Directory Changes |
| 209 | + |
| 210 | +Remember that Hydra runs each experiment inside a timestamped directory (e.g., `outputs/YYYY-MM-DD/HH-MM-SS/`). This is why absolute paths are essential: a relative path is resolved against that new working directory and no longer points at your files. |
| 211 | + |
| 212 | +--- |
| 213 | + |
| 214 | +## 6. Quick Comparison: Streamed vs. Premade |
| 215 | + |
| 216 | +| Feature | Streamed | Premade | |
| 217 | +| -------------------------- | -------- | ------- | |
| 218 | +| Reads Zarr at runtime | Yes | No | |
| 219 | +| Needs pre-generated samples| No | Yes | |
| 220 | +| Sensitive to storage flags | Yes | No | |
| 221 | +| Recommended for beginners | No | Yes | |
| 222 | + |
| 223 | +--- |
| 224 | + |
| 225 | +## 7. External Requirements |
| 226 | + |
| 227 | +### Google Cloud CLI |
| 228 | + |
| 229 | +Even if you're using local data, some metadata utilities require the Google Cloud CLI. Install it from: |
| 230 | +https://cloud.google.com/sdk/docs/install |
| 231 | + |
| 232 | +Then authenticate: |
| 233 | +```bash |
| 234 | +gcloud auth application-default login |
| 235 | +``` |