
feat: Add script to fetch US solar data from EIA (Issue #109)#127

Merged
peterdudfield merged 3 commits into openclimatefix:main from mahendra-918:feature/add-eia-data-script
Feb 16, 2026

Conversation

@mahendra-918
Contributor

Description

This PR adds a new script to fetch solar generation data from the US Energy Information Administration (EIA) Open Data API v2. This is the foundational step required to extend PVNet models to the United States (supporting [META] Issue #103).

The new EIAData class allows users to fetch hourly electricity generation data by fuel type (e.g., Solar) for major US grid operators (RTOs).

Fixes #109
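To make the request shape concrete, here is an illustrative sketch of how an EIA Open Data API v2 hourly fuel-type request might be parameterised. The endpoint path, facet encoding, and parameter names follow the public EIA v2 conventions, but build_solar_request and its defaults are hypothetical and not the actual EIAData internals:

```python
# Sketch of an EIA v2 request for hourly solar generation (illustrative only).
BASE_URL = "https://api.eia.gov/v2"

def build_solar_request(api_key, respondent, start, end, offset=0, length=5000):
    """Build the URL and query parameters for an hourly fuel-type generation request."""
    url = f"{BASE_URL}/electricity/rto/fuel-type-data/data/"
    params = {
        "api_key": api_key,
        "frequency": "hourly",
        "data[0]": "value",
        "facets[fueltype][]": "SUN",        # EIA code for solar
        "facets[respondent][]": respondent,  # e.g. "CISO" for CAISO
        "start": start,                      # hourly timestamps, e.g. "2026-01-01T00"
        "end": end,
        "offset": offset,                    # pagination offset (see review discussion)
        "length": length,                    # max rows per request
    }
    return url, params

# Usage: a 24-hour CAISO window (DEMO_KEY is a placeholder, not a real key).
url, params = build_solar_request("DEMO_KEY", "CISO", "2026-01-01T00", "2026-01-01T23")
```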

How Has This Been Tested?

I have tested this change in two ways:

  1. Manual Verification: verified the script locally using a valid EIA API Key. Successfully fetched hourly solar generation data for CAISO for a 24-hour period.
  2. Automated Tests: Added tests/test_eia.py which uses unittest.mock to verify the API request logic, parameter formatting, and error handling without requiring a real API connection.
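The mocking approach can be sketched as follows. The names here (fetch_page, the injected http_get) are illustrative and not the actual contents of tests/test_eia.py; the point is that the HTTP call is replaced by a unittest.mock.Mock so the request and parsing logic can be verified without a network connection:

```python
from unittest import mock

def fetch_page(http_get, url, params):
    """Call the injected HTTP getter and return the parsed record list."""
    resp = http_get(url, params=params)
    resp.raise_for_status()
    # EIA v2 responses nest the records under response -> data.
    return resp.json()["response"]["data"]

# Stand in for requests.get with a Mock so no real API key or network is needed.
fake_resp = mock.Mock()
fake_resp.json.return_value = {
    "response": {"data": [{"period": "2026-01-01T00", "value": 1.0}]}
}
fake_get = mock.Mock(return_value=fake_resp)

records = fetch_page(fake_get, "https://example.invalid/v2/data", {"frequency": "hourly"})
```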
  • Yes

If your changes affect data processing, have you plotted any changes? i.e. have you done a quick sanity check?

  • Yes (Verified the returned DataFrame structure and values)

Checklist:

  • My code follows OCF's coding style guidelines
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation (Added docstrings to the class)
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked my code and corrected any misspellings

@mahendra-918
Contributor Author

mahendra-918 commented Jan 31, 2026

Hi @peterdudfield, @jcamier, @siddharth7113, I've implemented the EIA data collection script for Issue #109.
Ready for review.

@siddharth7113 siddharth7113 self-requested a review February 1, 2026 18:37
Contributor

@siddharth7113 siddharth7113 left a comment


EIA returns both daily and hourly data; could you modify the script to fetch the hourly data, or make the option explicit? EIA also reports data per region as well as for US-48. For our particular use case I would recommend fetching US-48 data only, not the individual regions, otherwise this could lead to duplicated data.

Comment thread: src/open_data_pvnet/scripts/fetch_eia_data.py
end_date: End date string
data_cols: List of data columns to retrieve
facets: Dictionary of facets to filter by
offset: Pagination offset
Contributor


What are offset and pagination here? Why do we need them?

Contributor Author


We need them for large datasets because the API paginates its responses. offset allows us to request subsequent "pages" of data when the total number of records exceeds the API's single-request limit (usually 5,000).

Contributor


Shouldn't we check that when we hit the API, and then pull more data if we need to?

Contributor Author


Great point. I'll update the script to automatically handle pagination so it fetches all available data for the requested period without needing manual offset management.

Contributor Author


Done! I updated get_data to automatically loop and fetch all available pages until the API returns fewer rows than the requested length. This way, users don't need to manage offsets manually. I also added a test_get_data_pagination case to verify it.
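The auto-pagination idea described above can be sketched like this: keep requesting pages, advancing the offset by the page size, until the API returns fewer rows than requested. The page fetcher is injected so the loop stays testable; all names are illustrative, not the actual get_data implementation:

```python
def fetch_all(fetch_page, length=5000):
    """Collect every record by paging with offset until a short page arrives."""
    records, offset = [], 0
    while True:
        page = fetch_page(offset=offset, length=length)
        records.extend(page)
        if len(page) < length:  # a short (or empty) page means we have everything
            break
        offset += length
    return records

# Usage with a fake paged source: 7 records served in pages of 3.
data = list(range(7))
offsets_requested = []

def fake_fetch(offset, length):
    offsets_requested.append(offset)
    return data[offset:offset + length]

result = fetch_all(fake_fetch, length=3)  # pages at offsets 0, 3, 6
```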

@mahendra-918
Contributor Author

mahendra-918 commented Feb 2, 2026

Thanks @siddharth7113 for the feedback! I've addressed the points:
1. Hourly & US-48 defaults: the get_data method now defaults to frequency='hourly' and region='US48'. When region='US48' is set, it automatically adds the respondent=['US48'] facet to filter out duplicate regional data.
2. Zarr/xarray support: I added a get_dataset() method that processes the raw DataFrame into an xarray.Dataset with a datetime_gmt index. This output is compatible with ocf-data-sampler and can easily be saved to Zarr.
3. Tests: added new tests to verify the region filtering and the dataset conversion logic.
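The defaulting behaviour in point 1 can be sketched as below. The facet encoding (facets[name][] keys with list values, as expanded by requests-style libraries) follows EIA v2 conventions, but build_params itself is a hypothetical stand-in for the real get_data parameter assembly:

```python
def build_params(frequency="hourly", region="US48", facets=None):
    """Assemble query parameters, injecting the US48 respondent facet by default."""
    facets = dict(facets or {})
    if region == "US48":
        # Lower-48 aggregate only, so regional series are not double-counted.
        facets.setdefault("respondent", ["US48"])
    params = {"frequency": frequency}
    for name, values in facets.items():
        # List values are expanded into repeated query keys by HTTP clients.
        params[f"facets[{name}][]"] = values
    return params

# Usage: defaults give hourly US-48 data; an explicit region skips the facet.
default_params = build_params()
regional_params = build_params(region="CISO", facets={"respondent": ["CISO"]})
```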


@peterdudfield
Contributor

Thanks @mahendra-918, are you happy for this to be merged?

@mahendra-918
Contributor Author

Thanks @peterdudfield. Yes, I'm happy with the changes; everything is ready from my side for the merge.

@peterdudfield peterdudfield merged commit 126b309 into openclimatefix:main Feb 16, 2026
2 checks passed
@peterdudfield
Contributor

Thanks @mahendra-918 for all this. @all-contributors please add @mahendra-918 for code

@allcontributors
Contributor

@peterdudfield

I've put up a pull request to add @mahendra-918! 🎉

@mahendra-918
Contributor Author

Thanks for the support and the reviews, @peterdudfield. Happy to have this merged!



Development

Successfully merging this pull request may close these issues.

Creating scripts to collect EIA data
