Issue gfs_sample.ipynb #96#98
Closed
Joy-chakraborty-23 wants to merge 11 commits intoopenclimatefix:mainfrom
Closed
Issue gfs_sample.ipynb #96#98Joy-chakraborty-23 wants to merge 11 commits intoopenclimatefix:mainfrom
Joy-chakraborty-23 wants to merge 11 commits intoopenclimatefix:mainfrom
Conversation
# Test Suite Improvements for Open Data PVnet
This document outlines the key enhancements made to the existing test suite in **tests/test_utils.py**, focusing on reliability, determinism, and maintainability while preserving original functionality.
---
## 1. Environment Loader Tests
**Original Issue:**
- Relied on patching module-level constants and did not verify that environment variables were actually populated.
- No isolation: existing environment variables could interfere with tests.
**Improvements:**
1. **Isolation with `monkeypatch`:**
- Use `monkeypatch.delenv` to ensure tested variables are unset before loading.
- Set `PROJECT_BASE` via `monkeypatch.setenv` rather than patching the module constant, simulating real environment behavior.
2. **Explicit Assertions:**
- After loading, assert that each variable appears in `os.environ` with the expected value.
3. **Error Handling Verification:**
- Confirm a missing `.env` file raises `FileNotFoundError` with a clear message.
```python
def test_load_environment_variables_success(monkeypatch, tmp_path):
# Prepare .env file
env_file = tmp_path / ".env"
env_file.write_text("TEST_VAR=test_value\nANOTHER_VAR=123")
# Isolate environment
monkeypatch.delenv("TEST_VAR", raising=False)
monkeypatch.delenv("ANOTHER_VAR", raising=False)
monkeypatch.setenv("PROJECT_BASE", str(tmp_path))
load_environment_variables()
assert os.getenv("TEST_VAR") == "test_value"
assert os.getenv("ANOTHER_VAR") == "123"
```
---
## 2. Version Format Test (Dynamic File Discovery)
> *Note: If present in the test suite, the following update was applied to locate the `pyproject.toml` file relative to the test file instead of assuming the current working directory.*
```diff
-def test_version_format():
- with open("pyproject.toml") as f:
- content = f.read()
+def test_version_format():
+ from pathlib import Path
+ project_root = Path(__file__).parents[1]
+ pyproject_path = project_root / "pyproject.toml"
+ content = pyproject_path.read_text()
```
**Benefit:** Tests are robust to working-directory changes during CI or local runs.
---
## 3. CLI Output Assertions
> *Applicable to `tests/test_cli.py`.*
**Original Approach:**
- Checked the number of `print` calls, which could pass even if text was wrong.
**Improved Approach:**
- Capture `stdout` with `capsys` and assert that expected provider names are present.
```python
def test_main_list_providers(capsys):
run_cli(["list-providers"])
out = capsys.readouterr().out
assert "Available providers:" in out
for provider in ("metoffice", "gfs", "dwd"):
assert provider in out
```
---
## 4. General Best Practices
- **Use of Fixtures:** Code-level fixtures (`temp_dirs`, `sample_nc_file`, etc.) ensure consistent test data creation and cleanup.
- **Mocking External Dependencies:** All network calls (HTTP, S3, Hugging Face, Zarr stores) are stubbed with `unittest.mock.patch` to keep tests fast and offline.
- **Parametrization & Organization:** Future splits of large test files and use of `pytest.mark.parametrize` are recommended to reduce duplication.
- **Coverage Tracking:** A `pytest-cov` integration is encouraged to identify untested branches or error-handling code paths.
---
By adopting these changes, the test suite achieves:
- **Determinism:** No reliance on external state or CWD
- **Clarity:** Tests verify both side effects and content
- **Maintainability:** Easier to extend and refactor with confidence
*Feel free to contribute further improvements, such as property-based tests with Hypothesis or CI-enforced coverage thresholds.*
Contributor
|
Hi @Joy-chakraborty-23 This PR makes a lot of changes , some of it might be unnecessary, I would recommend you to close this issue and make smaller easier to review changes perhaps. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR implements a new
save-gfs-samplesCLI command to convert GFS Zarr archives into PyTorch.ptsamples for PVNet training. It:src/open_data_pvnet/commands/gfs_samples.py(L1–75), which:--config) describing input Zarr paths and variablesOCFDataSamplerto read GFS Zarr and emit.pttensorssrc/open_data_pvnet/cli.py(L1–40) to register the new commandpyproject.tomlto point theopen-data-pvnetentrypoint atopen_data_pvnet.cli:mainnotebooks/01_save_gfs_samples.ipynbshowing end‑to‑end usageFixes #96
How Has This Been Tested?
pip install -e .