Skip to content
Open
Show file tree
Hide file tree
Changes from 2 commits
Commits
Show all changes
39 commits
Select commit Hold shift + click to select a range
5f87745
atlite-mrel first commit
Oct 16, 2025
20045e1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 5, 2025
ffb156d
code resolutions for merge
Nov 7, 2025
2b0380e
Merge branch 'wecmatrices-datamodules' of https://github.com/lmezilis…
Nov 7, 2025
dcffaef
commit to fork
Nov 7, 2025
3383ebc
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 7, 2025
7b6872f
fix count message error
Nov 10, 2025
335a67c
Merge branch 'wecmatrices-datamodules' of https://github.com/lmezilis…
Nov 10, 2025
1b7efd7
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 10, 2025
7067490
fix docstring
Nov 10, 2025
3f0c0df
Merge branch 'wecmatrices-datamodules' of https://github.com/lmezilis…
Nov 10, 2025
7aaae50
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 10, 2025
a1a2aaa
fix syntax
Nov 10, 2025
963896d
Merge branch 'wecmatrices-datamodules' of https://github.com/lmezilis…
Nov 10, 2025
76f94fd
copyrights cerra
Nov 10, 2025
5f6cfe2
fix syntax
Nov 11, 2025
39eefe6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 11, 2025
ba748f4
convert func
Nov 11, 2025
f2acfc4
Merge branch 'wecmatrices-datamodules' of https://github.com/lmezilis…
Nov 11, 2025
1de851d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 11, 2025
83ae74a
wec renaming
Nov 14, 2025
669ad8e
Merge branch 'wecmatrices-datamodules' of https://github.com/lmezilis…
Nov 14, 2025
400b591
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 14, 2025
4a3e2a2
cutout example MREL
Nov 20, 2025
c76e86c
Merge branch 'wecmatrices-datamodules' of https://github.com/lmezilis…
Nov 20, 2025
124a9d2
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 20, 2025
66a2d65
update convert function
Nov 20, 2025
9cb7ff7
Merge branch 'wecmatrices-datamodules' of https://github.com/lmezilis…
Nov 20, 2025
58d0a2c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 20, 2025
a63d82c
fix output
Nov 20, 2025
0c50fa9
update commit
Nov 20, 2025
d53fdd6
correct syntax
Nov 20, 2025
0c0b726
Merge branch 'wecmatrices-datamodules' of https://github.com/lmezilis…
Nov 20, 2025
5cc55a8
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 20, 2025
8b4ae18
syntax corrections
Nov 20, 2025
9b7cae6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 20, 2025
b06dfd3
delete picture
Nov 20, 2025
0363202
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 20, 2025
270951b
Merge branch 'master' into wecmatrices-datamodules
lmezilis Nov 20, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -27,3 +27,4 @@ paper
# Ignore IDE project files
.idea/
.vscode
.vs

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's generally best to add these pointers to your own "global" gitignore, rather than to every project you work on. That way, it never accidentally slips in without you realising it!

62 changes: 62 additions & 0 deletions atlite/convert.py
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,7 @@
from atlite.resource import (
get_cspinstallationconfig,
get_solarpanelconfig,
get_wecgeneratorconfig,
get_windturbineconfig,
windturbine_smooth,
)
Expand Down Expand Up @@ -653,6 +654,67 @@ def wind(
)


# #wave
def convert_wave(ds, wec_type):
power_matrix = pd.DataFrame.from_dict(wec_type["Power_Matrix"])

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should have a docstring here


max_pow = power_matrix.to_numpy().max()

Hs = np.ceil(ds["wave_height"] * 2) / 2
Tp = np.ceil(ds["wave_period"] * 2) / 2

Hs_list = Hs.to_numpy().flatten().tolist()
Tp_list = Tp.to_numpy().flatten().tolist()

# empty list for result
power_list = []
cases = len(Hs_list)
count = 0

# for loop to loop through Hs and Tp pairs and get the power output and capacity factor
for Hs_ind, Tp_ind in zip(Hs_list, Tp_list):
if count % 1000000 == 0:
print(f"Case {count} of {cases}: {count / cases * 100} %")

if np.isnan(Hs_ind) or np.isnan(Tp_ind):
power_list.append(0)
elif Hs_ind > 10 or Tp_ind > 18:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are cut-outs, I assume. Should they not be specified in the power matrices? Or, rather, they look to be outside the matrices, so we'd get cf NaN values, which we can fill with zeros later. That way, it doesn't cause a silent error if in future a power matrix is added that has non-zero values above these thresholds.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct they indicate the older limits of power conversion. It is true that they are already specified in the matrices so this power filter can be erased from this section.

power_list.append(0)
else:
generated_power = power_matrix.loc[Hs_ind, Tp_ind]
power_list.append(generated_power / max_pow)
count += 1

# results list to numpy array
power_list_np = np.array(power_list)

power_list_np = power_list_np.reshape(Hs.shape)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach is quite inefficient for combining two matrices of data. We want to vectorise operations where possible. I have an idea how to achieve this which I'll test locally and then come back to you.


da = xr.DataArray(
power_list_np, coords=Hs.coords, dims=Hs.dims, name="Power generated"
)
da.attrs["units"] = "kWh/kWp"
da = da.rename("specific generation")
da = da.fillna(0)

return da


def wave(cutout, wec_type, **params):
Comment thread
lmezilis marked this conversation as resolved.
Outdated
"""
Generate wave generation time series

evaluates the significant wave height (Hs) and wave peak period (Tp)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since Hs and Tp are MREL-specific and wouldn't make sense if using ERA5 data, it might make more sense to not reference them by these acronyms in this file.

and assesses the power output with the chosen power matrix for each time step and grid cell
"""
if isinstance(wec_type, (str, Path)):
wec_type = get_wecgeneratorconfig(wec_type)

return cutout.convert_and_aggregate(
convert_func=convert_wave, wec_type=wec_type, **params
)


# irradiation
def convert_irradiation(
ds,
Expand Down
3 changes: 3 additions & 0 deletions atlite/cutout.py
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,7 @@
soil_temperature,
solar_thermal,
temperature,
wave,
wind,
)
from atlite.data import available_features, cutout_prepare
Expand Down Expand Up @@ -661,6 +662,8 @@ def layout_from_capacity_list(self, data, col="Capacity"):

wind = wind

wave = wave

irradiation = irradiation

pv = pv
Expand Down
10 changes: 8 additions & 2 deletions atlite/datasets/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,12 @@
atlite datasets.
"""

from atlite.datasets import era5, gebco, sarah
from atlite.datasets import cerra, era5, gebco, mrel_wave, sarah

modules = {"era5": era5, "sarah": sarah, "gebco": gebco}
modules = {
"era5": era5,
"sarah": sarah,
"mrel_wave": mrel_wave,
"cerra": cerra,
"gebco": gebco,
}
68 changes: 68 additions & 0 deletions atlite/datasets/cerra.py

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since CERRA is available via the Climate Data Store, we should follow the same approach to data retrieval as with ERA5. You could probably just copy era5.py directly and delete all but the wind getter method, then adapt the wind getter method to match the data available from CERRA.

It might be best to drop this from this PR, though, and bring it in separately later. I can see a benefit to changing the features we bring in for wind since we can retrieve wind speed at various height/pressure levels from CDS, which would allow us to create a wind vertical profile (as we do with ERA5 data).

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with CERRA here is that there is a lot of preprocessing of data in order for atlite to be able to read it. I agree that it is best to review this later. Should I take an action on this or can you simply reject this file?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The easiest is for you to simply delete this file (and reference to cerra elsewhere). If we want to revisit it, this file will still be in the commit history so we can bring it back if we want anything from it!

Original file line number Diff line number Diff line change
@@ -0,0 +1,68 @@
"""
In order to create a CERRA cutout, the data must be manually downloaded from the Climate Data Store.
The variable used is "10m wind speed" and there is not a direction component in it.
This 10m wind speed was transformed into a 100m wind speed in order to follow the rest of atlite's processes.
"""

import logging

import numpy as np
import xarray as xr
from rasterio.warp import Resampling

from atlite.gis import regrid

logger = logging.getLogger(__name__)

crs = 4326
dx = 0.05
dy = 0.05

features = {"wind": ["wnd100m", "roughness"]}


def as_slice(bounds, pad=True):
"""
Convert coordinate bounds to slice and pad by 0.01.
"""
if not isinstance(bounds, slice):
bounds = bounds + (-0.01, 0.01)
bounds = slice(*bounds)
return bounds


def get_data(cutout, feature, tmpdir, **creation_parameters):
"""
Retrieve data from a local CERRA dataset and process it.
"""
coords = cutout.coords

if "data_path" not in creation_parameters:
logger.error('Argument "data_path" not defined')
raise ValueError('Argument "data_path" not defined')
path = creation_parameters["data_path"]

ds = xr.open_dataset(path)

ds = ds.sel(x=as_slice(cutout.extent[:2]), y=as_slice(cutout.extent[2:]))
ds = ds.assign_coords(x=ds.x.astype(float).round(4), y=ds.y.astype(float).round(4))

if (cutout.dx != dx) or (cutout.dy != dy):
ds = regrid(ds, coords["x"], coords["y"], resampling=Resampling.average)

if "sr" in ds:
ds = ds.rename({"sr": "roughness"})

logger.info("Calculating 100 metre wind speed")
if "si10" in ds and "roughness" in ds:
ds["wnd100m"] = (
ds["si10"] * (np.log(100 / ds["roughness"]) / np.log(10 / ds["roughness"]))
).assign_attrs(units="m s**-1", long_name="100 metre wind speed")
ds = ds.drop_vars("si10")

ds = ds.assign_coords(x=ds.coords["x"], y=ds.coords["y"])

logger.info("Resampling to 1H.")
ds = ds.resample(time="1h").interpolate("linear")

return ds
51 changes: 50 additions & 1 deletion atlite/datasets/era5.py
Original file line number Diff line number Diff line change
Expand Up @@ -44,7 +44,7 @@ def nullcontext():

features = {
"height": ["height"],
"wind": ["wnd100m", "wnd_shear_exp", "wnd_azimuth", "roughness"],
"wind": ["wnd100m", "wnd_azimuth", "roughness"],

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you probably didn't mean to delete "wnd_shear_exp". I assume this is needed by other atlite methods.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I was working with an older version of atlite, this was accidental.

"influx": [
"influx_toa",
"influx_direct",
Expand All @@ -55,6 +55,8 @@ def nullcontext():
],
"temperature": ["temperature", "soil temperature", "dewpoint temperature"],
"runoff": ["runoff"],
"wave_height": ["wave_height"],
"wave_period": ["wave_period"],

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Collapse this into a wave: ["wave_height", "wave_period"] option. Then users can request the wave feature and get both of these variables. It's unlikely they'd ever want one but not the other.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, thank you.

}

static_features = {"height"}
Expand Down Expand Up @@ -244,6 +246,53 @@ def sanitize_runoff(ds):
return ds


def get_data_wave_height(retrieval_params):
"""
Get wave height data for given retrieval parameters.
"""
ds = retrieve_data(
variable=[
"significant_height_of_combined_wind_waves_and_swell",
],
**retrieval_params,
)
ds = _rename_and_clean_coords(ds)
ds = ds.rename({"swh": "wave_height"})

return ds


def sanitize_wave_height(ds):
"""
Sanitize retrieved wave height data.
"""
ds["wave_height"] = ds["wave_height"].clip(min=0.0)
return ds


def get_data_wave_period(retrieval_params):
"""
Get wave period data for given retrieval parameters.
"""
ds = retrieve_data(
variable=["peak_wave_period"],
**retrieval_params,
)

ds = _rename_and_clean_coords(ds)
ds = ds.rename({"pp1d": "wave_period"})

return ds


def sanitize_wave_period(ds):
"""
Sanitize retrieved wave period data.
"""
ds["wave_period"] = ds["wave_period"].clip(min=0.0)
return ds


def get_data_height(retrieval_params):
"""
Get height data for given retrieval parameters.
Expand Down
108 changes: 108 additions & 0 deletions atlite/datasets/mrel_wave.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,108 @@
import logging

import numpy as np
import xarray as xr
from rasterio.warp import Resampling

from atlite.gis import regrid

logger = logging.getLogger(__name__)

crs = 4326
dx = 0.0625
dy = 0.04

features = {"wave_height": ["wave_height"], "wave_period": ["wave_period"]}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unused, I suggest you change it to allow you to use it later on

Suggested change
features = {"wave_height": ["wave_height"], "wave_period": ["wave_period"]}
features = {"hs": "wave_height", "tp": "wave_period"}

Then, you can get rid of the very simple get_data_... methods in favour of a simple loop in get_data



def _rename_and_clean_coords(ds):
"""
Rename 'longitude' and 'latitude' columns to 'x' and 'y' and fix roundings.

Optionally (add_lon_lat, default:True) preserves latitude and
longitude columns as 'lat' and 'lon'.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mention this option in the docstring but it isn't in the method signature. Do you want to include this option or not?

@lmezilis lmezilis Nov 6, 2025

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this option was not necessary for the mrel wave files. I eventually renamed the dimentions in the last part of the script. I would say that the function "_rename_and_clean_coords" can be deleted here. do you think we should include this option?

EDIT: I saw the rest of the comments now, I will include the function and try to make it as similar to era5 as possible.

"""
ds = ds.rename({"longitude": "x", "latitude": "y"})

ds = ds.assign_coords(
x=np.round(ds.x.astype(float), 5), y=np.round(ds.y.astype(float), 5)
)

return ds


def get_data_wave_height(ds):
ds = ds.rename({"hs": "wave_height"})

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need, this renaming is happening in the main function now (rename(features)).

ds["wave_height"] = ds["wave_height"].clip(min=0.0)

return ds


def get_data_wave_period(ds):
ds = ds.rename({"tp": "wave_period"})

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need, this renaming is happening in the main function now (rename(features)).

# ds["wave_period"] = (1 / ds["wave_period"])

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's always better to rely on version history to recover lines of code you no longer need, rather than commenting them out. So, feel free to delete all your commented out lines!

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sorry for this, I thought I did that for every file but forgot this one.

ds["wave_period"] = ds["wave_period"].clip(min=0.0)

return ds

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep the process the same as in era5.py - split this into data retrieval and then sanitisation and call the sanitisation function only if requested in get_data (you can copy most of the functionality directly over from era5.py)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EDIT: you don't actually need these get_data_... methods in this module, but having sanitize_... methods would be good. They can then be called iteratively in get_data as is done in era5.py

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are correct, I was working with these files a long time ago, figuring out how which functions I need, will correct this!



def as_slice(bounds, pad=True):
"""
Convert coordinate bounds to slice and pad by 0.01.
"""
if not isinstance(bounds, slice):
bounds = bounds + (-0.01, 0.01)
bounds = slice(*bounds)
return bounds

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def as_slice(bounds, pad=True):
"""
Convert coordinate bounds to slice and pad by 0.01.
"""
if not isinstance(bounds, slice):
bounds = bounds + (-0.01, 0.01)
bounds = slice(*bounds)
return bounds
def _bounds(coords, pad: float=0) -> dict[str, slice]:
"""
Convert coordinate bounds to slice and pad if requested.
"""
x0, x1 = coords["x"].min().item() - pad, coords["x"].max().item() + pad
y0, y1 = coords["y"].min().item() - pad, coords["y"].max().item() + pad
return {"x": slice(x0, x1), "y": slice(y0, y1)}

Then, below, use _bounds when making your selection.

This (a) explicitly uses your pad argument but allows the user to define the padding (note that the pad argument will need to grabbed from creation_parameters for it to actually do anything) and (b) is more explicit about how the x and y values are accessed (cutout.extent[:2] is less readable)



def get_data(cutout, feature, tmpdir, **creation_parameters):
coords = cutout.coords

if "data_path" not in creation_parameters:
logger.error('Argument "data_path" not defined')
raise ValueError('Argument "data_path" not defined')
path = creation_parameters["data_path"]

ds = xr.open_dataset(path)

if "longitude" in ds and "latitude" in ds:
ds = ds.rename({"longitude": "x", "latitude": "y"})

ds = ds.sel(x=as_slice(cutout.extent[:2]), y=as_slice(cutout.extent[2:]))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ds = ds.sel(x=as_slice(cutout.extent[:2]), y=as_slice(cutout.extent[2:]))
bounds = _bounds(cutout.coords, pad=creation_parameters.get("pad", 0))
ds = ds.sel(**bounds)

ds = ds.assign_coords(x=ds.x.astype(float).round(4), y=ds.y.astype(float).round(4))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have some repetition here w.r.t. _rename_and_clean_coords, which also renames coords and rounds values. For maintainability, it's probably best to use the exact same approach as in era5.py (i.e., do not update coords here, use _rename_and_clean_coords instead).

However, since you only have one dataset from the start, you could call _rename_and_clean_coords directly here and not in each of the get_data_wave... methods.


if (cutout.dx != dx) or (cutout.dy != dy):
ds = regrid(ds, coords["x"], coords["y"], resampling=Resampling.average)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this to _rename_and_clean_coords


# coords = cutout.coords

# if "data_path" not in creation_parameters:
# logger.error('Argument "data_path" not defined')
# return None

# path = creation_parameters["data_path"]

# logger.info(f"Opening dataset from {path}")
# ds = xr.open_dataset(path, chunks=cutout.chunks)
# ds = _rename_and_clean_coords(ds)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete

variables = ds.data_vars

for var in variables:
if var not in ["hs", "tp"]:
ds = ds.drop_vars(var)

# ds = ds.sel(x=as_slice(cutout.extent[:2]), y=as_slice(cutout.extent[2:]))

# if (cutout.dx != dx) or (cutout.dy != dy):
# ds = regrid(ds, coords["x"], coords["y"], resampling=Resampling.average)

logger.info("Obtaining wave data.")

ds = get_data_wave_height(ds)
ds = get_data_wave_period(ds)

# ds = ds.assign_coords(x=ds.coords["x"], y=ds.coords["y"])

return ds

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
variables = ds.data_vars
for var in variables:
if var not in ["hs", "tp"]:
ds = ds.drop_vars(var)
# ds = ds.sel(x=as_slice(cutout.extent[:2]), y=as_slice(cutout.extent[2:]))
# if (cutout.dx != dx) or (cutout.dy != dy):
# ds = regrid(ds, coords["x"], coords["y"], resampling=Resampling.average)
logger.info("Obtaining wave data.")
ds = get_data_wave_height(ds)
ds = get_data_wave_period(ds)
# ds = ds.assign_coords(x=ds.coords["x"], y=ds.coords["y"])
return ds
ds = ds[list(features.values())].rename(features)
for feature in features.values():
sanitize_func = globals().get(f"sanitize_{feature}")
if sanitize and sanitize_func is not None:
ds = sanitize_func(ds)
return ds

See my suggestion above on how to make this work by updating the features global variable. The reason I suggest this approach is that you don't need to explicitly delete all other variables (as you are doing with drop_vars), instead you just actively keep the variables you want.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for this suggestion.

Loading