
downloading dataset took longer than specified #10


Description

@andycjw

I've been running 'python 0_weatherbench2zarr.py' for almost 12 hours, and 'du -h' on the zarr directory shows it has downloaded only 2.2 GB of data.

I can see the connection is saturated at around 150 Mb/s; it could have downloaded something like 500 GB in the time taken.

Is there something wrong with the chunking that makes it use so much bandwidth and take so long?

I'm not familiar with xarray; is it expected to take this long?

The comment in the Python code says:
# Save to Zarr with chunks of size 1 along time dimension
# Can take about 1 hour to save 10.7GB of data at 40MB/s

but it has taken more time and bandwidth than that.
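As a rough sanity check of the numbers above (assuming "150 Mb/s" means megabits per second; if it is megabytes, the gap is even larger):

```python
# Back-of-the-envelope check of the bandwidth vs. output-size gap.
# Assumption (not stated precisely in the issue): the link speed is
# ~150 megabits per second, sustained for 12 hours.
hours = 12
link_mbit_per_s = 150

seconds = hours * 3600                 # 43_200 s
mb_per_s = link_mbit_per_s / 8         # 18.75 MB/s
total_gb = mb_per_s * seconds / 1000   # total transferable, in GB

print(round(total_gb))  # ~810 GB could move over the link in 12 hours,
                        # versus only 2.2 GB of Zarr output on disk
```

So either the transfer is being throttled elsewhere, or far more bytes are being downloaded than end up written, which the chunking warnings below would explain.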

Edit: in case it helps, the warnings below were shown while running:

/envs/foss4g2023oceania/lib/python3.10/site-packages/xarray/core/dataset.py:270: UserWarning: The specified chunks separate the stored chunks along dimension "level" starting at index 34. This could degrade performance. Instead, consider rechunking after loading.
warnings.warn(
/envs/foss4g2023oceania/lib/python3.10/site-packages/xarray/core/dataset.py:270: UserWarning: The specified chunks separate the stored chunks along dimension "latitude" starting at index 701. This could degrade performance. Instead, consider rechunking after loading.
warnings.warn(
/envs/foss4g2023oceania/lib/python3.10/site-packages/xarray/core/dataset.py:270: UserWarning: The specified chunks separate the stored chunks along dimension "longitude" starting at index 1404. This could degrade performance. Instead, consider rechunking after loading.
warnings.warn(
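Those warnings mean the requested chunks straddle the boundaries of the chunks stored in the remote Zarr store, so the same stored chunk can be downloaded more than once. A minimal sketch of the effect, using hypothetical chunk sizes (not WeatherBench's actual layout):

```python
# Illustrative model of why chunks that "separate the stored chunks"
# inflate downloads: writing each output chunk forces a fetch of every
# stored chunk it overlaps, with no caching between writes.

def stored_chunk_fetches(dim_len, stored, requested):
    """Count stored-chunk fetches needed to produce all requested
    chunks along one dimension (hypothetical sizes for illustration)."""
    fetches = 0
    for start in range(0, dim_len, requested):
        stop = min(start + requested, dim_len)
        first = start // stored       # first stored chunk overlapped
        last = (stop - 1) // stored   # last stored chunk overlapped
        fetches += last - first + 1
    return fetches

# Aligned request: each stored chunk is fetched exactly once.
aligned = stored_chunk_fetches(1440, stored=360, requested=360)     # 4
# Misaligned request (like the 'longitude ... index 1404' warning):
# boundary-straddling output chunks re-fetch stored chunks.
misaligned = stored_chunk_fetches(1440, stored=360, requested=701)  # 6
```

The amplification factor multiplies across every misaligned dimension ("level", "latitude", and "longitude" here), and a chunk size of 1 along time can make it far worse, which may explain the bandwidth going missing. The warning's own suggestion is to open the dataset without forcing those chunk sizes and rechunk after loading (e.g. with xarray's `Dataset.chunk`) before writing to Zarr.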
