Metadata-Version: 2.4
Name: datalad-cds
Version: 0.0.14
Summary: DataLad extension for downloading from the Copernicus Climate Data Store
Author: Matthias Riße
Author-email: The DataLad Team and Contributors <team@datalad.org>, Benedikt Bulich <b.bulich@fz-juelich.de>, Daniel Klauß <daniel.klauss@alumni.fh-aachen.de>, Laurens Jan van Haaren <l.van.haaren@fz-juelich.de>, Matthias Riße <m.risse@fz-juelich.de>
Maintainer-email: Matthias Riße <m.risse@fz-juelich.de>
License: # Main Copyright/License
        
        DataLad, including all examples, code snippets and attached
        documentation is covered by the MIT license.
        
          The MIT License
        
          Copyright (c) 2018-     DataLad Team
        
          Permission is hereby granted, free of charge, to any person obtaining a copy
          of this software and associated documentation files (the "Software"), to deal
          in the Software without restriction, including without limitation the rights
          to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
          copies of the Software, and to permit persons to whom the Software is
          furnished to do so, subject to the following conditions:
        
          The above copyright notice and this permission notice shall be included in
          all copies or substantial portions of the Software.
        
          THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
          IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
          FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
          AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
          LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
          OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
          THE SOFTWARE.
        
        See CONTRIBUTORS file for a full list of contributors.
        
Project-URL: repository, https://github.com/matrss/datalad-cds
Classifier: Programming Language :: Python
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: datalad>=0.17.0
Requires-Dist: annexremote>=1.6.0
Requires-Dist: cdsapi>=0.5.1
Provides-Extra: devel
Requires-Dist: coverage; extra == "devel"
Requires-Dist: hypothesis; extra == "devel"
Requires-Dist: mypy; extra == "devel"
Requires-Dist: pytest; extra == "devel"
Requires-Dist: pytest-cov; extra == "devel"
Requires-Dist: ruff; extra == "devel"
Requires-Dist: sphinx; extra == "devel"
Requires-Dist: sphinx_rtd_theme; extra == "devel"
Dynamic: license-file

# DataLad extension for the Copernicus Climate Data Store


## What?

A DataLad extension to integrate with the Copernicus Climate Data Store (CDS).
So far it implements a single command, `datalad download-cds`, which fetches data from the CDS
and records the request so that `datalad get` (or just `git annex get`) can repeat the download in the future.


## Why?

This extension enables automated provenance tracking for fetching data from the CDS.
A dataset that retrieves data from the CDS using this extension records how the data was initially fetched
and how it can be retrieved again in the future.


## How?

You will first have to create an account with the CDS,
if you don't have one already.
You can do so here: <https://cds.climate.copernicus.eu/user/register?destination=%2F%23!%2Fhome>

Next,
you will need to create the "~/.cdsapirc" file as described here: <https://cds.climate.copernicus.eu/api-how-to#install-the-cds-api-key>.
This file is required since the datalad-cds extension internally uses the cdsapi package
and therefore uses its authentication mechanism.
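
Before going further, it can help to confirm that the credentials file is in place.
The following is a minimal sketch, assuming the file consists of plain `name: value` lines with a `url` and a `key` entry, as in the CDS instructions linked above.
The helper name `missing_cdsapirc_entries` is hypothetical and not part of cdsapi or datalad-cds.

```python
from pathlib import Path


def missing_cdsapirc_entries(path=None):
    """Return the expected entries that are absent from the credentials file.

    Assumption: the file holds "name: value" lines and needs "url" and "key".
    """
    required = {"url", "key"}
    # Default to the location named in this README.
    path = Path(path) if path is not None else Path.home() / ".cdsapirc"
    if not path.is_file():
        return sorted(required)
    found = set()
    for line in path.read_text().splitlines():
        name, sep, value = line.partition(":")
        # Only count entries that actually carry a value.
        if sep and value.strip():
            found.add(name.strip())
    return sorted(required - found)


if __name__ == "__main__":
    missing = missing_cdsapirc_entries()
    if missing:
        print("~/.cdsapirc is missing:", ", ".join(missing))
    else:
        print("~/.cdsapirc looks complete")
```

An empty result only means that both entries are present; it does not verify that the key is valid.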

Also,
you need to install datalad and the datalad-cds extension.
Both can be installed with pip.

Now you are ready to use the extension.
When you look through the CDS you will notice that for any given dataset you can select a subset of the data using the "Download data" tab.
After you do that, you can use the "Show API request" button at the bottom to get a short Python script that would fetch the chosen subset using cdsapi.
The following is an example of that:
```python
#!/usr/bin/env python
import cdsapi
c = cdsapi.Client()
c.retrieve(
    "reanalysis-era5-pressure-levels",
    {
        "variable": "temperature",
        "pressure_level": "1000",
        "product_type": "reanalysis",
        "year": "2008",
        "month": "01",
        "day": "01",
        "time": "12:00",
        "format": "grib"
    },
    "download.grib",
)
```

To fetch the same data to the same local file using datalad-cds we just need to adapt this a little:
```bash
$ datalad download-cds --path download.grib '
    {
        "dataset": "reanalysis-era5-pressure-levels",
        "sub-selection": {
            "variable": "temperature",
            "pressure_level": "1000",
            "product_type": "reanalysis",
            "year": "2008",
            "month": "01",
            "day": "01",
            "time": "12:00",
            "format": "grib"
        }
    }
'
```

The local path to save to ("download.grib") becomes the `--path` argument.
The dataset name ("reanalysis-era5-pressure-levels" in this case) becomes the value of the `dataset` key in a JSON object that describes the data to be downloaded.
The sub-selection of the dataset becomes the value of the `sub-selection` key.
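
The mapping described above can also be sketched in Python.
This is an illustration only: `build_request` and `as_shell_command` are hypothetical helper names, not part of datalad-cds.
The `dataset` and `sub-selection` keys and the `--path` option are taken from the example above.

```python
import json
import shlex


def build_request(dataset, sub_selection):
    """Wrap the first two arguments of cdsapi's retrieve() call in one JSON object."""
    return json.dumps({"dataset": dataset, "sub-selection": sub_selection}, indent=4)


def as_shell_command(path, request):
    """Assemble the matching `datalad download-cds` invocation as a shell string."""
    # shlex.join quotes the multi-line JSON so it survives as a single argument.
    return shlex.join(["datalad", "download-cds", "--path", path, request])


request = build_request(
    "reanalysis-era5-pressure-levels",
    {
        "variable": "temperature",
        "pressure_level": "1000",
        "product_type": "reanalysis",
        "year": "2008",
        "month": "01",
        "day": "01",
        "time": "12:00",
        "format": "grib",
    },
)
print(as_shell_command("download.grib", request))
```

Generating the JSON with `json.dumps` avoids hand-editing mistakes such as a trailing comma, which is valid in the Python dictionary but not in JSON.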

Executing the above `datalad download-cds` command in a DataLad dataset should create a new file called "download.grib".
This file will have its origin tracked in git-annex (you can see that by running `git annex whereis download.grib`).
If you now `datalad drop` the file
and then `datalad get` it, you'll see that git-annex automatically re-retrieves the file from the CDS
as if it were just another location to get data from.
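
The same round trip can be scripted.
The sketch below simply shells out to the commands named above; `roundtrip_commands` and `run_roundtrip` are hypothetical helpers.
It assumes that datalad, git-annex and the datalad-cds extension are installed and that the script is run from inside the DataLad dataset.

```python
import subprocess


def roundtrip_commands(path):
    """List the commands that inspect, drop and re-retrieve a tracked file."""
    return [
        ["git", "annex", "whereis", path],  # show where the content can come from
        ["datalad", "drop", path],          # remove the local copy
        ["datalad", "get", path],           # fetch it again, here from the CDS
    ]


def run_roundtrip(path, run=subprocess.run):
    """Run each command in order, stopping at the first failure."""
    for argv in roundtrip_commands(path):
        run(argv, check=True)
```

Calling `run_roundtrip("download.grib")` in the dataset from the example would repeat the steps described above; the `run` parameter exists only so the command sequence can be inspected without executing anything.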

To see more possible usage options take a look at the help page of the command (`datalad download-cds --help`)
or the documentation at <https://matrss.github.io/datalad-cds/>.
