Metadata-Version: 2.4
Name: xarray-prism
Version: 2605.0.0
Summary: A multi-format, multi-storage xarray engine with automatic engine detection and the ability to register new data formats and URI types for climate data.
Author-email: "DKRZ, Clint" <freva@dkrz.de>
License: BSD 3-Clause License
        
        Copyright (c) 2023, Climate Informatics and Technologies (CLINT)
        
        Redistribution and use in source and binary forms, with or without
        modification, are permitted provided that the following conditions are met:
        
        1. Redistributions of source code must retain the above copyright notice, this
           list of conditions and the following disclaimer.
        
        2. Redistributions in binary form must reproduce the above copyright notice,
           this list of conditions and the following disclaimer in the documentation
           and/or other materials provided with the distribution.
        
        3. Neither the name of the copyright holder nor the names of its
           contributors may be used to endorse or promote products derived from
           this software without specific prior written permission.
        
        THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
        AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
        IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
        DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
        FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
        DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
        SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
        CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
        OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
        OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Project-URL: Issues, https://github.com/freva-org/xarray-prism/issues
Project-URL: Source, https://github.com/freva-org/xarray-prism/
Keywords: xarray,climate,netcdf,zarr,grib,geotiff
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: xarray
Requires-Dist: fsspec
Requires-Dist: h5py
Requires-Dist: h5netcdf
Requires-Dist: scipy
Requires-Dist: zarr
Requires-Dist: cfgrib
Requires-Dist: eccodes
Requires-Dist: rioxarray
Requires-Dist: rasterio
Requires-Dist: netCDF4
Requires-Dist: s3fs
Requires-Dist: gcsfs
Requires-Dist: adlfs
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: mypy; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: isort; extra == "dev"
Requires-Dist: flake8; extra == "dev"
Requires-Dist: codespell; extra == "dev"
Requires-Dist: tox; extra == "dev"
Requires-Dist: ipython; extra == "dev"
Dynamic: license-file

# Xarray Prism Engine

A multi-format, multi-storage xarray engine with automatic engine detection
and the ability to register new data formats and URI types for climate data.

> [!Important]
> If you encounter a data format that the `prism` engine cannot open, please
> file an issue [here](https://github.com/freva-org/xarray-prism/issues/new).
> This helps us improve the engine so users can work with more kinds of climate data.


## Installation

### Install via PyPI

```bash
pip install xarray-prism
```

### Install via Conda

```bash
conda install xarray-prism
```

## Quick Start

### Using with xarray

```python
import xarray as xr

# Auto-detect format
ds = xr.open_dataset("my_data.unknown_fmt", engine="prism")

# Remote Zarr on S3
ds = xr.open_dataset(
    "s3://freva/workshop/tas.zarr",
    engine="prism",
    storage_options={
        "anon": True,
        "client_kwargs": {
            "endpoint_url": "https://s3.eu-dkrz-1.dkrz.cloud"
        }
    }
)

# Remote NetCDF3 on S3
ds = xr.open_dataset(
    "s3://freva/workshop/tas.nc",
    engine="prism",
    storage_options={
        "anon": True,
        "client_kwargs": {
            "endpoint_url": "https://s3.eu-dkrz-1.dkrz.cloud"
        }
    }
)

# Remote NetCDF4 on S3
ds = xr.open_dataset(
    "s3://freva/workshop/tas.nc4",
    engine="prism",
    storage_options={
        "anon": True,
        "client_kwargs": {
            "endpoint_url": "https://s3.eu-dkrz-1.dkrz.cloud"
        }
    }
)

# Remote Zarr on S3 with credentials (non-anonymous access)
ds = xr.open_dataset(
    "s3://bucket/data.zarr",
    engine="prism",
    storage_options={
        "key": "YOUR_KEY",
        "secret": "YOUR_SECRET",
        "client_kwargs": {
            "endpoint_url": "S3_ENDPOINT"
        }
    }
)

# OPeNDAP from THREDDS
ds = xr.open_dataset(
    "https://icdc.cen.uni-hamburg.de/thredds/dodsC/ftpthredds/ar5_sea_level_rise/gia_mean.nc",
    engine="prism"
)

# Local GRIB file
ds = xr.open_dataset("forecast.grib2", engine="prism")

# GeoTIFF
ds = xr.open_dataset("satellite.tif", engine="prism")

# Tip: manage caching yourself by chaining fsspec's simplecache:: protocol
xr.open_dataset(
    "simplecache::s3://bucket/file.nc3",
    engine="prism",
    storage_options={
        "s3": {"anon": True, "client_kwargs": {"endpoint_url": "..."}},
        "simplecache": {"cache_storage": "/path/to/cache"}
    }
)

# GeoTIFF on S3: credentials can also be passed through storage_options,
# which rasterio itself does not accept:
xr.open_dataset(
    "s3://bucket/file.tif",
    engine="prism",
    storage_options={
        "key": "YOUR_KEY",
        "secret": "YOUR_SECRET",
        "client_kwargs": {
            "endpoint_url": "S3_ENDPOINT"
        }
    }
)
```

## Supported Formats


| Data format  | Remote backend    | Local FS | Cache                                              |
|--------------|-------------------|----------|----------------------------------------------------|
| GRIB         | cfgrib + fsspec   | cfgrib   | fsspec simplecache (full file)                     |
| Zarr         | zarr + fsspec     | zarr     | chunked key/value store                            |
| NetCDF3      | scipy + fsspec    | scipy    | fsspec byte cache (5 MB blocks, but full download) |
| NetCDF4/HDF5 | h5netcdf + fsspec | h5netcdf | fsspec byte cache (5 MB blocks)                    |
| GeoTIFF      | rasterio + fsspec | rasterio | GDAL/rasterio block cache (5 MB blocks)            |
| OPeNDAP/DODS | netCDF4           | n/a      | n/a                                                |
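
How `prism` picks a backend isn't documented here. One common way to tell these formats apart is by their leading file signature ("magic bytes"). The following is a minimal sketch of that idea for local files, not xarray-prism's actual detector: `sniff_format` is a hypothetical helper, it only inspects the first 8 bytes (so HDF5 files with a user block, or GRIB files preceded by a WMO header, would be missed), and it reports NetCDF4 and plain HDF5 alike as `netcdf4`.

```python
from pathlib import Path
from typing import Optional

# (leading bytes, format) pairs based on published file signatures.
_SIGNATURES = [
    (b"\x89HDF\r\n\x1a\n", "netcdf4"),  # HDF5 superblock at offset 0 (NetCDF4 or plain HDF5)
    (b"CDF\x01", "netcdf3"),            # NetCDF3 classic
    (b"CDF\x02", "netcdf3"),            # NetCDF3 64-bit offset
    (b"GRIB", "grib"),                  # GRIB edition 1 and 2
    (b"II*\x00", "geotiff"),            # little-endian TIFF
    (b"MM\x00*", "geotiff"),            # big-endian TIFF
]


def sniff_format(path: str) -> Optional[str]:
    """Guess the format of a local file or Zarr directory (hypothetical helper)."""
    p = Path(path)
    if p.is_dir():
        # Zarr v2 stores contain .zgroup/.zarray; Zarr v3 stores contain zarr.json.
        markers = (".zgroup", ".zarray", "zarr.json")
        return "zarr" if any((p / m).exists() for m in markers) else None
    with p.open("rb") as fh:
        head = fh.read(8)
    for magic, fmt in _SIGNATURES:
        if head.startswith(magic):
            return fmt
    return None
```

Content-based sniffing like this works even when the extension is missing or misleading (as in the `my_data.unknown_fmt` example above), which is why it is often preferred over matching file suffixes alone.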


> [!WARNING]
> **Remote GRIB & NetCDF3 require full file download**
> 
> Unlike Zarr or HDF5, these formats don't support partial/chunk reads over the network.
> 
> By default, xarray-prism caches files in the system temp directory.
> This works well for most cases.
> If temp storage is a concern (e.g., limited space or cleared on reboot),
> you can specify a persistent cache:
> 
> | Option | How |
> |--------|-----|
> | Environment variable | `export XARRAY_PRISM_CACHE=/path/to/cache` |
> | Per-call | `storage_options={"simplecache": {"cache_storage": "/path"}}` |
> | Default | System temp directory |
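
The table lists three cache locations but not how they combine. A minimal sketch of one plausible resolution order, assuming a per-call `cache_storage` overrides `XARRAY_PRISM_CACHE`, which in turn overrides the temp directory. `resolve_cache_dir` is a hypothetical helper, and the `xarray-prism-cache` subdirectory name is borrowed from the `cache_info()` example output further down.

```python
import os
import tempfile
from typing import Any, Dict, Optional


def resolve_cache_dir(storage_options: Optional[Dict[str, Any]] = None) -> str:
    """Pick a cache directory: per-call option, then env var, then temp dir (assumed order)."""
    # 1. Per-call: storage_options={"simplecache": {"cache_storage": "/path"}}
    per_call = (storage_options or {}).get("simplecache", {}).get("cache_storage")
    if per_call:
        return per_call
    # 2. Environment variable set by the user
    env = os.environ.get("XARRAY_PRISM_CACHE")
    if env:
        return env
    # 3. Fallback: a subdirectory of the system temp directory (name is an assumption)
    return os.path.join(tempfile.gettempdir(), "xarray-prism-cache")
```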

### Cache management

You can inspect or evict the cache manually:

```python
import xarray_prism as xp

xp.cache_info()
# {'files': 12, 'size_bytes': 2400000000, 'path': '/tmp/xarray-prism-cache'}

# Preview what would be removed
xp.clear_cache(dry_run=True)

# Evict with custom thresholds
xp.clear_cache(max_age_days=3, max_size_gb=2)

# Remove everything
xp.clear_cache(max_age_days=0, max_size_gb=0)
```

> [!NOTE]
> `max_age_days` and `max_size_gb` can also be set via
> the following environment variables:
>
> | Policy | Default | Override |
> |--------|---------|----------|
> | TTL (last-access) | 7 days | `XARRAY_PRISM_MAX_AGE_DAYS=N` |
> | Size cap (LRU) | 10 GB | `XARRAY_PRISM_MAX_SIZE_GB=N` |
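
The two policies above can be sketched as a two-pass eviction over the cache directory: first drop files whose last access is older than the TTL, then drop the least recently used files until the total size fits under the cap. This is an illustration under assumptions, not xarray-prism's `clear_cache`: `evict` is a hypothetical helper, it trusts the filesystem access time (`st_atime`, which can be coarse on `relatime`/`noatime` mounts), and it treats 1 GB as 1024**3 bytes.

```python
import os
import time
from typing import List, Optional


def evict(cache_dir: str, max_age_days: float = 7, max_size_gb: float = 10,
          dry_run: bool = False, now: Optional[float] = None) -> List[str]:
    """TTL + LRU eviction over all files in cache_dir (illustrative sketch)."""
    now = time.time() if now is None else now
    entries = []  # (last_access_time, size_bytes, path)
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            st = os.stat(path)
            entries.append((st.st_atime, st.st_size, path))
    entries.sort()  # least recently accessed first

    max_age_s = max_age_days * 86400
    max_bytes = max_size_gb * 1024**3  # assumption: binary gigabytes
    victims: List[str] = []
    kept = []
    # Pass 1 (TTL): drop anything not accessed within max_age_days.
    for atime, size, path in entries:
        if now - atime >= max_age_s:
            victims.append(path)
        else:
            kept.append((atime, size, path))
    # Pass 2 (size cap): drop least recently used files until under the limit.
    total = sum(size for _, size, _ in kept)
    for _atime, size, path in kept:
        if total <= max_bytes:
            break
        victims.append(path)
        total -= size

    if not dry_run:
        for path in victims:
            os.remove(path)
    return victims
```

With `dry_run=True` the function only reports which paths it would remove, mirroring the `dry_run` option of `xp.clear_cache` shown above.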

### Logging

By default, xarray-prism is quiet: it only reports messages at `WARNING` level and above. Set `XARRAY_PRISM_LOG_LEVEL` to change the verbosity:

```bash
# Show detection and open steps
XARRAY_PRISM_LOG_LEVEL=DEBUG python my_script.py

# Suppress everything except errors
XARRAY_PRISM_LOG_LEVEL=ERROR python my_script.py
```

Accepted values: `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`.
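
A sketch of how such a variable could be mapped onto the standard `logging` levels. It assumes the value is case-insensitive and that an unrecognized value falls back to `WARNING`; neither behavior is documented above, and `level_from_env` is a hypothetical helper.

```python
import logging
import os

_ACCEPTED = ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")


def level_from_env(var: str = "XARRAY_PRISM_LOG_LEVEL") -> int:
    """Translate the env var into a logging level, defaulting to WARNING (sketch)."""
    name = os.environ.get(var, "WARNING").strip().upper()
    if name not in _ACCEPTED:
        name = "WARNING"  # assumption: invalid values fall back to the default
    return getattr(logging, name)  # e.g. logging.DEBUG == 10
```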


## Customization

### Custom Format Detectors and URI Types

You can extend **xarray-prism** with custom *format detectors*, *URI types*, and *open handlers* by providing a small plugin package.
Registration happens **at import time**, so importing the plugin activates it.

### Plugin structure

```text
xarray_prism_myplugin/
  __init__.py   # imports the plugin module (triggers registration)
  plugin.py     # detectors, URI types, and open handlers
pyproject.toml
```

### Plugin implementation

`xarray_prism_myplugin/__init__.py`

```python
from .plugin import *  # noqa: F401,F403
```

`xarray_prism_myplugin/plugin.py`

```python
import xarray as xr
from xarray_prism import register_detector, register_uri_type, registry


@register_uri_type(priority=100)
def detect_myfs_uri(uri: str):
    """Detect a custom filesystem URI."""
    if uri.lower().startswith("myfs://"):
        return "myfs"
    return None


@register_detector(priority=100)
def detect_foo_format(uri: str):
    """Detect a custom file format."""
    if uri.lower().endswith(".foo"):
        return "foo"
    return None


@registry.register("foo", uri_type="myfs")
def open_foo_from_myfs(uri: str, **kwargs):
    """Open .foo files from myfs:// URIs."""
    translated = uri.replace("myfs://", "https://my-gateway.example/")
    return xr.open_dataset(translated, engine="h5netcdf", **kwargs)
```

### Plugin installation

`pyproject.toml`

```toml
[project]
name = "xarray-prism-myplugin"
version = "0.1.0"
dependencies = ["xarray-prism"]

[project.entry-points."xarray_prism.plugins"]
myplugin = "xarray_prism_myplugin"
```
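
Whether xarray-prism loads the `xarray_prism.plugins` entry-point group on its own isn't stated here (the next section asks you to import the plugin explicitly). If you want to activate every installed plugin in one call, a sketch using the standard-library `importlib.metadata` might look like this; `load_prism_plugins` is a hypothetical helper.

```python
import sys
from importlib.metadata import entry_points
from typing import Any, List


def load_prism_plugins(group: str = "xarray_prism.plugins") -> List[Any]:
    """Import every module advertised under the given entry-point group."""
    if sys.version_info >= (3, 10):
        eps = entry_points(group=group)
    else:  # Python 3.9: entry_points() returns a dict of group -> entry points
        eps = entry_points().get(group, [])
    # Each value is a module path such as "xarray_prism_myplugin"; loading it
    # imports the module, which runs its register_* decorators.
    return [ep.load() for ep in eps]
```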

### Using the plugin

After installing the plugin package, **import it once** to activate the registrations:

```python
import xarray_prism_myplugin  # activates detectors and handlers

import xarray as xr
ds = xr.open_dataset("myfs://bucket/path/data.foo", engine="prism")
```


## Development

### Setup Development Environment

```bash
# Start test services (MinIO, THREDDS)
docker-compose -f dev-env/docker-compose.yaml up -d --remove-orphans

# Create conda environment
conda create -n xarray-prism python=3.12 -y
conda activate xarray-prism

# Install package in editable mode with dev dependencies
pip install -e ".[dev]"
```

### Running Tests

```bash
# Run tests
tox -e test

# Run with coverage
tox -e test-cov

# Lint
tox -e lint

# Type checking
tox -e types

# Auto-format code
tox -e format
```

### Creating a Release

Releases are managed via GitHub Actions and tox:

```bash
# Tag a new release (creates a git tag)
tox -e release
```

The release workflow runs when:
- A version tag (`v*.*.*`) is pushed -> Full release to PyPI
- Manual workflow dispatch with RC number -> Pre-release to PyPI
