Metadata-Version: 2.4
Name: srsly
Version: 3.0.0
Summary: Modern high-performance serialization utilities for Python
Home-page: https://github.com/explosion/srsly
Author: Explosion
Author-email: contact@explosion.ai
License: MIT
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: cloudpickle<4,>=3.1.2
Requires-Dist: msgpack<2,>=1.1
Requires-Dist: ruamel.yaml<1,>=0.18.16
Requires-Dist: ujson<6,>=5.11.0
Dynamic: license-file

<a href="https://explosion.ai"><img src="https://explosion.ai/assets/img/logo.svg" width="125" height="125" align="right" /></a>

# srsly: Modern high-performance serialization utilities for Python

This package bundles some of the best Python serialization libraries into one
convenience package, with a high-level API that makes it easy to write code
that's correct across platforms and Pythons. This allows us to provide all the
serialization utilities we need in a single binary wheel. It currently supports
**JSON**, **JSONL**, **MessagePack**, **Pickle** and **YAML**.

[![tests](https://github.com/explosion/srsly/actions/workflows/tests.yml/badge.svg)](https://github.com/explosion/srsly/actions/workflows/tests.yml)
[![PyPi](https://img.shields.io/pypi/v/srsly.svg?style=flat-square&logo=pypi&logoColor=white)](https://pypi.python.org/pypi/srsly)
[![conda](https://img.shields.io/conda/vn/conda-forge/srsly.svg?style=flat-square&logo=conda-forge&logoColor=white)](https://anaconda.org/conda-forge/srsly)
[![GitHub](https://img.shields.io/github/release/explosion/srsly/all.svg?style=flat-square&logo=github)](https://github.com/explosion/srsly)
[![Python wheels](https://img.shields.io/badge/wheels-%E2%9C%93-4c1.svg?longCache=true&style=flat-square&logo=python&logoColor=white)](https://github.com/explosion/wheelwright/releases)

## Motivation

Serialization is hard, especially across Python versions and multiple platforms.
After dealing with many subtle bugs over the years (encodings, locales, large
files) our libraries like [spaCy](https://github.com/explosion/spaCy) and
[Prodigy](https://prodi.gy) had steadily grown a number of utility functions to
wrap the multiple serialization formats we need to support (especially `json`,
`msgpack` and `pickle`). These wrapping functions ended up duplicated across our
codebases, so we wanted to put them in one place.

`srsly` currently includes wrappers around the following packages:

- [`ujson`](https://github.com/esnme/ultrajson)
- [`msgpack`](https://github.com/msgpack/msgpack-python)
- [`cloudpickle`](https://github.com/cloudpipe/cloudpickle)
- [`ruamel.yaml`](https://github.com/pycontribs/ruamel-yaml) (without unsafe
  implementations!)

Additionally, it includes a heavily customized fork of
[`msgpack-numpy`](https://github.com/lebedov/msgpack-numpy), with corrected
round-trip behaviour for `np.float64` objects.


## Installation

`srsly` can be installed from pip.

```bash
python -m pip install srsly
```

Or from conda via conda-forge:

```bash
conda install -c conda-forge srsly
```

This will automatically install/upgrade all dependencies.

`numpy` and `cupy` are optional dependencies for msgpack serialization.
If `numpy` is installed, `numpy` objects can be serialized.
If `cupy` is installed, `cupy` objects are automatically converted
to `numpy` and then serialized.


Alternatively, you can also install the library from the repository:

```bash
# clone the repo
git clone https://github.com/explosion/srsly
cd srsly

# create a virtual environment
python -m venv .env
source .env/bin/activate

# install from source
python -m pip install .
```

For developers, install requirements separately and then install in editable
mode without build isolation:

```bash
# install in editable mode
python -m pip install --no-build-isolation --editable .

# run test suite
python -m pytest --pyargs srsly
```

## API

### JSON

#### <kbd>function</kbd> `srsly.json_dumps`

Serialize an object to a JSON string. Falls back to `json` if `sort_keys=True`
is used (until it's fixed in `ujson`).

```python
data = {"foo": "bar", "baz": 123}
json_string = srsly.json_dumps(data)
```

| Argument    | Type | Description                                            |
| ----------- | ---- | ------------------------------------------------------ |
| `data`      | -    | The JSON-serializable data to output.                  |
| `indent`    | int  | Number of spaces used to indent JSON. Defaults to `0`. |
| `sort_keys` | bool | Sort dictionary keys. Defaults to `False`.             |
| **RETURNS** | str  | The serialized string.                                 |
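
As a rough illustration of the `indent` and `sort_keys` arguments above, the sketch below serializes the same made-up dictionary three ways. It compares decoded objects rather than exact strings, since whitespace details depend on the underlying encoder.

```python
import srsly

# Sample data invented for this example.
data = {"foo": "bar", "baz": [1, 2, 3]}

# Default settings (indent=0): everything on a single line.
compact = srsly.json_dumps(data)

# Two spaces of indentation per nesting level.
pretty = srsly.json_dumps(data, indent=2)

# Sorted keys: "baz" is written before "foo".
sorted_json = srsly.json_dumps(data, sort_keys=True)

# All three variants decode back to the same Python object.
decoded = [srsly.json_loads(text) for text in (compact, pretty, sorted_json)]
```

Sorting keys is mostly useful when the output is diffed or hashed, because it makes the string independent of insertion order.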

#### <kbd>function</kbd> `srsly.json_loads`

Deserialize unicode or bytes to a Python object.

```python
data = '{"foo": "bar", "baz": 123}'
obj = srsly.json_loads(data)
```

| Argument    | Type        | Description                     |
| ----------- | ----------- | ------------------------------- |
| `data`      | str / bytes | The data to deserialize.        |
| **RETURNS** | -           | The deserialized Python object. |

#### <kbd>function</kbd> `srsly.write_json`

Create a JSON file and dump contents or write to standard output.

```python
data = {"foo": "bar", "baz": 123}
srsly.write_json("/path/to/file.json", data)
```

| Argument | Type         | Description                                            |
| -------- | ------------ | ------------------------------------------------------ |
| `path`   | str / `Path` | The file path or `"-"` to write to stdout.             |
| `data`   | -            | The JSON-serializable data to output.                  |
| `indent` | int          | Number of spaces used to indent JSON. Defaults to `2`. |

#### <kbd>function</kbd> `srsly.read_json`

Load JSON from a file or standard input.

```python
data = srsly.read_json("/path/to/file.json")
```

| Argument    | Type         | Description                                |
| ----------- | ------------ | ------------------------------------------ |
| `path`      | str / `Path` | The file path or `"-"` to read from stdin. |
| **RETURNS** | dict / list  | The loaded JSON content.                   |
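
A minimal round-trip sketch combining `srsly.write_json` and `srsly.read_json`. The temporary directory and the file name `example.json` are assumptions made so the example cleans up after itself.

```python
import tempfile
from pathlib import Path

import srsly

data = {"foo": "bar", "baz": 123}

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name; both str and Path objects are accepted.
    path = Path(tmp_dir) / "example.json"
    srsly.write_json(path, data)
    loaded = srsly.read_json(path)

# The loaded content matches what was written.
assert loaded == data
```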

#### <kbd>function</kbd> `srsly.write_gzip_json`

Create a gzipped JSON file and dump contents.

```python
data = {"foo": "bar", "baz": 123}
srsly.write_gzip_json("/path/to/file.json.gz", data)
```

| Argument | Type         | Description                                            |
| -------- | ------------ | ------------------------------------------------------ |
| `path`   | str / `Path` | The file path.                                         |
| `data`   | -            | The JSON-serializable data to output.                  |
| `indent` | int          | Number of spaces used to indent JSON. Defaults to `2`. |

#### <kbd>function</kbd> `srsly.write_gzip_jsonl`

Create a gzipped JSONL file and dump contents.

```python
data = [{"foo": "bar"}, {"baz": 123}]
srsly.write_gzip_jsonl("/path/to/file.jsonl.gz", data)
```

| Argument          | Type         | Description                                                                                                                                                                                                             |
| ----------------- | ------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `path`            | str / `Path` | The file path.                                                                                                                                                                                                          |
| `lines`           | -            | The JSON-serializable contents of each line.                                                                                                                                                                            |
| `append`          | bool         | Whether or not to append to the location. Appending to .gz files is generally not recommended, as it doesn't allow the algorithm to take advantage of all data when compressing - files may hence be poorly compressed. |
| `append_new_line` | bool         | Whether or not to write a new line before appending to the file.                                                                                                                                                        |

#### <kbd>function</kbd> `srsly.read_gzip_json`

Load gzipped JSON from a file.

```python
data = srsly.read_gzip_json("/path/to/file.json.gz")
```

| Argument    | Type         | Description              |
| ----------- | ------------ | ------------------------ |
| `path`      | str / `Path` | The file path.           |
| **RETURNS** | dict / list  | The loaded JSON content. |

#### <kbd>function</kbd> `srsly.read_gzip_jsonl`

Load gzipped JSONL from a file.

```python
data = srsly.read_gzip_jsonl("/path/to/file.jsonl.gz")
```

| Argument    | Type         | Description               |
| ----------- | ------------ | ------------------------- |
| `path`      | str / `Path` | The file path.            |
| **YIELDS**  | -            | The loaded JSON contents of each line. |
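
The gzip helpers can be exercised together in one round trip. This is a sketch under assumed file names in a temporary directory; the JSONL result is wrapped in `list()` so the records are collected before the file is removed.

```python
import tempfile
from pathlib import Path

import srsly

document = {"foo": "bar", "baz": 123}
records = [{"foo": "bar"}, {"baz": 123}]

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file names for illustration.
    json_path = Path(tmp_dir) / "example.json.gz"
    jsonl_path = Path(tmp_dir) / "example.jsonl.gz"

    # A single JSON document, gzipped.
    srsly.write_gzip_json(json_path, document)
    loaded_document = srsly.read_gzip_json(json_path)

    # One JSON record per line, gzipped.
    srsly.write_gzip_jsonl(jsonl_path, records)
    loaded_records = list(srsly.read_gzip_jsonl(jsonl_path))
```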

#### <kbd>function</kbd> `srsly.write_jsonl`

Create a JSONL file (newline-delimited JSON) and dump contents line by line, or
write to standard output.

```python
data = [{"foo": "bar"}, {"baz": 123}]
srsly.write_jsonl("/path/to/file.jsonl", data)
```

| Argument          | Type         | Description                                                                                                            |
| ----------------- | ------------ | ---------------------------------------------------------------------------------------------------------------------- |
| `path`            | str / `Path` | The file path or `"-"` to write to stdout.                                                                             |
| `lines`           | iterable     | The JSON-serializable lines.                                                                                           |
| `append`          | bool         | Append to an existing file. Will open it in `"a"` mode and insert a newline before writing lines. Defaults to `False`. |
| `append_new_line` | bool         | Defines whether a new line should first be written when appending to an existing file. Defaults to `True`.             |
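
To illustrate the `append` argument, the sketch below writes two records, appends a third in a second call, and reads everything back. The file name `log.jsonl` and the records are made up for the example.

```python
import tempfile
from pathlib import Path

import srsly

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "log.jsonl"  # hypothetical file name

    # First call creates the file.
    srsly.write_jsonl(path, [{"id": 1}, {"id": 2}])

    # Second call appends instead of overwriting.
    srsly.write_jsonl(path, [{"id": 3}], append=True)

    # Blank lines are skipped on read, so a newline inserted
    # before the appended records does not produce extra entries.
    loaded = list(srsly.read_jsonl(path))
```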

#### <kbd>function</kbd> `srsly.read_jsonl`

Read a JSONL file (newline-delimited JSON), or JSONL data from standard input,
and yield contents line by line. Blank lines will always be skipped.

```python
data = srsly.read_jsonl("/path/to/file.jsonl")
```

| Argument   | Type       | Description                                                          |
| ---------- | ---------- | -------------------------------------------------------------------- |
| `path`     | str / Path | The file path or `"-"` to read from stdin.                           |
| `skip`     | bool       | Skip broken lines and don't raise `ValueError`. Defaults to `False`. |
| **YIELDS** | -          | The loaded JSON contents of each line.                               |
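
The following sketch shows the effect of `skip` on a file that contains a blank line and a truncated record. The file contents are invented for the example and written by hand so that one line is deliberately invalid.

```python
import tempfile
from pathlib import Path

import srsly

# Line 2 is blank and line 3 is missing its closing brace.
raw = '{"id": 1}\n\n{"id": 2\n{"id": 3}\n'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "mixed.jsonl"  # hypothetical file name
    path.write_text(raw, encoding="utf8")

    # skip=True drops the broken line and keeps the rest.
    tolerant = list(srsly.read_jsonl(path, skip=True))

    # With the default skip=False, the broken line raises ValueError
    # once the generator reaches it.
    try:
        list(srsly.read_jsonl(path))
        strict_failed = False
    except ValueError:
        strict_failed = True
```

Because `read_jsonl` yields lazily, the error only surfaces while iterating, not when the function is called.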

#### <kbd>function</kbd> `srsly.is_json_serializable`

Check if a Python object is JSON-serializable.

```python
assert srsly.is_json_serializable({"hello": "world"}) is True
assert srsly.is_json_serializable(lambda x: x) is False
```

| Argument    | Type | Description                              |
| ----------- | ---- | ---------------------------------------- |
| `obj`       | -    | The object to check.                     |
| **RETURNS** | bool | Whether the object is JSON-serializable. |

### msgpack

#### <kbd>function</kbd> `srsly.msgpack_dumps`

Serialize an object to a msgpack byte string.

```python
data = {"foo": "bar", "baz": 123}
msg = srsly.msgpack_dumps(data)
```

| Argument    | Type  | Description            |
| ----------- | ----- | ---------------------- |
| `data`      | -     | The data to serialize. |
| **RETURNS** | bytes | The serialized bytes.  |

#### <kbd>function</kbd> `srsly.msgpack_loads`

Deserialize msgpack bytes to a Python object.

```python
msg = b"\x82\xa3foo\xa3bar\xa3baz{"
data = srsly.msgpack_loads(msg)
```

| Argument    | Type  | Description                                                                             |
| ----------- | ----- | --------------------------------------------------------------------------------------- |
| `data`      | bytes | The data to deserialize.                                                                |
| `use_list`  | bool  | Return lists instead of tuples. Can make deserialization slower. Defaults to `True`.    |
| **RETURNS** | -     | The deserialized Python object.                                                         |
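
A small round-trip sketch for `msgpack_dumps` and `msgpack_loads`, showing what `use_list` changes. The sample data is made up for illustration.

```python
import srsly

data = {"name": "example", "scores": [1, 2, 3]}

# Serialize to a msgpack byte string.
packed = srsly.msgpack_dumps(data)

# Default (use_list=True): sequences come back as lists.
as_lists = srsly.msgpack_loads(packed)

# use_list=False: sequences come back as tuples.
as_tuples = srsly.msgpack_loads(packed, use_list=False)
```

Tuples can be handy when the loaded values need to be hashable, for instance as dictionary keys.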

#### <kbd>function</kbd> `srsly.write_msgpack`

Create a msgpack file and dump contents.

```python
data = {"foo": "bar", "baz": 123}
srsly.write_msgpack("/path/to/file.msg", data)
```

| Argument | Type         | Description            |
| -------- | ------------ | ---------------------- |
| `path`   | str / `Path` | The file path.         |
| `data`   | -            | The data to serialize. |

#### <kbd>function</kbd> `srsly.read_msgpack`

Load a msgpack file.

```python
data = srsly.read_msgpack("/path/to/file.msg")
```

| Argument    | Type         | Description                                                                             |
| ----------- | ------------ | --------------------------------------------------------------------------------------- |
| `path`      | str / `Path` | The file path.                                                                          |
| `use_list`  | bool         | Return lists instead of tuples. Can make deserialization slower. Defaults to `True`.    |
| **RETURNS** | -            | The loaded and deserialized content.                                                    |
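
The file-based msgpack helpers can be combined in the same way as the JSON ones. This sketch assumes a temporary directory and a made-up file name.

```python
import tempfile
from pathlib import Path

import srsly

data = {"foo": "bar", "values": [1, 2, 3]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "example.msg"  # hypothetical file name
    srsly.write_msgpack(path, data)

    # Read back with the default settings (lists) and with tuples.
    loaded = srsly.read_msgpack(path)
    loaded_tuples = srsly.read_msgpack(path, use_list=False)
```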

### pickle

#### <kbd>function</kbd> `srsly.pickle_dumps`

Serialize a Python object with pickle.

```python
data = {"foo": "bar", "baz": 123}
pickled_data = srsly.pickle_dumps(data)
```

| Argument    | Type  | Description                                            |
| ----------- | ----- | ------------------------------------------------------ |
| `data`      | -     | The object to serialize.                               |
| `protocol`  | int   | Protocol to use. `-1` for highest. Defaults to `None`. |
| **RETURNS** | bytes | The serialized object.                                 |

#### <kbd>function</kbd> `srsly.pickle_loads`

Deserialize bytes with pickle.

```python
pickled_data = b"\x80\x04\x95\x19\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x03foo\x94\x8c\x03bar\x94\x8c\x03baz\x94K{u."
data = srsly.pickle_loads(pickled_data)
```

| Argument    | Type  | Description                     |
| ----------- | ----- | ------------------------------- |
| `data`      | bytes | The data to deserialize.        |
| **RETURNS** | -     | The deserialized Python object. |
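
Since the pickle functions wrap `cloudpickle`, they should also handle objects that the standard `pickle` module rejects, such as locally defined functions. The sketch below assumes that behaviour; `make_scaler` is a hypothetical helper written only for this example.

```python
import srsly


def make_scaler(factor):
    """Return a nested function that closes over `factor`."""

    def scale(value):
        return value * factor

    return scale


double = make_scaler(2)

# Serialize the closure to bytes and restore it again.
pickled = srsly.pickle_dumps(double)
restored = srsly.pickle_loads(pickled)

result = restored(21)
```

As with any pickle-based format, only load data from sources you trust, because unpickling can execute arbitrary code.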

### YAML

#### <kbd>function</kbd> `srsly.yaml_dumps`

Serialize an object to a YAML string. See the
[`ruamel.yaml` docs](https://yaml.readthedocs.io/en/latest/detail.html?highlight=indentation#indentation-of-block-sequences)
for details on the indentation format.

```python
data = {"foo": "bar", "baz": 123}
yaml_string = srsly.yaml_dumps(data)
```

| Argument          | Type | Description                                |
| ----------------- | ---- | ------------------------------------------ |
| `data`            | -    | The YAML-serializable data to output.      |
| `indent_mapping`  | int  | Mapping indentation. Defaults to `2`.      |
| `indent_sequence` | int  | Sequence indentation. Defaults to `4`.     |
| `indent_offset`   | int  | Indentation offset. Defaults to `2`.       |
| `sort_keys`       | bool | Sort dictionary keys. Defaults to `False`. |
| **RETURNS**       | str  | The serialized string.                     |

#### <kbd>function</kbd> `srsly.yaml_loads`

Deserialize unicode or a file object to a Python object.

```python
data = 'foo: bar\nbaz: 123'
obj = srsly.yaml_loads(data)
```

| Argument    | Type       | Description                     |
| ----------- | ---------- | ------------------------------- |
| `data`      | str / file | The data to deserialize.        |
| **RETURNS** | -          | The deserialized Python object. |
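
A minimal sketch of a YAML round trip, including the `sort_keys` option of `yaml_dumps`. The configuration dictionary is invented for the example, and the checks avoid exact string comparisons because layout depends on the indentation settings.

```python
import srsly

# Made-up configuration data with a nested list and mapping.
config = {
    "name": "demo",
    "steps": ["load", "train"],
    "params": {"epochs": 3},
}

# Serialize with default indentation, then parse the string again.
yaml_string = srsly.yaml_dumps(config)
restored = srsly.yaml_loads(yaml_string)

# With sort_keys=True, top-level keys are written alphabetically.
sorted_yaml = srsly.yaml_dumps(config, sort_keys=True)
```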

#### <kbd>function</kbd> `srsly.write_yaml`

Create a YAML file and dump contents or write to standard output.

```python
data = {"foo": "bar", "baz": 123}
srsly.write_yaml("/path/to/file.yml", data)
```

| Argument          | Type         | Description                                |
| ----------------- | ------------ | ------------------------------------------ |
| `path`            | str / `Path` | The file path or `"-"` to write to stdout. |
| `data`            | -            | The YAML-serializable data to output.      |
| `indent_mapping`  | int          | Mapping indentation. Defaults to `2`.      |
| `indent_sequence` | int          | Sequence indentation. Defaults to `4`.     |
| `indent_offset`   | int          | Indentation offset. Defaults to `2`.       |
| `sort_keys`       | bool         | Sort dictionary keys. Defaults to `False`. |

#### <kbd>function</kbd> `srsly.read_yaml`

Load YAML from a file or standard input.

```python
data = srsly.read_yaml("/path/to/file.yml")
```

| Argument    | Type         | Description                                |
| ----------- | ------------ | ------------------------------------------ |
| `path`      | str / `Path` | The file path or `"-"` to read from stdin. |
| **RETURNS** | dict / list  | The loaded YAML content.                   |
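
The file-based YAML helpers follow the same pattern as their JSON counterparts. This round-trip sketch uses a temporary directory and a made-up file name.

```python
import tempfile
from pathlib import Path

import srsly

data = {"foo": "bar", "items": ["a", "b"], "baz": 123}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "example.yml"  # hypothetical file name
    srsly.write_yaml(path, data)
    loaded = srsly.read_yaml(path)

# The loaded content matches what was written.
assert loaded == data
```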

#### <kbd>function</kbd> `srsly.is_yaml_serializable`

Check if a Python object is YAML-serializable.

```python
assert srsly.is_yaml_serializable({"hello": "world"}) is True
assert srsly.is_yaml_serializable(lambda x: x) is False
```

| Argument    | Type | Description                              |
| ----------- | ---- | ---------------------------------------- |
| `obj`       | -    | The object to check.                     |
| **RETURNS** | bool | Whether the object is YAML-serializable. |
