Metadata-Version: 2.1
Name: pandas-streaming
Version: 0.2.174
Summary: Streaming operations with pandas.
Home-page: http://www.xavierdupre.fr/app/pandas_streaming/helpsphinx/index.html
Download-URL: https://github.com/sdpython/pandas_streaming/
Author: Xavier Dupré
Author-email: xavier.dupre@gmail.com
License: MIT
Keywords: pandas_streaming,Xavier Dupré
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: Intended Audience :: Developers
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Education
Classifier: License :: OSI Approved :: MIT License
Classifier: Development Status :: 5 - Production/Stable
License-File: LICENSE.txt


.. image:: https://github.com/sdpython/pandas_streaming/blob/master/_doc/sphinxdoc/source/phdoc_static/project_ico.png?raw=true
    :target: https://github.com/sdpython/pandas_streaming/

.. _l-README:

pandas_streaming: streaming API over pandas
===========================================

.. image:: https://travis-ci.org/sdpython/pandas_streaming.svg?branch=master
    :target: https://travis-ci.org/sdpython/pandas_streaming
    :alt: Build status

.. image:: https://ci.appveyor.com/api/projects/status/4te066r8ne1ymmhy?svg=true
    :target: https://ci.appveyor.com/project/sdpython/pandas-streaming
    :alt: Build Status Windows

.. image:: https://circleci.com/gh/sdpython/pandas_streaming/tree/master.svg?style=svg
    :target: https://circleci.com/gh/sdpython/pandas_streaming/tree/master

.. image:: https://dev.azure.com/xavierdupre3/pandas_streaming/_apis/build/status/sdpython.pandas_streaming
    :target: https://dev.azure.com/xavierdupre3/pandas_streaming/

.. image:: https://badge.fury.io/py/pandas_streaming.svg
    :target: http://badge.fury.io/py/pandas_streaming

.. image:: https://img.shields.io/badge/license-MIT-blue.svg
    :alt: MIT License
    :target: http://opensource.org/licenses/MIT

.. image:: https://requires.io/github/sdpython/pandas_streaming/requirements.svg?branch=master
    :target: https://requires.io/github/sdpython/pandas_streaming/requirements/?branch=master
    :alt: Requirements Status

.. image:: https://codecov.io/github/sdpython/pandas_streaming/coverage.svg?branch=master
    :target: https://codecov.io/github/sdpython/pandas_streaming?branch=master

.. image:: http://img.shields.io/github/issues/sdpython/pandas_streaming.png
    :alt: GitHub Issues
    :target: https://github.com/sdpython/pandas_streaming/issues

.. image:: http://www.xavierdupre.fr/app/pandas_streaming/helpsphinx/_images/nbcov.png
    :target: http://www.xavierdupre.fr/app/pandas_streaming/helpsphinx/all_notebooks_coverage.html
    :alt: Notebook Coverage

.. image:: https://api.codacy.com/project/badge/Grade/f53b7f4d6a0447aa9ce0c4ad5df659ef
    :target: https://www.codacy.com/app/sdpython/pandas_streaming?utm_source=github.com&utm_medium=referral&utm_content=sdpython/pandas_streaming&utm_campaign=Badge_Grade

.. image:: https://pepy.tech/badge/pandas_streaming/month
    :target: https://pepy.tech/project/pandas_streaming/month
    :alt: Downloads

.. image:: https://img.shields.io/github/forks/sdpython/pandas_streaming.svg
    :target: https://github.com/sdpython/pandas_streaming/
    :alt: Forks

.. image:: https://img.shields.io/github/stars/sdpython/pandas_streaming.svg
    :target: https://github.com/sdpython/pandas_streaming/
    :alt: Stars

.. image:: https://img.shields.io/github/repo-size/sdpython/pandas_streaming
    :target: https://github.com/sdpython/pandas_streaming/
    :alt: size

`pandas_streaming <http://www.xavierdupre.fr/app/pandas_streaming/helpsphinx/index.html>`_
aims at processing big files with `pandas <http://pandas.pydata.org/>`_:
files too big to hold in memory, yet too small to gain significantly from parallelization.
The module replicates a subset of the `pandas <http://pandas.pydata.org/>`_ API
and implements other functionalities for machine learning.

::

    from pandas_streaming.df import StreamingDataFrame
    sdf = StreamingDataFrame.read_csv("filename", sep="\t", encoding="utf-8")

    for df in sdf:
        # process this chunk of data
        # df is a dataframe
        print(df)
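Conceptually, iterating over a streamed CSV file resembles reading it with plain pandas in chunks. The sketch below uses only pandas' own ``read_csv(..., chunksize=...)``, not pandas_streaming, to aggregate a column without loading every row at once; the in-memory ``io.StringIO`` buffer and the column names are illustrative stand-ins for a real file.

::

    import io

    import pandas

    # Stand-in for a large tab-separated file on disk (illustrative data).
    data = "a\tb\n1\t10\n2\t20\n3\t30\n4\t40\n5\t50\n"

    total = 0
    n_rows = 0
    # chunksize makes read_csv return an iterator of DataFrames
    # instead of one DataFrame holding every row.
    for chunk in pandas.read_csv(io.StringIO(data), sep="\t", chunksize=2):
        total += chunk["b"].sum()
        n_rows += len(chunk)

    print(n_rows, total)

Only one chunk of at most ``chunksize`` rows is materialized at a time, which is what keeps the memory footprint bounded.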

The module can also stream an existing dataframe.

::

    import pandas
    df = pandas.DataFrame([dict(cf=0, cint=0, cstr="0"),
                           dict(cf=1, cint=1, cstr="1"),
                           dict(cf=3, cint=3, cstr="3")])

    from pandas_streaming.df import StreamingDataFrame
    sdf = StreamingDataFrame.read_df(df)

    for chunk in sdf:
        # process this chunk of data
        # chunk is a dataframe
        print(chunk)
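Streaming an existing dataframe means walking through it as a sequence of smaller dataframes. The sketch below illustrates the idea with plain pandas; ``iter_chunks`` and ``ReiterableStream`` are hypothetical names, not pandas_streaming's implementation. It also shows one design choice such a stream needs: a bare generator is exhausted after a single loop, so the hypothetical wrapper keeps a function that creates a fresh iterator every time it is walked through.

::

    import pandas


    def iter_chunks(df, chunksize):
        """Yield consecutive row slices of ``df``, each with at most ``chunksize`` rows."""
        for start in range(0, len(df), chunksize):
            yield df.iloc[start:start + chunksize]


    class ReiterableStream:
        """Hypothetical wrapper storing a function that builds a fresh iterator,
        so the stream can be walked through more than once."""

        def __init__(self, iter_creation):
            self.iter_creation = iter_creation

        def __iter__(self):
            # Every loop calls the factory and gets a brand-new generator.
            return self.iter_creation()


    df = pandas.DataFrame([dict(cf=0, cint=0, cstr="0"),
                           dict(cf=1, cint=1, cstr="1"),
                           dict(cf=3, cint=3, cstr="3")])

    # A bare generator can be consumed only once.
    gen = iter_chunks(df, chunksize=2)
    print(len(list(gen)), len(list(gen)))  # 2 0

    # The wrapper yields the same chunks on every pass.
    stream = ReiterableStream(lambda: iter_chunks(df, chunksize=2))
    print([len(c) for c in stream], [len(c) for c in stream])  # [2, 1] [2, 1]

Storing a factory rather than a generator costs nothing extra in memory and lets several passes (for example, one to compute statistics and one to transform) reuse the same stream object.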

It also contains helpers to split datasets into
train and test sets under unusual constraints.
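The library's own splitting helpers are not reproduced here. As a rough illustration of splitting a dataset that does not fit in memory, the sketch below routes each row of each chunk to a train or a test output, one chunk at a time, with a ``random_state`` for reproducibility. ``stream_train_test_split`` is a hypothetical name, and the ``io.StringIO`` buffers stand in for output files.

::

    import io

    import numpy
    import pandas


    def stream_train_test_split(chunks, train_out, test_out,
                                test_size=0.25, random_state=None):
        """Hypothetical helper: send every row of every chunk to the train
        or the test output, holding only one chunk in memory at a time.

        Rows are assigned independently at random, so the test share is
        only approximately ``test_size``.
        """
        rng = numpy.random.RandomState(random_state)
        first = True
        for chunk in chunks:
            # One uniform draw per row decides its destination.
            is_test = rng.rand(len(chunk)) < test_size
            # Write the header only once, with the first chunk.
            chunk[~is_test].to_csv(train_out, index=False, header=first)
            chunk[is_test].to_csv(test_out, index=False, header=first)
            first = False


    df = pandas.DataFrame(dict(id=range(100), value=range(100, 200)))
    chunks = (df.iloc[i:i + 10] for i in range(0, len(df), 10))

    train_buf, test_buf = io.StringIO(), io.StringIO()
    stream_train_test_split(chunks, train_buf, test_buf,
                            test_size=0.2, random_state=0)

    train_df = pandas.read_csv(io.StringIO(train_buf.getvalue()))
    test_df = pandas.read_csv(io.StringIO(test_buf.getvalue()))
    print(len(train_df), len(test_df))

Because the random generator is seeded once and consumed in chunk order, the same ``random_state`` and the same chunking reproduce the same split.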

**Links:**

* `GitHub/pandas_streaming <https://github.com/sdpython/pandas_streaming/>`_
* `documentation <http://www.xavierdupre.fr/app/pandas_streaming/helpsphinx/index.html>`_
* `Blog <http://www.xavierdupre.fr/app/pandas_streaming/helpsphinx/blog/main_0000.html#ap-main-0>`_

.. _l-HISTORY:

=======
History
=======

current - 2020-08-06 - 0.00Mb
=============================

0.0.0 - 2020-08-06 - 0.00Mb
===========================

* `16`: Unit tests failing with pandas 1.1.0. (2020-08-06)
* `15`: implements parameter lines, flatten for read_json (2018-11-21)
* `14`: implements fillna (2018-10-29)
* `13`: implement concat for axis=0,1 (2018-10-26)
* `12`: add groupby_streaming (2018-10-26)
* `11`: add method add_column (2018-10-26)
* `10`: plan B to bypass a bug in pandas read_csv when iterator=True; closed, since pandas behaves unexpectedly when names has fewer entries than there are columns (2018-10-26)
* `9`: head is very slow (2018-10-26)
* `8`: fix pandas_streaming for pandas 0.23.1 (2018-07-31)
* `7`: implement read_json (2018-05-17)
* `6`: add pandas_groupby_nan from pyensae (2018-05-17)
* `5`: add random_state parameter to splitting functions (2018-02-04)
* `2`: add method sample, reservoir sampling (2017-11-05)
* `3`: method train_test_split for out-of-memory datasets (2017-10-21)
* `1`: Excited for your project (2017-10-10)


