Metadata-Version: 2.1
Name: cld2-cffi
Version: 0.1.4
Summary: CFFI bindings around Google Chromium's embedded compact language detection library (CLD2)
Home-page: http://github.com/GregBowyer/cld2-cffi/
Author: Michael McCandless & Greg Bowyer
Author-email: mail@mikemccandless.com & gbowyer@fastmail.co.uk
License: Apache2
Keywords: cld2,cffi
Platform: UNKNOWN
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: C++
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Topic :: Text Processing :: Linguistic

CLD2-CFFI - Python (CFFI) Bindings for Compact Language Detector 2
==================================================================

`CFFI <https://cffi.readthedocs.org>`_ bindings for CLD2

-----

|pypi| |build| |win-build| |coverage| |lint|

-----


This package contains the CLD2 (Compact Language Detector 2) library as
maintained by Dick Sites (https://code.google.com/p/cld2/), forked from
upstream revision r161. It also contains Python bindings that were
originally created by `Mike McCandless <http://code.google.com/p/chromium-compact-language-detector>`_.
The bindings have since passed through several hands; the latest changes
rework them for `CFFI <https://cffi.readthedocs.org>`_.

These bindings have the same API as the original cld2 bindings, so they can
be used as a drop-in replacement.

The license is the same as Chromium's and is included in the LICENSE_ file
for reference.

==========
Installing
==========

Installation should be as simple as:

.. code-block:: bash

   $ pip install cld2-cffi

-------------------
Development Version
-------------------

The **latest development version** can be installed directly from GitHub:

.. code-block:: bash

    $ pip install --upgrade 'git+https://github.com/GregBowyer/cld2-cffi.git'

=====
Usage
=====

.. code-block:: python

    import cld2

    isReliable, textBytesFound, details = cld2.detect("This is my sample text")
    print('  reliable: %s' % (isReliable != 0))
    print('  textBytes: %s' % textBytesFound)
    print('  details: %s' % str(details))

    # The output looks like so:
    #  reliable: True
    #  textBytes: 24
    #  details: (('ENGLISH', 'en', 95, 1736.0), ('Unknown', 'un', 0, 0.0), ('Unknown', 'un', 0, 0.0))
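
CLD2 analyzes UTF-8 bytes (see the Documentation section), so content that
arrives as Python text or in another byte encoding should be converted first.
The sketch below is a hypothetical helper, ``to_utf8_bytes``, not part of
cld2-cffi; it assumes you know the source encoding whenever the input bytes
are not already UTF-8.

.. code-block:: python

    def to_utf8_bytes(content, source_encoding=None):
        """Return ``content`` as UTF-8 encoded bytes.

        Hypothetical helper, not part of cld2-cffi.
        """
        if isinstance(content, bytes):
            if source_encoding is None:
                # Assume the bytes are already UTF-8; decoding raises
                # UnicodeDecodeError if they are not.
                content.decode('utf-8')
                return content
            # Re-encode bytes from a known, non-UTF-8 source encoding.
            return content.decode(source_encoding).encode('utf-8')
        # Text (str on Python 3, unicode on Python 2) is encoded directly.
        return content.encode('utf-8')

The result can be passed straight to ``cld2.detect``, e.g.
``cld2.detect(to_utf8_bytes(page_text))``, where ``page_text`` stands for
your own content.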

=============
Documentation
=============

First, you must get your content (plain text or HTML) encoded into UTF8
bytes. Then, detect like this:

.. code-block:: python

    utf8_bytes = content.encode('utf-8')  # content: your plain text or HTML as text
    isReliable, textBytesFound, details = cld2.detect(utf8_bytes)

``isReliable``
    is True if the top language is much better than the second-best language.

``textBytesFound``
    tells you how many bytes CLD actually analyzed (after removing HTML tags,
    collapsing runs of excess whitespace, etc.).

``details``
    has one entry for each of the top 3 matched languages; each entry includes
    the percent confidence of the match as well as a separate normalized score.
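
The three return values can be reduced to a single best guess. A minimal
sketch, assuming the layout shown in the usage output above: ``details`` is
ordered best match first, each entry unpacks as
``(name, code, percent, score)``, and empty slots use the code ``'un'``.
``best_language`` is a hypothetical helper, not part of cld2-cffi.

.. code-block:: python

    def best_language(result, require_reliable=True):
        """Return the code of the top detected language, or None.

        ``result`` is the (isReliable, textBytesFound, details) tuple
        returned by cld2.detect. Hypothetical helper, not part of cld2-cffi.
        """
        is_reliable, _text_bytes, details = result
        # Optionally reject guesses CLD2 did not mark as reliable
        # (``not`` handles both bool and int values).
        if require_reliable and not is_reliable:
            return None
        # Assumes the first entry is the top match.
        _name, code, _percent, _score = details[0]
        if code == 'un':
            return None  # no language was detected
        return code

Applied to the sample result shown under Usage, it returns ``'en'``.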

The module exports these global constants:

``cld2.ENCODINGS``
    list of the encoding names CLD recognizes (if you provide ``hintEncoding``,
    it must be one of these names).

``cld2.LANGUAGES``
    list of languages and their codes (if you provide ``hintLanguageCode``, it
    must be one of these codes).

``cld2.EXTERNAL_LANGUAGES``
    list of external languages and their codes. External languages cannot be
    hinted, but may still be matched when ``includeExtendedLanguages`` is
    ``True`` (the default).

``cld2.DETECTED_LANGUAGES``
    list of all detectable languages, as best I can determine (this was
    reverse-engineered from a unit test, i.e. it contains a language X if that
    language was tested and passed for at least one example text).
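
Because hints must be drawn from these constants, they can be checked before
calling ``detect``. A minimal sketch, assuming ``cld2.LANGUAGES`` is a
sequence of ``(name, code)`` pairs and ``cld2.ENCODINGS`` a sequence of name
strings (inspect both in your installed version first). ``validate_hints`` is
a hypothetical helper, not part of cld2-cffi.

.. code-block:: python

    def validate_hints(languages, encodings,
                       hint_language_code=None, hint_encoding=None):
        """Raise ValueError if a hint is not one CLD2 recognizes.

        Hypothetical helper, not part of cld2-cffi.
        """
        # Assumes each language entry is a (name, code) pair.
        known_codes = set(code for _name, code in languages)
        if hint_language_code is not None and hint_language_code not in known_codes:
            raise ValueError('unknown language hint: %r' % (hint_language_code,))
        if hint_encoding is not None and hint_encoding not in set(encodings):
            raise ValueError('unknown encoding hint: %r' % (hint_encoding,))

For example, ``validate_hints(cld2.LANGUAGES, cld2.ENCODINGS,
hint_language_code='fr')`` returns ``None`` if ``'fr'`` is a known code and
raises ``ValueError`` otherwise.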


=======
Authors
=======

Please see `AUTHORS <https://github.com/GregBowyer/cld2-cffi/blob/master/AUTHORS.rst>`_.


==============
Reporting bugs
==============

Please see `BUG_REPORTS <https://github.com/GregBowyer/cld2-cffi/blob/master/BUG_REPORTS.rst>`_.


==========
Contribute
==========

Please see `CONTRIBUTING <https://github.com/GregBowyer/cld2-cffi/blob/master/CONTRIBUTING.rst>`_.


=======
Licence
=======

Please see LICENSE_.

.. _LICENSE: https://github.com/GregBowyer/cld2-cffi/blob/master/LICENSE

.. |pypi| image:: https://img.shields.io/pypi/v/cld2-cffi.svg?style=flat-square&label=latest%20version
    :target: https://pypi.python.org/pypi/cld2-cffi
    :alt: Latest version released on PyPi

.. |build| image:: https://img.shields.io/travis/GregBowyer/cld2-cffi/master.svg?style=flat-square&label=OSX%20Linux%20build
    :target: http://travis-ci.org/GregBowyer/cld2-cffi
    :alt: Build status

.. |win-build| image:: https://img.shields.io/appveyor/ci/GregBowyer/cld2-cffi.svg?maxAge=2592000&style=flat-square&label=Windows%20Build
    :target: https://ci.appveyor.com/project/GregBowyer/cld2-cffi
    :alt: Windows Build Status

.. |coverage| image:: https://img.shields.io/codecov/c/github/GregBowyer/cld2-cffi.svg?style=flat-square
    :target: https://codecov.io/github/GregBowyer/cld2-cffi
    :alt: Coverage

.. |lint| image:: https://landscape.io/github/GregBowyer/cld2-cffi/master/landscape.svg?style=flat-square
   :target: https://landscape.io/github/GregBowyer/cld2-cffi/master
   :alt: Code Health


