Metadata-Version: 2.1
Name: pylexique
Version: 1.4.0
Summary: Pylexique is a Python wrapper around Lexique83
Home-page: https://github.com/SekouDiaoNlp/pylexique
Author: SekouDiaoNlp
Author-email: diao.sekou.nlp@gmail.com
License: MIT license
Keywords: pylexique
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Education
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Utilities
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: End Users/Desktop
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
License-File: LICENSE
License-File: AUTHORS.rst
Requires-Dist: colorama
Requires-Dist: tqdm
Requires-Dist: pyxlsb
Requires-Dist: pandas
Requires-Dist: Click (>=8.0.3)
Requires-Dist: setuptools (~=49.6.0)
Requires-Dist: joblib (~=1.0.1)
Requires-Dist: dataclasses (>=0.6) ; python_version < "3.7"

=========
pylexique
=========


.. image:: https://img.shields.io/badge/Maintained%3F-yes-green.svg
        :target: https://GitHub.com/SekouDiaoNlp/pylexique/graphs/commit-activity
        :alt: Package Maintenance Status

.. image:: https://img.shields.io/badge/maintainer-SekouDiaoNlp-blue
        :target: https://GitHub.com/SekouDiaoNlp/pylexique
        :alt: Package Maintener

.. image:: https://img.shields.io/github/checks-status/SekouDiaoNlp/mlconjug3/master?label=Build%20status%20on%20Windows%2C%20MacOs%20and%20Linux
        :target: https://github.com/SekouDiaoNlp/pylexique/actions/workflows/main.yml
        :alt: Build status on Windows, MacOs and Linux

.. image:: https://img.shields.io/pypi/v/pylexique.svg
        :target: https://pypi.python.org/pypi/pylexique
        :alt: Pypi Python Package Index Status

.. image:: https://anaconda.org/conda-forge/pylexique/badges/version.svg
        :target: https://anaconda.org/conda-forge/pylexique
        :alt: Anaconda Package Index Status

.. image:: https://img.shields.io/pypi/pyversions/pylexique
        :target: https://pypi.python.org/pypi/pylexique
        :alt: Compatible Python versions

.. image:: https://img.shields.io/conda/pn/conda-forge/pylexique?color=dark%20green&label=Supported%20platforms
        :target: https://anaconda.org/conda-forge/pylexique
        :alt: Supported platforms

.. image:: https://readthedocs.org/projects/pylexique/badge/?version=latest
        :target: https://pylexique.readthedocs.io/en/latest
        :alt: Documentation Status

.. image:: https://pyup.io/repos/github/SekouDiaoNlp/pylexique/shield.svg
        :target: https://pyup.io/repos/github/SekouDiaoNlp/pylexique/
        :alt: Dependencies status

.. image:: https://codecov.io/gh/SekouDiaoNlp/pylexique/branch/master/graph/badge.svg?token=EiEXyUJGpF
        :target: https://codecov.io/gh/SekouDiaoNlp/pylexique
        :alt: Code Coverage Status

.. image:: https://snyk-widget.herokuapp.com/badge/pip/pylexique/badge.svg
        :target: https://snyk.io/test/github/SekouDiaoNlp/pylexique?targetFile=requirements.txt
        :alt: Code Vulnerability Status

.. image:: https://img.shields.io/pypi/dm/pylexique?label=PyPi%20Downloads
        :target: https://pypi.org/project/pylexique/
        :alt: PyPI Downloads

.. image:: https://img.shields.io/conda/dn/conda-forge/pylexique?label=Anaconda%20Total%20Downloads
        :target: https://anaconda.org/conda-forge/pylexique
        :alt: Conda


|


| Pylexique is a Python wrapper around Lexique383_.

| It allows the extraction of lexical information from more than 140 000 French words in an Object Oriented way.

|

| Pylexique est une librairie Python encapsulant Lexique383_.

| Cette librairie permet d'extraire l'information lexicale de plus de 140 000 mots FranÃ§ais de maniÃ¨re OrientÃ©e Objet.

|
|

* Free software: MIT license
* PyLexique Documentation: https://pylexique.readthedocs.io

| Each lexical item is represented as a LexItem having the following LexEntryType:
|

| Chaque entrÃ©e lexicale est reprÃ©sentÃ©e par un LexItem de type LexEntryType:
|

.. code-block:: python

        class LexEntryType:
        """
        Type information about all the lexical attributes in a LexItem object.

        """
        ortho: str
        phon: str
        lemme: str
        cgram: str
        genre: str
        nombre: str
        freqlemfilms2: float
        freqlemlivres: float
        freqfilms2: float
        freqlivres: float
        infover: str
        nbhomogr: int
        nbhomoph: int
        islem: bool
        nblettres: int
        nbphons: int
        cvcv: str
        p_cvcv: str
        voisorth: int
        voisphon: int
        puorth: int
        puphon: int
        syll: str
        nbsyll: int
        cv_cv: str
        orthrenv: str
        phonrenv: str
        orthosyll: str
        cgramortho: str
        deflem: float
        defobs: int
        old20: float
        pld20: float
        morphoder: str
        nbmorph: int

The meanings of the attributes of this object are as follow:

* ortho: the word
* phon: the phonological forms of the word
* lemme: the lemmas of this word
* cgram: the grammatical categories of this word
* genre: the gender
* nombre: the number
* freqlemfilms: the frequency of the lemma according to the corpus of subtitles (per million occurrences)
* freqlemlivres: the frequency of the lemma according to the body of books (per million occurrences)
* freqfilms: the frequency of the word according to the corpus of subtitles (per million occurrences)
* freqlivres: the frequency of the word according to the body of books (per million occurrences)
* infover: modes, tenses, and possible people for verbs
* nbhomogr: number of homographs
* nbhomoph: number of homophones
* islem: indicates if it is a lemma or not
* nblettres: the number of letters
* nbphons: number of phonemes
* cvcv: the orthographic structure
* p-cvcv: the phonological structure
* voisorth: number of orthographic neighbors
* voisphon: number of phonological neighbors
* puorth: point of spelling uniqueness
* puphon: point of phonological uniqueness
* syll: syllable phonological form
* nbsyll: number of syllables
* cv-cv: syllable phonological structure
* orthrenv: reverse orthographic form
* phonrenv: reversed phonological form
* orthosyll: syllable orthographic form
* cgramortho: the different grammatical category for a given orthographic representation
* deflem: the percentage of people who said they knew the lemma of the word
* defobs: the size of the sample from which 'deflem' is derived
* old20:  orthographic Levenshtein Distance
* pld20: phonological Levenshtein Distance
* morphoder: inflectional morphology
* nbmorph: the number of morphemes directly computed from 'morphoder'


You can find all the relevant information in the `official documentation of Lexique383`_ (French).


Features
--------

* Extract all lexical information from a French  word such as:
    * orthographic and phonemics representations
    * associated lemmas
    * syllabation
    * grammatical category
    * gender and number
    * frequencies in a corpus of books and in a body of film subtitles, etc...
* Extract all the lexical forms of a French word.
* Easy to use Api.
* Easily integrate pylexique in your own projects as an imported library.
* Can be used as a command line tool.

Credits
-------

Main developer SekouDiaoNlp_.

Lexical corpus: Lexique383_

About Lexique383
================

Lexique3
========

Lexique 3.83 is a French lexical database that provides
for ~ 140,000 words of French: orthographic and phonemics representations,
associated lemmas, syllabation, grammatical category, gender and number,
frequencies in a corpus of books and in a body of film subtitles, etc...

|

Lexique 3.83 est une base de donnÃ©es lexicales du franÃ§ais qui fournit
pour ~140000 mots du franÃ§ais: les reprÃ©sentations orthographiques et
phonÃ©miques, les lemmes associÃ©s, la syllabation, la catÃ©gorie
grammaticale, le genre et le nombre, les frÃ©quences dans un corpus de
livres et dans un corpus de sous-titres de films, etc...

Table: `Lexique383.zip`_

Web site: http://www.lexique.org

Online: http://www.lexique.org/shiny/lexique

Publications
------------

-  New, Boris, Christophe Pallier, Marc Brysbaert, and Ludovic Ferrand.
   2004. "Lexique 2: A New French Lexical Database." *Behavior Research
   Methods, Instruments, & Computers* 36 (3): 516--524.
   `DOI <https://doi.org/10.3758/bf03195598>`__.
   `pdf`_

-  New, Boris, Christophe Pallier, Ludovic Ferrand, and Rafael Matos.
   2001. "Une Base de DonnÃ©es Lexicales Du FranÃ§ais Contemporain Sur
   Internet: LEXIQUE" *L'AnnÃ©e Psychologique* 101 (3): 447--462.
   `DOI <https://doi.org/10.1017/S014271640707035X>`__.
   `pdf <https://docs.google.com/viewer?url=http://www.lexique.org/outils/Lexique_Annee.pdf>`__

-  Boris New, Marc Brysbaert, Jean Veronis, and Christophe Pallier.
   2007. "The Use of Film Subtitles to Estimate Word Frequencies."
   Applied Psycholinguistics 28 (4): 661--77.
   `DOI <https://doi.org/10.1017/S014271640707035X>`__.
   (`pdf <https://drive.google.com/file/d/1uvKrxGqETXkFeRH4PcYaql8ES9FjEdbV/view?usp=sharing>`__)

Contributors
------------

-  Boris New & Christophe Pallier
-  Ronald Peereman
-  Sophie Dufour
-  Christian Lachaud
-  and many others... (contact us to be listed)

License
-------

`CC BY SA40.0`_

.. _Lexique383.zip: http://www.lexique.org/databases/Lexique383/Lexique383.zip
.. _pdf: https://docs.google.com/viewer?url=http://sites.google.com/site/borisnew/pub/New-et-al2004-BRMIC.pdf?attredirects=0
.. _CC BY SA40.0: LICENSE-CC-BY-SA4.0.txt


BibTex Entry to cite publications about Lexique383:


.. code:: bibtex

    @article{npbf04,
    author = {New, B. and Pallier, C. and Brysbaert, M. and Ferrand, L.},
    journal = {ehavior Research Methods, Instruments, & Computers},
    number = {3},
    pages = {516-524},
    title = {Lexique 2 : A New French Lexical Database},
    volume = {36},
    year = {2004},
    eprint = {http://www.lexique.org/?page_id=294},
    }

.. code:: bibtex

    @article{npfm01,
    author = {New, B. and Pallier, C. and Ferrand, L. and Matos, R.},
    journal = {L'Ann{\'e}e Pschologique},
    number = {447-462},
    pages = {1396-2},
    title = {Une base de donn{\'e}es lexicales du fran\c{c}ais contemporain sur internet: LEXIQUE},
    volume = {101},
    year = {2001},
    }

.. code:: bibtex

    @article{new_brysbaert_veronis_pallier_2007,
    author={NEW, BORIS and BRYSBAERT, MARC and VERONIS, JEAN and PALLIER, CHRISTOPHE},
    title={The use of film subtitles to estimate word frequencies},
    volume={28}, DOI={10.1017/S014271640707035X},
    number={4}, journal={Applied Psycholinguistics},
    publisher={Cambridge University Press},
    year={2007},
    pages={661â€“677}}

BibTeX
------

If you want to cite pylexique in an academic publication use this citation format:

.. code:: bibtex

   @article{pylexique,
     title={pylexique},
     author={Sekou Diao},
     journal={GitHub. Note: https://github.com/SekouDiaoNlp/pylexique Cited by},
     year={2021}
   }


.. _Lexique383: http://www.lexique.org/
.. _SekouDiaoNlp: https://github.com/SekouDiaoNlp
.. _`official documentation of Lexique383`: http://lexique.org/_documentation/Manuel_Lexique.3.2.pdf


=======
History
=======

1.4.0 (2021-10-23)
------------------

* The library is now completely type annotated and type-checked.
* The method get_all_forms() now checks if the word has multiple lemmas in case of homonymy, eg: 'souris'.
* Fixed parsing bug when using 'std-csv' parser.
* Fixed formatting issue when saving the output of the CLI to a file.
* Updated the Documentation and the docstrings.
* Updated dependencies.

1.3.5 (2021-05-18)
------------------

* Uses str.lower() to normalize inputs.
* Updated the Documentation and the docstrings.

1.3.4 (2021-05-16)
------------------

* Fixed bug where Lexique383 was not shipped with the distribution.
* Made csv parsing far faster and more robust.
* Can now use different parsers : 'pandas_csv' is the pandas csv parser, 'std_csv' is the standard library csv parser, 'csv' is a custom csv parser and 'xlsb' is pandas xlsb parser using pyxlsb engine.
* Updated dependencies.

1.3.3 (2021-05-16)
------------------

* Made csv parsing faster and more robust.
* Can now use different parsers : 'pandas_csv' is the pandas csv parser, 'std_csv' is the standard library csv parser, 'csv' is a custom csv parser and 'xlsb' is pandas xlsb parser using pyxlsb engine.
* Updated dependencies.

1.3.2 (2021-05-14)
------------------

* Can now use both 'csv' and 'xlsb' files.
* Uses 'csv' file for storage and faster load times.
* Updated dependencies.

1.3.1 (2021-05-12)
------------------

* Uses pandas for now for faster resource loading.
* Uses xlsb file for storage and faster load times
* Updated dependencies.

1.3.0 (2021-05-11)
------------------

* Uses pandas for now for faster resource loading.
* In the process of integrating `faster-than-csv` when MacOs issues get resolved.
* Refactored and expanded the test suite.
* Updated dependencies.

1.2.7 (2021-05-07)
------------------

* The new method Lexique383.get_all_forms(word) is now accessible through the cli with option '-a' or '--all_forms'.
* This new method returns a list of LexItems having the same root lemma.
* Added sample commands using the new option in the docs.
* Refactored and expanded the test suite.
* Updated dependencies.

1.2.6 (2021-05-06)
------------------

* allows for new style of relative imports.
* Now all the attributes of the LexItem objects are immutable for consistency.
* Added new method Lexique383.get_all_forms(word) to get all the lexical variations of a word.
* This new method returns a list of LexItems having the same root lemma.
* Expanded sample usage of the software in the docs.
* Updated dependencies.

1.2.3 (2021-05-04)
------------------

* Enhanced behaviour of output to stdout to not conflict with the logging strategy of users importing the library in their own projects.
* Expanded sample usage of the software in the docs.
* Updated dependencies.

1.2.2 (2021-05-04)
------------------

* Enhanced Type Hinting for main module.
* Changed the property LexItem.islem to boolean instead of a binary choice 0/1.
* Expanded sample usage of the software in the docs.
* Updated dependencies.

1.2.1 (2021-04-30)
------------------

* Implemented Type Hinting for main module.
* Added a new method to the class Lexique383. The method is Lexique383._save_errors() .
* This new method checks that the value of each field in a LexItem is of the right type. If it finds errors it will record the mismatched value/type and save it in ./erros/errors.json
* Expanded sample usage of the software in the docs.
* Much better documentation including links to Lexique383 pages and manuals.

1.2.0 (2021-04-30)
------------------

* Added a new method to the class Lexique383. The method is Lexique383.get_lex() .
* This new method accepts either a single word as a string or an iterable of strings and will return the asked lexical information.
* Expanded sample usage of the software in the docs.
* Substantial update to the code and docs.
* Removed unneeded dependencies as I reimplement some functionality myself.

1.1.1 (2021-04-28)
------------------

* Added a new method to the class LexItem. The method is LexItem.to_dict() .
* This new method allows the LexItem objects to be converted into dicts with key/value pairs corresponding to the LexItem.
* This method allows easy display or serialization of the LexItem objects.
* Lexical Items having the same orthography are stored in a list at the word's orthography key to the LEXIQUE dict.
* Expanded sample usage of the software in the docs.
* Substantial update to the code and docs.

1.1.0 (2021-04-28)
------------------

* Drastically reduced dependencies by ditching HDF5 and bolcs as the package is now smaller, faster an easier to build.
* Lexical Items having the same orthography are stored in a list at the word's orthography key to the LEXIQUE dict.
* Implemented the "FlyWheel" pattern for light Lexical entries rsiding entirely in memory at run time.
* Added sample usage of the software in the docs.
* General update to the code and docs.

1.0.7 (2021-04-27)
------------------

* First release on PyPI.


