Metadata-Version: 2.1
Name: mlconjug3
Version: 3.7.15
Summary: A Python library to conjugate French, English, Spanish, Italian, Portuguese and Romanian verbs using Machine Learning techniques.
Home-page: https://github.com/SekouDiaoNlp/mlconjug3
Author: SekouDiaoNlp
Author-email: diao.sekou.nlp@gmail.com
License: MIT license
Keywords: mlconjug3 conjugate conjugator conjugation conjugaison conjugación coniugazione conjugação conjugare verbs verbes verbos ML machine-learning NLP linguistics linguistique linguistica conjug_manager sklearn scikit-learn
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Education
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Utilities
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: End Users/Desktop
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Natural Language :: French
Classifier: Natural Language :: Spanish
Classifier: Natural Language :: Italian
Classifier: Natural Language :: Portuguese
Classifier: Natural Language :: Romanian
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Requires-Dist: defusedxml
Requires-Dist: cython
Requires-Dist: Click (>=7.1)
Requires-Dist: numpy
Requires-Dist: scipy
Requires-Dist: scikit-learn (>=0.21.4)
Requires-Dist: colorama
Requires-Dist: joblib

.. image:: https://raw.githubusercontent.com/SekouDiaoNlp/mlconjug3/master/logo/logotype2%20mlconjug.png
        :target: https://pypi.python.org/pypi/mlconjug3
        :alt: mlconjug3 PyPi Home Page

=========
MLCONJUG3
=========


.. image:: https://img.shields.io/pypi/v/mlconjug3.svg
        :target: https://pypi.python.org/pypi/mlconjug3
        :alt: Pypi Python Package Index Status

.. image:: https://pyup.io/repos/github/SekouDiaoNlp/mlconjug3/python-3-shield.svg
     :target: https://pyup.io/repos/github/SekouDiaoNlp/mlconjug3/
     :alt: Python 3

.. image:: https://img.shields.io/travis/SekouDiaoNlp/mlconjug3.svg
        :target: https://travis-ci.org/SekouDiaoNLP/mlconjug3
        :alt: Linux Continuous Integration Status

.. image:: https://ci.appveyor.com/api/projects/status/6iatj101xxfehbo8/branch/master?svg=true
        :target: https://ci.appveyor.com/project/SekouDiaoNlp/mlconjug3
        :alt: Windows Continuous Integration Status

.. image:: https://readthedocs.org/projects/mlconjug3/badge/?version=latest
        :target: https://mlconjug3.readthedocs.io/en/latest
        :alt: Documentation Status

.. image:: https://pyup.io/repos/github/SekouDiaoNlp/mlconjug3/shield.svg
     :target: https://pyup.io/repos/github/SekouDiaoNlp/mlconjug3/
     :alt: Dependencies status

.. image:: https://codecov.io/gh/SekouDiaoNlp/mlconjug3/branch/master/graph/badge.svg
        :target: https://codecov.io/gh/SekouDiaoNlp/mlconjug3
        :alt: Code Coverage Status

.. image:: https://snyk.io/test/github/SekouDiaoNlp/mlconjug3/badge.svg?targetFile=requirements.txt
        :target: https://snyk.io/test/github/SekouDiaoNlp/mlconjug3?targetFile=requirements.txt
        :alt: Code Vulnerability Status



| A Python library to conjugate verbs in French, English, Spanish, Italian, Portuguese and Romanian (more soon)
    using Machine Learning techniques.
| Any verb in one of the supported languages can be conjugated, as the module contains a Machine Learning model of how the verbs behave.
| Even completely new or made-up verbs can be successfully conjugated in this manner.
| The supplied pre-trained models are composed of:

- a binary feature extractor,
- a feature selector using Linear Support Vector Classification,
- a classifier using Stochastic Gradient Descent.
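
The first stage turns each verb into binary (present/absent) features. The sketch below is a simplified, hypothetical illustration of that idea in plain Python: it only uses the final n-grams of a verb, and the helpers ``binary_suffix_features`` and ``to_binary_vector`` are not part of mlconjug3. The library's own extractor, ``mlconjug3.extract_verb_features``, is not reproduced here.

.. code-block:: python

    from typing import List, Set, Tuple


    def binary_suffix_features(verb: str, ngram_range: Tuple[int, int] = (2, 4)) -> Set[str]:
        """Hypothetical helper: return the suffix n-grams of `verb` as binary features."""
        low, high = ngram_range
        # One feature per suffix length, e.g. "SUFF_er", "SUFF_ger", "SUFF_nger" for "manger".
        return {"SUFF_" + verb[-n:] for n in range(low, high + 1) if n <= len(verb)}


    def to_binary_vector(features: Set[str], vocabulary: List[str]) -> List[int]:
        """Hypothetical helper: encode a feature set as a 0/1 vector over an ordered vocabulary."""
        return [1 if feature in features else 0 for feature in vocabulary]


    verbs = ["manger", "finir", "vendre"]
    # The vocabulary is the sorted union of all the features seen in the training verbs.
    vocabulary = sorted(set().union(*(binary_suffix_features(verb) for verb in verbs)))
    for verb in verbs:
        print(verb, to_binary_vector(binary_suffix_features(verb), vocabulary))

In a trained pipeline, the feature selector and the classifier then work on binary vectors of this kind rather than on the raw verb strings.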

| MLConjug3 uses scikit-learn to implement the Machine Learning algorithms.
| Users of the library can use any compatible classifiers from scikit-learn to modify and retrain the models.

| The training data for the French model is based on Verbiste (https://perso.b2b2c.ca/~sarrazip/dev/verbiste.html).
| The training data for English, Spanish, Italian, Portuguese and Romanian was generated using unsupervised learning techniques,
  with the French model serving as the model queried during training.

.. warning::
    MLCONJUG3 now only supports Python 3.x, as Python 2.x reached its end of life in 2020.

* Free software: MIT license
* Documentation: https://mlconjug3.readthedocs.io.


Supported Languages
-------------------

- French
- English
- Spanish
- Italian
- Portuguese
- Romanian


Features
--------

- Easy to use API.
- Includes pre-trained models with over 99% accuracy in predicting the conjugation class of unknown verbs.
- Easily train new models or add new languages.
- Easily integrate MLConjug3 in your own projects.
- Can be used as a command line tool.


Academic publications citing mlconjug
-------------------------------------

- | Ali Malik and Mike Wu and Vrinda Vasavada and Jinpeng Song and John Mitchell and Noah D. Goodman and Chris Piech.
  | "`Generative Grading Neural Approximate Parsing for Automated Student Feedback`_".
  | Proceedings of the 34th AAAI conference on Artificial Intelligence, 2019.

Software projects using mlconjug
--------------------------------

- | `Gender Bias Visualization`_
  | This project offers tools to visualize the gender bias in pre-trained language models to better understand the prejudices in the data.
- | `Text Adaptation To Context`_
  | This project uses language models to generate text that is well suited to the type of publication.
- | `Facemask Detection`_
  | This project offers a model that recognizes COVID-19 face masks.
- | `Bad Excuses for Zoom Abuses`_
  | Need an excuse for why you can't show up in your Zoom lectures? Just generate one here!
- | NLP_
  | Repository to store Natural Language Processing models.
- | `Virtual Assistant`_
  | This is a simple virtual assistant. With it, you can search the Internet, access websites, open programs, and more using just your voice.
  | This virtual assistant supports the English and Portuguese languages and has many settings that you can adjust to your liking.
- | `Bad Advice`_
  | This Python module responds to yes or no questions. It dishes out its advice at random.
  | Disclaimer: Do not actually act on this advice ;)
- | `Spanish Conjugations Quiz`_
  | Python+Flask web app that uses mlconjug to dynamically generate foreign language conjugation questions.
- | `Silver Rogue DF`_
  | A roguelike Pygame game written in Python 3, inspired by the adventure mode of Dwarf Fortress.

BibTeX
------

If you want to cite mlconjug3 in an academic publication, use this citation format:

.. code:: bibtex

   @misc{mlconjug3,
     title={mlconjug3},
     author={Sekou Diao},
     howpublished={GitHub repository},
     url={https://github.com/SekouDiaoNlp/mlconjug3},
     year={2020}
   }


Credits
-------

This package was created with the help of Verbiste_ and scikit-learn_.

The logo was designed by Zuur_.

.. _Verbiste: https://perso.b2b2c.ca/~sarrazip/dev/verbiste.html
.. _scikit-learn: http://scikit-learn.org/stable/index.html
.. _Zuur: https://github.com/zuuritaly
.. _`Generative Grading Neural Approximate Parsing for Automated Student Feedback`: https://arxiv.org/abs/1905.09916
.. _`Gender Bias Visualization`: https://github.com/GesaJo/Gender-Bias-Visualization
.. _`Text Adaptation To Context`: https://github.com/lzontar/Text_Adaptation_To_Context
.. _`Facemask Detection`: https://github.com/samuel-karanja/facemask-derection
.. _`Bad Excuses for Zoom Abuses`: https://github.com/tyxchen/bad-excuses-for-zoom-abuses
.. _NLP: https://github.com/pskshyam/NLP
.. _`Virtual Assistant`: https://github.com/JeanExtreme002/Virtual-Assistant
.. _`Bad Advice`: https://github.com/matthew-cheney/bad-advice
.. _`Spanish Conjugations Quiz`: https://github.com/williammortimer/Spanish-Conjugations-Quiz
.. _`Silver Rogue DF`: https://github.com/FranchuFranchu/silver-rogue-df


============
Installation
============


Stable release
--------------

To install MLConjug3, run this command in your terminal:

.. code-block:: console

    $ pip install mlconjug3

This is the preferred method to install MLConjug3, as it will always install the most recent stable release.

If you don't have `pip`_ installed, this `Python installation guide`_ can guide
you through the process.


You can also install mlconjug3 by using Anaconda_ or Miniconda_.
To install Anaconda_ or Miniconda_, please follow the installation instructions on their respective websites.
After having installed Anaconda_ or Miniconda_, run this command in your terminal:

.. code-block:: console

    $ conda config --add channels conda-forge
    $ conda config --set channel_priority strict
    $ conda install mlconjug3

.. warning::
    If you intend to install mlconjug3 on an Apple MacBook with an Apple M1 processor,
    we advise you to use the conda installation method, as all the dependencies are pre-compiled.

.. _pip: https://pip.pypa.io
.. _Python installation guide: http://docs.python-guide.org/en/latest/starting/installation/
.. _Anaconda: https://www.anaconda.com/products/individual
.. _Miniconda: https://docs.conda.io/en/latest/miniconda.html


From sources
------------

The sources for MLConjug3 can be downloaded from the `Github repo`_.

You can either clone the public repository:

.. code-block:: console

    $ git clone https://github.com/SekouDiaoNlp/mlconjug3.git

Or download the `tarball`_:

.. code-block:: console

    $ curl -OL https://github.com/SekouDiaoNlp/mlconjug3/tarball/master

Once you have a copy of the source, you can install it with:

.. code-block:: console

    $ python setup.py install


.. _Github repo: https://github.com/SekouDiaoNlp/mlconjug3
.. _tarball: https://github.com/SekouDiaoNlp/mlconjug3/tarball/master


=====
Usage
=====

.. NOTE:: The default language is French.
    When called without specifying a language, the library will try to conjugate the verb in French.


To use MLConjug3 from the command line::

    $ mlconjug3 manger

    $ mlconjug3 bring -l en

    $ mlconjug3 gallofar --language es

The main command-line options are::

    -l, --language   The language of the verb to conjugate (French by default).
    -o, --output     Path of the JSON file in which to store the conjugation tables.
    -s, --subject    The subject format type for the conjugated forms.
                     The values can be 'abbrev' or 'pronoun'. The default value is 'abbrev'.
    -h               Show the help menu.


To use MLConjug3 in a project with the provided pre-trained conjugation models:

.. code-block:: python

    import mlconjug3

    # Use mlconjug3 with the default parameters and the pre-trained French conjugation model.
    default_conjugator = mlconjug3.Conjugator(language='fr')

    # Verify that the model works by printing the first person plural of the Indicatif Passé Simple.
    # The list includes neologisms and made-up verbs, which the model can still conjugate.
    verbs = ["manger", "partir", "facebooker", "astigratir", "mythoner"]
    for verb in verbs:
        print(default_conjugator.conjugate(verb).conjug_info['Indicatif']['Passé Simple']['1p'])

    # You can iterate over all the conjugated forms of a verb with the Verb.iterate() method.
    english_conjugator = mlconjug3.Conjugator(language='en')
    test_verb = english_conjugator.conjugate("be")
    all_conjugated_forms = test_verb.iterate()
    print(all_conjugated_forms)
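
The lookups above index ``conjug_info`` as a nested mapping: mood, then tense, then person. As a hedged illustration of that layout, the sketch below walks such a mapping and lists every form, similar in spirit to ``Verb.iterate()``. The ``flatten_conjug_info`` helper is hypothetical, the sample dictionary is hand-written rather than produced by mlconjug3, and treating a plain string as an impersonal form (such as the infinitive) is an assumption about the layout.

.. code-block:: python

    from typing import Dict, Iterator, Optional, Tuple, Union

    # Assumed layout: mood -> tense -> (person -> form), or mood -> tense -> single form.
    ConjugInfo = Dict[str, Dict[str, Union[str, Dict[str, str]]]]


    def flatten_conjug_info(conjug_info: ConjugInfo) -> Iterator[Tuple[str, str, Optional[str], str]]:
        """Hypothetical helper: yield (mood, tense, person, form); person is None for impersonal forms."""
        for mood, tenses in conjug_info.items():
            for tense, forms in tenses.items():
                if isinstance(forms, str):
                    # Assumption: impersonal forms such as the infinitive map directly to a string.
                    yield mood, tense, None, forms
                else:
                    for person, form in forms.items():
                        yield mood, tense, person, form


    # Hand-written sample in the mood -> tense -> person layout (not real library output).
    sample = {
        "Infinitif": {"Infinitif Présent": "manger"},
        "Indicatif": {
            "Présent": {"1s": "mange", "1p": "mangeons"},
            "Passé Simple": {"1s": "mangeai", "1p": "mangeâmes"},
        },
    }

    for row in flatten_conjug_info(sample):
        print(row)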

To use MLConjug3 in a project and train a new model:

.. code-block:: python

    import pickle
    from functools import partial

    import mlconjug3

    # Set a language to train the Conjugator on
    lang = 'fr'

    # Set an n-gram range sliding window for the vectorizer
    ngrange = (2, 7)

    # Transform the dataset with CountVectorizer, passing extract_verb_features as the analyzer.
    vectorizer = mlconjug3.CountVectorizer(analyzer=partial(mlconjug3.extract_verb_features, lang=lang, ngram_range=ngrange),
                                           binary=True)

    # Feature reduction with an L1-penalized linear SVM
    feature_reductor = mlconjug3.SelectFromModel(mlconjug3.LinearSVC(penalty="l1", max_iter=12000, dual=False, verbose=0))

    # Prediction classifier
    # Note: scikit-learn 1.1 renamed the "log" loss to "log_loss" (and removed "log" in 1.3).
    classifier = mlconjug3.SGDClassifier(loss="log", penalty='elasticnet', l1_ratio=0.15, max_iter=4000, alpha=1e-5, random_state=42, verbose=0)

    # Initialize the data set and split it into training and test data
    dataset = mlconjug3.DataSet(mlconjug3.Verbiste(language=lang).verbs)
    dataset.construct_dict_conjug()
    dataset.split_data(proportion=0.9)

    # Initialize the Conjugator
    model = mlconjug3.Model(vectorizer, feature_reductor, classifier)
    conjugator = mlconjug3.Conjugator(lang, model)

    # Training and prediction
    conjugator.model.train(dataset.train_input, dataset.train_labels)
    predicted = conjugator.model.predict(dataset.test_input)

    # Assess the performance of the model's predictions (accuracy on the test set)
    score = sum(a == b for a, b in zip(predicted, dataset.test_labels)) / len(predicted)
    print('The score of the model is {0}'.format(score))

    # Verify that the model works
    for verb in ["manger", "partir", "facebooker", "astigratir", "mythoner"]:
        print(conjugator.conjugate(verb).conjug_info['Indicatif']['Passé Simple']['1p'])

    # Save the trained model
    with open('path/to/save/data/trained_model-fr.pickle', 'wb') as file:
        pickle.dump(conjugator.model, file)
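
The ``split_data(proportion=0.9)`` call and the score computation above follow the usual hold-out evaluation scheme. A minimal, self-contained sketch of that scheme follows; ``holdout_split`` and ``accuracy`` are hypothetical helpers written for illustration, the verbs and template labels are a toy sample, and ``DataSet.split_data`` may split the data differently.

.. code-block:: python

    import random
    from typing import List, Sequence, Tuple, TypeVar

    T = TypeVar("T")
    Label = TypeVar("Label")


    def holdout_split(samples: Sequence[T], labels: Sequence[Label], proportion: float = 0.9,
                      seed: int = 42) -> Tuple[List[T], List[Label], List[T], List[Label]]:
        """Hypothetical helper: shuffle reproducibly, then keep `proportion` of the data for training."""
        indices = list(range(len(samples)))
        random.Random(seed).shuffle(indices)  # A fixed seed makes the split repeatable.
        cut = int(len(indices) * proportion)
        train, test = indices[:cut], indices[cut:]
        return ([samples[i] for i in train], [labels[i] for i in train],
                [samples[i] for i in test], [labels[i] for i in test])


    def accuracy(predicted: Sequence[Label], expected: Sequence[Label]) -> float:
        """Hypothetical helper: fraction of predictions equal to the expected labels."""
        if len(predicted) != len(expected) or not predicted:
            raise ValueError("need two non-empty sequences of equal length")
        return sum(p == e for p, e in zip(predicted, expected)) / len(predicted)


    # Toy sample with illustrative template labels.
    verbs = ["manger", "parler", "finir", "choisir", "vendre", "rendre", "aimer", "grandir", "perdre", "chanter"]
    templates = ["aim:er", "aim:er", "fin:ir", "fin:ir", "rend:re", "rend:re", "aim:er", "fin:ir", "rend:re", "aim:er"]
    train_x, train_y, test_x, test_y = holdout_split(verbs, templates, proportion=0.9)
    print(len(train_x), len(test_x))  # 9 1
    print(accuracy(["aim:er", "fin:ir"], ["aim:er", "rend:re"]))  # 0.5

Fixing the random seed keeps the split, and therefore the reported score, reproducible from run to run.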




=======
History
=======

3.7.15 (2021-04-15)
-------------------

* Updated documentation.
* Updated dependencies.

3.7.14 (2021-04-14)
-------------------

* Updated documentation.
* Retrained all models with scikit-learn 0.24.1
* Updated dependencies.

3.7.13 (2020-10-14)
-------------------

* Updated documentation.
* Fixed issue #89.
* Added more examples.
* Updated dependencies.

3.7.12 (2020-10-08)
-------------------

* Updated documentation.
* Added code highlighting for examples.
* Added more examples.
* Updated dependencies.

3.7.11 (2020-09-21)
-------------------

* Updated documentation.
* Updated dependencies.

3.7.10 (2020-09-12)
-------------------

* Fixed errors in English training corpus.
* Retrained English model.
* Updated dependencies.

3.7.9 (2020-08-30)
------------------

* Added a BibTeX entry for easier citation in academic publications.

3.7.8 (2020-08-26)
------------------

* Fixed issue #79: repeated person keys in English present continuous.
* The 'person' key of the conjugated forms dictionary can now be accessed consistently as [person] for all moods and tenses, for a more consistent API.

3.7.7 (2020-08-24)
------------------

* Fixed issue #65: infinitive inserted before some conjugated English verbs.
* Fixed issue #66: some Spanish verbs were not conjugated correctly.
* Retrained all models with scikit-learn 0.23.2.
* Updated dependencies.
* Optimized code to train and predict faster.

3.7.6 (2020-05-17)
------------------

* Fixed issue #47 and #48 where some English and Spanish verbs were not conjugated correctly.
* Fixed issue #50 dealing with some spurious data for Spanish.
* Updated dependencies.

3.7.5 (2020-05-03)
------------------

* Updated the documentation.

3.7.4 (2020-05-03)
------------------

* Fixed issue #44 where Spanish gerunds were not conjugated properly.
* Updated dependencies.

3.7.3 (2020-04-30)
------------------

* Updated the documentation.

3.7.2 (2020-04-30)
------------------

* Fixed issue with package renaming.
* Fixed bug with Portuguese verbs ending in 'ar'.
* Retrained all models with scikit-learn 0.22.2.

3.7.1 (2020-01-29)
------------------

* Updated the pre-trained models for better accuracy (all models now have more than 99.9% accuracy).
* Added new utilities for model training and persistence.
* Now all training and GridSearch results are reproducible from run to run.
* Retrained all models with scikit-learn 0.22.1.
* Corrected multiple edge cases and enlarged the test suite.

3.6.1 (2019-11-28)
------------------

* Updated the pre-trained models for better accuracy (all models now have more than 99.9% accuracy).
* Added new utilities for model training and persistence.
* Now all training and GridSearch results are reproducible from run to run.
* Updated development dependencies.

3.6.0 (2019-11-14)
------------------

* Updated scikit-learn dependency to 0.21.3.
* Updated other dependencies.

3.5.1 (2019-07-18)
------------------

* Fixed bugs in issues #80 and #81 reported by @rongybika and @NoelHVincent.
* Added a new '-o' option to the CLI to save the results to a JSON file.
* Use logging instead of print() whenever appropriate.
* Switched to joblib for model persistence.
* Updated Type declarations.
* Added more tests in the test-suite.
* Implemented results_parser to select and train the best performing models.
* Implemented multicore grid search.
* Display prettier output in the CLI.
* Updated scikit-learn dependency.
* Updated other dependencies.

3.4 (2019-04-29)
------------------

* Fixed a bug where verbs sharing no common root with their conjugated forms got their root inserted as a prefix.
* Added the method iterate() to the Verb Class as per @poolebu's feature request.
* Updated Dependencies.

3.3.2 (2019-04-06)
------------------

* Corrected a bug where regular English verbs were not conjugated properly. Thanks to @vectomon.
* Updated Dependencies.

3.3.1 (2019-04-02)
------------------

* Corrected a bug related to updating the dependencies to use scikit-learn v0.20.2 and higher.
* Updated Dependencies.

3.3 (2019-03-04)
------------------

* Updated Dependencies to use scikit-learn v0.20.2 and higher.
* Updated the pre-trained models to use scikit-learn v0.20.2 and higher.

3.2.3 (2019-02-26)
------------------

* Updated Dependencies.
* Fixed bug which prevented the installation of the pre-trained models.

3.2.2 (2018-11-18)
------------------

* Updated Dependencies.

3.2.0 (2018-11-04)
------------------

* Updated Dependencies.

3.1.3 (2018-07-10)
------------------

* Updated Documentation.
* Added support for pipenv.
* Included tests and documentation in the package distribution.


3.1.2 (2018-06-27)
------------------

* Extended the `Type annotations`_ to the whole library for PEP 561 compliance.


3.1.1 (2018-06-26)
------------------

* Minor API enhancement (see `API documentation`_).


3.1.0 (2018-06-24)
------------------

* Updated the conjugation models for Spanish and Portuguese.
* Changed the internal format of the Verbiste data from XML to JSON for better handling of Unicode characters.
* New class ConjugManager to make it easier to add new languages to mlconjug3.
* Minor API enhancement (see `API documentation`_).


3.0.1 (2018-06-22)
------------------

* Updated all provided pre-trained prediction models:
    - Implemented a new vectorizer extracting more meaningful features.
    - As a result, the performance of the models has gone through the roof in all languages.
    - Recall and precision are infinitesimally close to 100%, with English being the only language to achieve a perfect score in both recall and precision.

* Major API changes:
    - I removed the class EndingCustomVectorizer and refactored its functionality into a top-level function called extract_verb_features().
    - The new, improved models are now zip-compressed before release, because the feature space has grown so much that their size made them impractical to distribute with the package.
    - Renamed "Model.model" to "Model.pipeline".
    - Renamed "DataSet.liste_verbes" and "DataSet.liste_templates" to "DataSet.verbs_list" and "DataSet.templates_list" respectively. (Pardon my French ;-) )
    - Added the attributes "predicted" and "confidence_score" to the class Verb.
    - The whole package has been type-checked. I will soon add mlconjug3's type stubs to typeshed.


2.1.11 (2018-06-21)
-------------------

* Updated all provided pre-trained prediction models:
    - The French Conjugator has an accuracy of about 99.94% in predicting the correct conjugation class of a French verb. This is the baseline, as I have been working on it for some time now.
    - The English Conjugator has an accuracy of about 99.78% in predicting the correct conjugation class of an English verb. This is one of the biggest improvements since version 2.0.0.
    - The Spanish Conjugator has an accuracy of about 99.65% in predicting the correct conjugation class of a Spanish verb. It has also seen a sizable improvement since version 2.0.0.
    - The Romanian Conjugator has an accuracy of about 99.06% in predicting the correct conjugation class of a Romanian verb. This is by far the biggest gain: I modified the vectorizer to better take into account the morphological features of Romanian verbs. (The previous score was about 86%, so it will be nice for our Romanian friends to have a trusted conjugator.)
    - The Portuguese Conjugator has an accuracy of about 96.73% in predicting the correct conjugation class of a Portuguese verb.
    - The Italian Conjugator has an accuracy of about 94.05% in predicting the correct conjugation class of an Italian verb.


2.1.9 (2018-06-21)
------------------

* Now the Conjugator adds additional information to the Verb object it returns.
    - If the verb under consideration is already in Verbiste, the conjugation for the verb is retrieved directly from memory.
    - If the verb under consideration is unknown in Verbiste, the Conjugator class now sets the boolean attribute 'predicted' and the float attribute 'confidence_score' on the Verb instance returned by Conjugator.conjugate(verb).
* Added `Type annotations`_ to the whole library for robustness and ease of scaling out.
* The performance of the English and Romanian models has improved significantly lately. I expect that in a few more iterations they will be on par with the French model, which is currently the best performing, as I have been tuning its parameters for a couple of years now. The other languages have not improved as much yet, but if you update regularly you will see nice improvements in the 2.2 release.
* Enhanced the localization of the program.
* Now the user interface of mlconjug3 is available in French, Spanish, Italian, Portuguese and Romanian, in addition to English.
* `All the documentation of the project`_ has been translated into the supported languages.


.. _Type annotations: https://github.com/python/typeshed
.. _All the documentation of the project: https://mlconjug3.readthedocs.io/en/latest/
.. _API documentation: https://mlconjug3.readthedocs.io/en/latest/modules.html


2.1.5 (2018-06-15)
------------------

* Added localization.
* Now the user interface of mlconjug3 is available in French, Spanish, Italian, Portuguese and Romanian, in addition to English.


2.1.2 (2018-06-15)
------------------

* Added invalid verb detection.


2.1.0 (2018-06-15)
------------------

* Updated all language models for compatibility with scikit-learn 0.19.1.


2.0.0 (2018-06-14)
------------------

* Includes English conjugation model.
* Includes Spanish conjugation model.
* Includes Italian conjugation model.
* Includes Portuguese conjugation model.
* Includes Romanian conjugation model.


1.2.0 (2018-06-12)
------------------

* Refactored the API. Now a single class, Conjugator, is needed to interface with the module.
* Includes improved French conjugation model.
* Added support for multiple languages.


1.1.0 (2018-06-11)
------------------

* Refactored the API. Now a single class, Conjugator, is needed to interface with the module.
* Includes improved French conjugation model.


1.0.0 (2018-06-10)
------------------

* First release on PyPI.






