Metadata-Version: 2.1
Name: pykakasi
Version: 0.0.0
Summary: Python implementation of kakasi - kana kanji simple inversion library
Home-page: https://github.com/miurahr/pykakasi
Author: Hiroshi Miura
Author-email: miurahr@linux.com
License: GPLv3
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Description-Content-Type: text/x-rst
Requires-Dist: klepto
Requires-Dist: importlib-metadata ; python_version < "3.8"
Provides-Extra: check
Requires-Dist: mypy (==0.770) ; extra == 'check'
Requires-Dist: mypy-extensions (==0.4.3) ; extra == 'check'
Requires-Dist: docutils ; extra == 'check'
Requires-Dist: check-manifest ; extra == 'check'
Requires-Dist: flake8 ; extra == 'check'
Requires-Dist: readme-renderer ; extra == 'check'
Requires-Dist: pygments ; extra == 'check'
Requires-Dist: isort ; extra == 'check'
Requires-Dist: twine ; extra == 'check'
Provides-Extra: docs
Requires-Dist: sphinx (>=1.8) ; extra == 'docs'
Requires-Dist: sphinx-intl ; extra == 'docs'
Requires-Dist: sphinx-py3doc-enhanced-theme ; extra == 'docs'
Requires-Dist: sphinx-rtd-theme ; extra == 'docs'
Provides-Extra: test
Requires-Dist: pytest ; extra == 'test'
Requires-Dist: pytest-pep8 ; extra == 'test'
Requires-Dist: pytest-cov ; extra == 'test'
Requires-Dist: coverage[toml] (>=5.2) ; extra == 'test'

========
Pykakasi
========


Overview
========

.. image:: https://readthedocs.org/projects/pykakasi/badge/?version=latest
   :target: https://pykakasi.readthedocs.io/en/latest/?badge=latest
   :alt: Documentation Status

.. image:: https://badge.fury.io/py/pykakasi.png
   :target: http://badge.fury.io/py/Pykakasi
   :alt: PyPI version

.. image:: https://travis-ci.org/miurahr/pykakasi.svg?branch=master
   :target: https://travis-ci.org/miurahr/pykakasi
   :alt: Travis-CI

.. image:: https://dev.azure.com/miurahr/github/_apis/build/status/miurahr.pykakasi?branchName=master
   :target: https://dev.azure.com/miurahr/github/_build?definitionId=13&branchName=master
   :alt: Azure-Pipelines

.. image:: https://coveralls.io/repos/miurahr/pykakasi/badge.svg?branch=master
   :target: https://coveralls.io/r/miurahr/pykakasi?branch=master
   :alt: Coverage status


``pykakasi`` is a Python Natural Language Processing (NLP) library to transliterate *hiragana*, *katakana* and *kanji* (Japanese text) into *rōmaji* (Latin/Roman alphabet). It can handle characters in NFC form.

It is based on the `kakasi`_ library, which is written in C.

* Install (from `PyPI`_): ``pip install pykakasi``
* `Documentation available on readthedocs`_

.. _`PyPI`: https://pypi.org/project/pykakasi/
.. _`kakasi`: http://kakasi.namazu.org/
.. _`Documentation available on readthedocs`: https://pykakasi.readthedocs.io/en/latest/index.html


Supported python versions
=========================

* pykakasi 1.2 supports Python 2.7, 3.5, 3.6 and 3.7

* pykakasi 2.0 supports Python 3.6, 3.7, 3.8 and pypy3.6-7.1.1

Usage
=====

Here is an example of the new API, available in pykakasi v2.0.0 and later.
It transliterates Japanese text to katakana, hiragana and romaji:

.. code-block:: python

    import pykakasi
    kks = pykakasi.kakasi()
    text = "かな漢字"
    result = kks.convert(text)
    for item in result:
        print("{}: kana '{}', hiragana: '{}', romaji: '{}'".format(item['orig'], item['kana'], item['hira'], item['hepburn']))

    # Output:
    # かな: kana 'カナ', hiragana: 'かな', romaji: 'kana'
    # 漢字: kana 'カンジ', hiragana: 'かんじ', romaji: 'kanji'


The following example produces output similar to furigana mode:

.. code-block:: python

    import pykakasi
    kks = pykakasi.kakasi()
    text = "かな漢字交じり文"
    result = kks.convert(text)
    for item in result:
        print("{}[{}] ".format(item['orig'], item['hepburn'].capitalize()), end='')
    print()

    # Output:
    # かな[Kana] 漢字[Kanji] 交じり[Majiri] 文[Bun]
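
The changelog notes that the new API also supports the Kunrei and passport romanisation rules. The sketch below assumes that each item returned by ``convert()`` carries ``'kunrei'`` and ``'passport'`` keys next to ``'hepburn'``. The helper ``romanize`` and the hand-written ``sample`` list are hypothetical and only mirror the item shape used in the examples above.

.. code-block:: python

    # Romanisation tables assumed to be exposed as item keys by convert().
    TABLES = ("hepburn", "kunrei", "passport")


    def romanize(items, table="hepburn", sep=" "):
        """Join the romaji of every item using the chosen table."""
        if table not in TABLES:
            raise ValueError("unknown romanisation table: {}".format(table))
        return sep.join(item[table] for item in items)


    # Hand-written sample in the shape of a convert() result (not library output).
    sample = [
        {"orig": "かな", "hepburn": "kana", "kunrei": "kana", "passport": "kana"},
        {"orig": "漢字", "hepburn": "kanji", "kunrei": "kanzi", "passport": "kanji"},
    ]

    print(romanize(sample))            # kana kanji
    print(romanize(sample, "kunrei"))  # kana kanzi

With pykakasi installed, passing ``pykakasi.kakasi().convert(text)`` instead of ``sample`` should work the same way, provided the assumed keys are present.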


Old API
=======

There is also an old API for v1.2.

Transliterate Japanese text to rōmaji:

.. code-block:: pycon

    >>> import pykakasi
    >>>
    >>> text = u"かな漢字交じり文"
    >>> kakasi = pykakasi.kakasi()
    >>> kakasi.setMode("H","a") # Hiragana to ascii, default: no conversion
    >>> kakasi.setMode("K","a") # Katakana to ascii, default: no conversion
    >>> kakasi.setMode("J","a") # Japanese to ascii, default: no conversion
    >>> kakasi.setMode("r","Hepburn") # default: use Hepburn Roman table
    >>> kakasi.setMode("s", True) # add space, default: no separator
    >>> kakasi.setMode("C", True) # capitalize, default: no capitalize
    >>> conv = kakasi.getConverter()
    >>> result = conv.do(text)
    >>> print(result)
    kana Kanji Majiri Bun

Tokenize Japanese text (split by word boundaries), equivalent to ``kakasi``'s wakati gaki option:

.. code-block:: pycon

    >>> wakati = pykakasi.wakati()
    >>> conv = wakati.getConverter()
    >>> result = conv.do(text)
    >>> print(result)
    かな 漢字 交じり 文
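
For comparison, a similar word split can be sketched on top of the new API by joining the ``'orig'`` field of each item returned by ``convert()``. The helper ``wakati_from_items`` and the hand-written ``sample`` list are hypothetical, and the segmentation produced by ``convert()`` is not guaranteed to match the ``wakati`` class in every case.

.. code-block:: python

    def wakati_from_items(items, sep=" "):
        """Join the original text of each converted item with a separator."""
        return sep.join(item["orig"] for item in items)


    # Hand-written sample mirroring the segmentation shown in the Usage section.
    sample = [{"orig": "かな"}, {"orig": "漢字"}, {"orig": "交じり"}, {"orig": "文"}]

    print(wakati_from_items(sample))  # かな 漢字 交じり 文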

Add `furigana`_ (pronunciation aid) in rōmaji to text:

.. code-block:: pycon

    >>> kakasi = pykakasi.kakasi()
    >>> kakasi.setMode("J","aF") # Japanese to furigana
    >>> kakasi.setMode("H","aF") # Hiragana to furigana
    >>> conv = kakasi.getConverter()
    >>> result = conv.do(text)
    >>> print(result)
    かな[kana] 漢字[Kanji] 交じり[Majiri] 文[Bun]

Input mode values: "J" (Japanese: kanji, hiragana and katakana), "H" (hiragana), "K" (katakana).

Output mode values: "H" (hiragana), "K" (katakana), "a" (alphabet / rōmaji), "aF" (furigana in rōmaji).

There are other ``setMode`` switches which control output:

* "r": Romanisation table: `Hepburn`_ (default), `Kunrei`_ or ``Passport``
* "s": Separator: ``False`` adds no spaces between words (default), ``True`` adds spaces between words
* "C": Capitalize: ``False`` adds no capital letters (default), ``True`` makes each word start with a capital letter

.. _`furigana`: https://en.wikipedia.org/wiki/Furigana
.. _`Hepburn`: https://en.wikipedia.org/wiki/Hepburn_romanization
.. _`Kunrei`: https://en.wikipedia.org/wiki/Kunrei-shiki_romanization
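
As a rough illustration of what the "s" and "C" switches describe, the sketch below applies the same two options to a list of already romanised words. The function ``format_words`` is hypothetical and follows only the descriptions in the list above; it is not the library's implementation, and the library's actual output may differ in details.

.. code-block:: python

    def format_words(words, separator=False, capitalize=False):
        """Join romanised words, optionally spaced and capitalised."""
        if capitalize:
            # Make each word start with a capital letter.
            words = [word.capitalize() for word in words]
        # A single space between words when the separator is enabled.
        return (" " if separator else "").join(words)


    words = ["kana", "kanji", "majiri", "bun"]

    print(format_words(words))                                   # kanakanjimajiribun
    print(format_words(words, separator=True))                   # kana kanji majiri bun
    print(format_words(words, separator=True, capitalize=True))  # Kana Kanji Majiri Bun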

Copyright and License
=====================

Copyright 2010-2020 Hiroshi Miura <miurahr@linux.com>

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.


==================
PyKakasi ChangeLog
==================

All notable changes to this project will be documented in this file.

Unreleased_
===========

Added
-----

Changed
-------

Fixed
-----

Deprecated
----------

Removed
-------

Security
--------

v2.0.4_ (26, Nov. 2020)
=======================

Fixed
-----

* CLI: Fix -v and -h option crash on python 3.7 and before (#108).


v2.0.3_ (25, Nov. 2020)
=======================

Fixed
-----

* CLI: Fix -v and -h option crash (#108).


v2.0.2_ (23, Jul. 2020)
=======================

Fixed
-----

* Fix convert() to handle Katakana correctly.(#103)


v2.0.1_ (23, Jul. 2020)
=======================

Changed
-------

* Update setup.py, setup.cfg, tox.ini(#102)


Fixed
-----

* Fix convert() misses last part of a text (#99, #100)
* Fix CI, coverage, and coveralls configurations(#101)


v2.0.0_ (31, May. 2020)
=======================

Changed
-------

* Update test formatting.

v2.0.0b1_ (9, May. 2020)
========================

Changed
-------

* Update test.


v2.0.0a6_ (30, Mar. 2020)
=========================

Added
-----

* Understand more kanji variations.

Fixed
-----

* Fix IVS handling to return correct word length to consume.


v2.0.0a5_ (23, Mar. 2020)
=========================

Changed
-------

* Recognize the Unicode standard Ideographic Variation Selector (IVS) and transliterate when it is used.(#97)


v2.0.0a4_ (20, Mar. 2020)
==========================

Added
-----

* Add type hinting.

Changed
-------

* Refactor dictionary generation classes.
* Call super() from wakati.__init__()
* test: detect whether running under tox or raw pytest via the TOX_ENV environment variable.
  Under raw pytest, generate dictionaries as a fixture.
  Previous versions used the --runenv option for pytest.

Fixed
-----

* NewAPI: fix return value when empty input string.


`v2.0.0a3`_ (18, Mar. 2020)
===========================

Changed
-------

* Update test cases.

Fixed
-----

* Add guard for unknown symbol code point, which led to a NoneType error.


`v2.0.0a2`_ (16, Mar. 2020)
===========================

Added
-----

* NewAPI: support kunrei and passport roman conversion rule.

Changed
-------

* CI: test by github actions

Fixed
-----

* Support an extended kana(#77)
  (U0001b150-U0001b152, U0001b164-U0001b167)

`v2.0.0a1`_ (14, Mar. 2020)
===========================

Added
-----

* Structured interface of Kakasi class.(#21)

Changed
-------

* Github workflows for packaging and release.(#91)

Fixed
-----

* fix data kakasidict.utf8: “本蓮沼”

Deprecated
----------

* Drop python 2.7 support.


`v1.2`_ (26, Sep, 2019)
=======================

Fixed
-----

* Fix out-of-index error when a kana dash is the first character of a same-character group.(#85)

`v1.1`_ (16, Sep, 2019)
=======================

`v1.1b2`_ (14, Sep, 2019)
=========================

Fixed
-----

* Fix long symbol issue(#58) (thanks @northernbird and @ta9ya)


`v1.1b1`_ (6, Sep, 2019)
========================

Added
-----
* Add conversions: kya, kyu, kyo

Changed
-------
* Rewording README document

`v1.1a1`_ (8, Jul, 2019)
========================

Changed
-------

* pytest: now run on project root without tox, by generating
  dictionary as a test fixture.
* tox: run tox test with installed dictionary instead of
  a generated fixture.
* Optimize kana conversion function.
* Move kakasidict.py to src and conftest.py to tests

Fixed
-----

* Version naming follows PEP386.
* Sometimes fails to insert space after punctuation(#79).
* Special case in kana-roman passport conversion such as 'etchu' etc.



.. _Unreleased: https://github.com/miurahr/pykakasi/compare/v2.0.4...HEAD
.. _v2.0.4: https://github.com/miurahr/pykakasi/compare/v2.0.3...v2.0.4
.. _v2.0.3: https://github.com/miurahr/pykakasi/compare/v2.0.2...v2.0.3
.. _v2.0.2: https://github.com/miurahr/pykakasi/compare/v2.0.1...v2.0.2
.. _v2.0.1: https://github.com/miurahr/pykakasi/compare/v2.0.0...v2.0.1
.. _v2.0.0: https://github.com/miurahr/pykakasi/compare/v2.0.0b1...v2.0.0
.. _v2.0.0b1: https://github.com/miurahr/pykakasi/compare/v2.0.0a6...v2.0.0b1
.. _v2.0.0a6: https://github.com/miurahr/pykakasi/compare/v2.0.0a5...v2.0.0a6
.. _v2.0.0a5: https://github.com/miurahr/pykakasi/compare/v2.0.0a4...v2.0.0a5
.. _v2.0.0a4: https://github.com/miurahr/pykakasi/compare/v2.0.0a3...v2.0.0a4
.. _v2.0.0a3: https://github.com/miurahr/pykakasi/compare/v2.0.0a2...v2.0.0a3
.. _v2.0.0a2: https://github.com/miurahr/pykakasi/compare/v2.0.0a1...v2.0.0a2
.. _v2.0.0a1: https://github.com/miurahr/pykakasi/compare/v1.2...v2.0.0a1
.. _v1.2: https://github.com/miurahr/pykakasi/compare/v1.1...v1.2
.. _v1.1: https://github.com/miurahr/pykakasi/compare/v1.1b2...v1.1
.. _v1.1b2: https://github.com/miurahr/pykakasi/compare/v1.1b1...v1.1b2
.. _v1.1b1: https://github.com/miurahr/pykakasi/compare/v1.1a1...v1.1b1
.. _v1.1a1: https://github.com/miurahr/pykakasi/compare/v1.0c2...v1.1a1


