Metadata-Version: 2.4
Name: Mikado
Version: 2.3.5rc3
Summary: A Python3 annotation program to select the best gene model in each locus
Home-page: https://github.com/EI-CoreBioinformatics/mikado
Author: Luca Venturini
Author-email: lucventurini@gmail.com
License: LGPL-3.0-or-later
Keywords: rna-seq annotation genomics transcriptomics
Classifier: Development Status :: 5 - Production/Stable
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: License :: OSI Approved :: GNU Lesser General Public License v3 or later (LGPLv3+)
Classifier: Operating System :: POSIX :: Linux
Classifier: Framework :: Pytest
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Requires-Python: >3.8,<3.11
License-File: LICENSE.txt
License-File: AUTHORS
Requires-Dist: setuptools<80.9
Requires-Dist: cython==0.29.32
Requires-Dist: biopython==1.79
Requires-Dist: docutils==0.19
Requires-Dist: drmaa==0.7.9
Requires-Dist: hypothesis==6.56.2
Requires-Dist: msgpack==1.0.4
Requires-Dist: networkx==2.8.7
Requires-Dist: numpy==1.23.3
Requires-Dist: pandas==1.5.0
Requires-Dist: pysam==0.23.3
Requires-Dist: pyyaml==6.0.1
Requires-Dist: scipy==1.11.1
Requires-Dist: snakemake==6.15.5
Requires-Dist: sqlalchemy==1.4.41
Requires-Dist: sqlalchemy-utils==0.38.3
Requires-Dist: tabulate==0.9.0
Requires-Dist: pytest==7.1.3
Requires-Dist: python-rapidjson==1.9
Requires-Dist: toml==0.10.2
Requires-Dist: pyfaidx==0.5.9.5
Requires-Dist: marshmallow==3.14.1
Requires-Dist: marshmallow-dataclass==8.5.3
Requires-Dist: typeguard==2.13.3
Provides-Extra: postgresql
Requires-Dist: psycopg2; extra == "postgresql"
Provides-Extra: mysql
Requires-Dist: mysqlclient>=1.3.6; extra == "mysql"
Provides-Extra: bam
Requires-Dist: pysam>=0.8; extra == "bam"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: home-page
Dynamic: keywords
Dynamic: license
Dynamic: license-file
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

Mikado is a lightweight Python3 pipeline whose purpose is to facilitate the identification
of expressed loci from RNA-Seq data and to select the best models in each locus.

The logic of the pipeline is as follows:

1. In a first step, the annotation (provided in GTF/GFF3 format) is parsed to locate *superloci* of overlapping features on the **same strand**.
2. The superloci are divided into different *subloci*, each of which is defined as follows:

    * For multiexonic transcripts, to belong to the same sublocus they must share at least one splice junction (i.e. an intron)
    * For monoexonic transcripts, they must overlap for at least one base pair
    * Each sublocus must contain either only multiexonic or only monoexonic transcripts
3. In each sublocus, the pipeline selects the best transcript according to a user-defined prioritization scheme.
4. The resulting *monosubloci* are merged together, if applicable, into *monosubloci_holders*.
5. The best non-overlapping transcripts are selected, in order to define the *loci* contained inside the superlocus.

    * At this stage, monoexonic and multiexonic transcripts are checked for overlaps
    * Moreover, two multiexonic transcripts are considered to belong to the same locus if they share a splice *site* (not junction)
    
6. Once the loci have been defined, the program backtracks and looks for transcripts that can be assigned unambiguously to a single locus and that constitute valid alternative splicing isoforms of the main transcripts.
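
The first two steps can be illustrated with a minimal, self-contained sketch. This is not Mikado's actual implementation or API: the `Transcript` class and the `superloci`, `linked` and `subloci` functions are hypothetical names introduced here for illustration only, and coordinates are assumed to be 1-based and inclusive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    """Hypothetical minimal transcript model (not Mikado's own class)."""
    tid: str
    chrom: str
    strand: str
    exons: tuple  # sorted ((start, end), ...), 1-based inclusive

    @property
    def start(self):
        return self.exons[0][0]

    @property
    def end(self):
        return self.exons[-1][1]

    @property
    def monoexonic(self):
        return len(self.exons) == 1

    @property
    def introns(self):
        # An intron spans the gap between two consecutive exons.
        return {(a[1] + 1, b[0] - 1) for a, b in zip(self.exons, self.exons[1:])}


def superloci(transcripts):
    """Step 1: cluster transcripts that overlap on the same chromosome and strand."""
    clusters, current, current_key, current_end = [], [], None, None
    ordered = sorted(transcripts, key=lambda t: (t.chrom, t.strand, t.start, t.end))
    for t in ordered:
        same_group = current and (t.chrom, t.strand) == current_key
        if same_group and t.start <= current_end:
            current.append(t)
            current_end = max(current_end, t.end)
        else:
            if current:
                clusters.append(current)
            current, current_key, current_end = [t], (t.chrom, t.strand), t.end
    if current:
        clusters.append(current)
    return clusters


def linked(a, b):
    """Step 2 rule: same exon-count class, and a shared intron or a shared base."""
    if a.monoexonic != b.monoexonic:
        return False
    if a.monoexonic:
        return a.start <= b.end and b.start <= a.end
    return bool(a.introns & b.introns)


def subloci(superlocus):
    """Step 2: connected components of the 'linked' relation within a superlocus."""
    remaining, components = list(superlocus), []
    while remaining:
        stack, component = [remaining.pop()], []
        while stack:
            t = stack.pop()
            component.append(t)
            neighbours = [o for o in remaining if linked(t, o)]
            for o in neighbours:
                remaining.remove(o)
            stack.extend(neighbours)
        components.append(component)
    return components


# Made-up example data, for illustration only.
transcripts = [
    Transcript("t1", "chr1", "+", ((100, 200), (300, 400))),
    Transcript("t2", "chr1", "+", ((150, 200), (300, 450))),
    Transcript("t3", "chr1", "+", ((120, 180), (350, 500))),
    Transcript("t4", "chr1", "+", ((380, 420),)),
    Transcript("t5", "chr1", "-", ((100, 400),)),
]

for superlocus in superloci(transcripts):
    print([sorted(t.tid for t in s) for s in subloci(superlocus)])
```

In this example `t1` and `t2` share the intron 201-299 and therefore fall into the same sublocus, while `t3` (different intron) and `t4` (monoexonic) each form their own sublocus within the same superlocus. `t5` lies on the opposite strand and so belongs to a separate superlocus.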

The criteria used to select the "*best*" transcript are left to the user's discretion and are specified through configuration files.
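
As a rough illustration of what a user-defined prioritization scheme means, the sketch below ranks the transcripts of one sublocus by a weighted sum of metrics. It does not reproduce Mikado's configuration format or scoring logic: the metric names (`cdna_length`, `exon_num`), the weights and the functions `score` and `best_transcript` are all assumptions made for this example.

```python
def score(metrics, weights):
    """Weighted sum of the metrics named in `weights` (missing metrics count as 0)."""
    return sum(weight * metrics.get(name, 0.0) for name, weight in weights.items())


def best_transcript(sublocus, weights):
    """Return the ID of the highest-scoring transcript in a sublocus.

    `sublocus` maps transcript IDs to a dict of metric values.
    """
    return max(sublocus, key=lambda tid: score(sublocus[tid], weights))


# Hypothetical metric values for two transcripts in one sublocus.
sublocus = {
    "t1": {"cdna_length": 1200, "exon_num": 3},
    "t2": {"cdna_length": 1500, "exon_num": 2},
}

# Two hypothetical schemes: one favours longer cDNAs, the other more exons.
prefer_length = {"cdna_length": 1.0}
prefer_exons = {"exon_num": 1.0}

print(best_transcript(sublocus, prefer_length))
print(best_transcript(sublocus, prefer_exons))
```

Changing the weights changes which transcript is selected, which is the sense in which the choice of the "best" model depends on the scheme supplied by the user.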
