Metadata-Version: 2.4
Name: buscolite
Version: 26.4.22
Summary: busco analysis for gene predictions
Project-URL: Homepage, https://github.com/nextgenusfs/buscolite
Project-URL: Repository, https://github.com/nextgenusfs/buscolite.git
Author-email: Jon Palmer <nextgenusfs@gmail.com>
License: BSD 2-Clause License
        
        Copyright (c) 2022, Jonathan M. Palmer
        All rights reserved.
        
        Redistribution and use in source and binary forms, with or without
        modification, are permitted provided that the following conditions are met:
        
        * Redistributions of source code must retain the above copyright notice, this
          list of conditions and the following disclaimer.
        
        * Redistributions in binary form must reproduce the above copyright notice,
          this list of conditions and the following disclaimer in the documentation
          and/or other materials provided with the distribution.
        
        THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
        AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
        IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
        DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
        FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
        DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
        SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
        CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
        OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
        OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
License-File: LICENSE.md
Keywords: BUSCO,annotation,bioinformatics,completeness,genome
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: Unix
Classifier: Programming Language :: Python
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.6.0
Requires-Dist: natsort
Requires-Dist: packaging
Requires-Dist: pyfastx>=2.0.0
Requires-Dist: pyhmmer>=0.12.0
Provides-Extra: dev
Requires-Dist: black>=24.3.0; extra == 'dev'
Requires-Dist: flake8>=7.0.0; extra == 'dev'
Requires-Dist: isort>=5.13.2; extra == 'dev'
Requires-Dist: pre-commit>=3.5.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.1.0; extra == 'dev'
Requires-Dist: pytest>=7.4.0; extra == 'dev'
Description-Content-Type: text/markdown

[![Latest Github release](https://img.shields.io/github/release/nextgenusfs/buscolite.svg)](https://github.com/nextgenusfs/buscolite/releases/latest)
![Conda](https://img.shields.io/conda/dn/bioconda/buscolite)
[![Tests](https://github.com/nextgenusfs/buscolite/actions/workflows/tests.yml/badge.svg)](https://github.com/nextgenusfs/buscolite/actions/workflows/tests.yml)
[![codecov](https://codecov.io/gh/nextgenusfs/buscolite/branch/master/graph/badge.svg)](https://codecov.io/gh/nextgenusfs/buscolite)

# BUSCOlite: simplified BUSCO analysis for gene prediction

BUSCOlite can run the miniprot/Augustus-mediated genome predictions as well as the [pyhmmer](https://pyhmmer.readthedocs.io/en/stable/index.html) HMM searches using BUSCO lineage datasets built on OrthoDB v9, v10, or v12 (`odb9`, `odb10`, `odb12`). It also provides a Python API for running BUSCO analysis from within Python, e.g. inside the eukaryotic gene prediction pipeline Funannotate.

This tool is not meant to be a replacement for BUSCO; for most general use cases you should continue to use [BUSCO](https://busco.ezlab.org).

BUSCO models/lineages can be downloaded from the BUSCO site: [v5](https://busco-data.ezlab.org/v5/data/lineages/), [v4](https://busco-data.ezlab.org/v4/data/lineages/). BUSCOlite does not provide a built-in download method, as it is trivial to download the lineage you need for your organism(s) from these links.
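
If you prefer to script the download, the sketch below fetches a lineage tarball with the Python standard library and unpacks it. It is not part of BUSCOlite. It assumes that tarballs sit directly under each release's `data/lineages/` listing linked above and that each archive holds a single top-level lineage directory. The file name must be copied from the listing (lineage files typically carry a date stamp), and `lineage_url`, `extract_lineage`, and `fetch_lineage` are hypothetical helper names.

```python
import os
import shutil
import tarfile
import urllib.request

# Assumption: lineage tarballs sit directly under each release's lineages listing.
BASE_URL = "https://busco-data.ezlab.org/{release}/data/lineages/{filename}"


def lineage_url(release, filename):
    """Return the download URL for a lineage tarball, e.g. release='v5'."""
    return BASE_URL.format(release=release, filename=filename)


def extract_lineage(tarball, dest_dir):
    """Unpack a lineage tarball and return the path of its single top-level directory."""
    # "r:*" lets tarfile detect the compression format itself.
    with tarfile.open(tarball, "r:*") as tar:
        tops = {member.name.split("/")[0] for member in tar.getmembers()}
        if len(tops) != 1:
            raise ValueError(f"expected one top-level directory, found {sorted(tops)}")
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons can reject unsafe paths and special files while extracting.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    return os.path.join(dest_dir, tops.pop())


def fetch_lineage(release, filename, dest_dir):
    """Download a lineage tarball into dest_dir, unpack it, and return the lineage path."""
    os.makedirs(dest_dir, exist_ok=True)
    tarball = os.path.join(dest_dir, filename)
    with urllib.request.urlopen(lineage_url(release, filename)) as response:
        with open(tarball, "wb") as handle:
            shutil.copyfileobj(response, handle)
    return extract_lineage(tarball, dest_dir)
```

The returned directory is what you would pass to `buscolite -l`. The `filter="data"` argument is only used when the running Python supports tarfile extraction filters; older interpreters fall back to a plain `extractall`.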

## Dependencies

BUSCOlite has only a few dependencies:
* [augustus](https://github.com/Gaius-Augustus/Augustus) (note: many Augustus builds on conda have a non-functional PPX (`--proteinprofile`) mode)
* [miniprot](https://github.com/lh3/miniprot)
* [pyhmmer](https://pyhmmer.readthedocs.io/en/stable/index.html)
* [pyfastx](https://github.com/lmdu/pyfastx)
* [natsort](https://pypi.org/project/natsort/)
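
Before a first run it can help to confirm these are present. The following is a minimal sketch, not part of BUSCOlite, that uses `shutil.which` for the two executables and `importlib.util.find_spec` for the Python packages; `missing_dependencies` is a hypothetical name. It checks the whole list regardless of mode, and it cannot tell whether your Augustus build has a working `--proteinprofile` mode.

```python
import importlib.util
import shutil

# The external executables and Python packages listed above.
EXECUTABLES = ["augustus", "miniprot"]
MODULES = ["pyhmmer", "pyfastx", "natsort"]


def missing_dependencies(executables=EXECUTABLES, modules=MODULES):
    """Return executables not found on PATH plus modules that cannot be imported."""
    missing = [exe for exe in executables if shutil.which(exe) is None]
    missing += [mod for mod in modules if importlib.util.find_spec(mod) is None]
    return missing


if __name__ == "__main__":
    missing = missing_dependencies()
    print("missing:", ", ".join(missing) if missing else "none")
```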

## Features
* **Genome and protein mode analysis**: Run BUSCO on genome assemblies or protein sets
* **BUSCO v6-compatible filtering**: Implements the same filtering logic as BUSCO v6, so results are comparable with BUSCO v6 output
* **Publication-quality plots**: Generate SVG plots from results with zero additional dependencies
* **Multi-sample comparison**: Compare multiple BUSCO results in a single plot
* **Python API**: Use BUSCOlite programmatically in your own scripts
* **Lightweight**: Minimal dependencies, easy to install and integrate

## Why?

[Funannotate](https://github.com/nextgenusfs/funannotate) uses BUSCO to find core conserved marker genes, which it uses as a basis to train several ab initio gene predictors. When BUSCO v2 came out it was Python 3 only, while funannotate was still Python 2, so I modified the BUSCO v2 source code to run under Python 2 within funannotate. Now BUSCO v5 is the current release, and it has numerous bells and whistles that funannotate does not need (no knock against bells and whistles). The real problem is that the large number of dependencies these extra tools bring means I cannot build a conda image that includes both funannotate and BUSCO v5. So I rewrote BUSCO v2 here with limited dependencies, which makes it easier to incorporate as a dependency of funannotate.

A side note: the `metaeuk` method that BUSCO v5 now uses by default does not produce complete gene models; in fact, the protein sequences it outputs contain lowercase sequence that is not actually found in your genome. So for training ab initio predictors the `metaeuk` method is not useful, although it is faster if all you need are simple "how complete is my genome assembly" statistics.


To install a release version, use pip:
```bash
python -m pip install buscolite
```

To install the latest code from the master branch:
```bash
python -m pip install git+https://github.com/nextgenusfs/buscolite.git
```

## Quick Start

Run BUSCO analysis on a genome:
```bash
buscolite -i genome.fasta -o mygenome -m genome -l /path/to/fungi_odb12 -c 8
```
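
If you need to launch runs from a Python script, one option is to wrap the same command with `subprocess`. This sketch only reuses the flags shown above (`-i`, `-o`, `-m`, `-l`, `-c`); the helper names are hypothetical, and the `<out>.buscolite.json` result name is inferred from the plotting example rather than documented, so check it against your own output.

```python
import subprocess


def buscolite_command(fasta, out, lineage, mode="genome", cpus=1):
    """Build the argument list for one buscolite run, using the Quick Start flags."""
    return ["buscolite", "-i", fasta, "-o", out, "-m", mode, "-l", lineage, "-c", str(cpus)]


def run_buscolite(fasta, out, lineage, mode="genome", cpus=1):
    """Run buscolite, raising CalledProcessError on a non-zero exit status."""
    subprocess.run(buscolite_command(fasta, out, lineage, mode, cpus), check=True)
    # Assumption: results land in <out>.buscolite.json (inferred from the plot example).
    return out + ".buscolite.json"
```

For example, `run_buscolite("genome.fasta", "mygenome", "/path/to/fungi_odb12", cpus=8)` runs the same command as the shell example above. Passing a list rather than a shell string avoids quoting problems with paths that contain spaces.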

Generate a plot from the results:
```bash
buscolite-plot mygenome.buscolite.json -o mygenome_plot.svg
```

Compare multiple samples:
```bash
buscolite-plot sample1.buscolite.json sample2.buscolite.json sample3.buscolite.json -o comparison.svg
```
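
For many samples, the file list can be collected with `glob` instead of being typed out. This is a minimal sketch, assuming all result files sit in one directory and follow the `*.buscolite.json` naming used above; `plot_command` and `plot_all` are hypothetical helpers. Files are sorted so the command is reproducible between runs.

```python
import glob
import os
import subprocess


def plot_command(json_files, output="comparison.svg"):
    """Build a buscolite-plot command that compares several result files."""
    if not json_files:
        raise ValueError("no .buscolite.json files to plot")
    return ["buscolite-plot"] + list(json_files) + ["-o", output]


def plot_all(directory=".", output="comparison.svg"):
    """Plot every *.buscolite.json file in a directory into one SVG."""
    files = sorted(glob.glob(os.path.join(directory, "*.buscolite.json")))
    subprocess.run(plot_command(files, output), check=True)
    return output
```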

For detailed usage instructions, see the [Usage Guide](docs/USAGE.md).

## Development

If you want to contribute to the development of BUSCOlite, follow these steps:

1. Clone the repository:
   ```
   git clone https://github.com/nextgenusfs/buscolite.git
   cd buscolite
   ```

2. Set up the development environment:
   ```
   ./scripts/setup_dev.sh
   ```
   This will install the development dependencies and set up pre-commit hooks.

3. Make your changes and commit them. The pre-commit hooks will automatically check and format your code.

4. Run the tests to make sure everything is working:
   ```
   pytest
   ```
