Metadata-Version: 2.4
Name: peptdeep
Version: 1.4.1
Summary: The AlphaX deep learning framework for Proteomics
Author-email: Mann Labs <jalew.zwf@qq.com>
License:                                  Apache License
                                   Version 2.0, January 2004
                                http://www.apache.org/licenses/
        
           TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
        
           1. Definitions.
        
              "License" shall mean the terms and conditions for use, reproduction,
              and distribution as defined by Sections 1 through 9 of this document.
        
              "Licensor" shall mean the copyright owner or entity authorized by
              the copyright owner that is granting the License.
        
              "Legal Entity" shall mean the union of the acting entity and all
              other entities that control, are controlled by, or are under common
              control with that entity. For the purposes of this definition,
              "control" means (i) the power, direct or indirect, to cause the
              direction or management of such entity, whether by contract or
              otherwise, or (ii) ownership of fifty percent (50%) or more of the
              outstanding shares, or (iii) beneficial ownership of such entity.
        
              "You" (or "Your") shall mean an individual or Legal Entity
              exercising permissions granted by this License.
        
              "Source" form shall mean the preferred form for making modifications,
              including but not limited to software source code, documentation
              source, and configuration files.
        
              "Object" form shall mean any form resulting from mechanical
              transformation or translation of a Source form, including but
              not limited to compiled object code, generated documentation,
              and conversions to other media types.
        
              "Work" shall mean the work of authorship, whether in Source or
              Object form, made available under the License, as indicated by a
              copyright notice that is included in or attached to the work
              (an example is provided in the Appendix below).
        
              "Derivative Works" shall mean any work, whether in Source or Object
              form, that is based on (or derived from) the Work and for which the
              editorial revisions, annotations, elaborations, or other modifications
              represent, as a whole, an original work of authorship. For the purposes
              of this License, Derivative Works shall not include works that remain
              separable from, or merely link (or bind by name) to the interfaces of,
              the Work and Derivative Works thereof.
        
              "Contribution" shall mean any work of authorship, including
              the original version of the Work and any modifications or additions
              to that Work or Derivative Works thereof, that is intentionally
              submitted to Licensor for inclusion in the Work by the copyright owner
              or by an individual or Legal Entity authorized to submit on behalf of
              the copyright owner. For the purposes of this definition, "submitted"
              means any form of electronic, verbal, or written communication sent
              to the Licensor or its representatives, including but not limited to
              communication on electronic mailing lists, source code control systems,
              and issue tracking systems that are managed by, or on behalf of, the
              Licensor for the purpose of discussing and improving the Work, but
              excluding communication that is conspicuously marked or otherwise
              designated in writing by the copyright owner as "Not a Contribution."
        
              "Contributor" shall mean Licensor and any individual or Legal Entity
              on behalf of whom a Contribution has been received by Licensor and
              subsequently incorporated within the Work.
        
           2. Grant of Copyright License. Subject to the terms and conditions of
              this License, each Contributor hereby grants to You a perpetual,
              worldwide, non-exclusive, no-charge, royalty-free, irrevocable
              copyright license to reproduce, prepare Derivative Works of,
              publicly display, publicly perform, sublicense, and distribute the
              Work and such Derivative Works in Source or Object form.
        
           3. Grant of Patent License. Subject to the terms and conditions of
              this License, each Contributor hereby grants to You a perpetual,
              worldwide, non-exclusive, no-charge, royalty-free, irrevocable
              (except as stated in this section) patent license to make, have made,
              use, offer to sell, sell, import, and otherwise transfer the Work,
              where such license applies only to those patent claims licensable
              by such Contributor that are necessarily infringed by their
              Contribution(s) alone or by combination of their Contribution(s)
              with the Work to which such Contribution(s) was submitted. If You
              institute patent litigation against any entity (including a
              cross-claim or counterclaim in a lawsuit) alleging that the Work
              or a Contribution incorporated within the Work constitutes direct
              or contributory patent infringement, then any patent licenses
              granted to You under this License for that Work shall terminate
              as of the date such litigation is filed.
        
           4. Redistribution. You may reproduce and distribute copies of the
              Work or Derivative Works thereof in any medium, with or without
              modifications, and in Source or Object form, provided that You
              meet the following conditions:
        
              (a) You must give any other recipients of the Work or
                  Derivative Works a copy of this License; and
        
              (b) You must cause any modified files to carry prominent notices
                  stating that You changed the files; and
        
              (c) You must retain, in the Source form of any Derivative Works
                  that You distribute, all copyright, patent, trademark, and
                  attribution notices from the Source form of the Work,
                  excluding those notices that do not pertain to any part of
                  the Derivative Works; and
        
              (d) If the Work includes a "NOTICE" text file as part of its
                  distribution, then any Derivative Works that You distribute must
                  include a readable copy of the attribution notices contained
                  within such NOTICE file, excluding those notices that do not
                  pertain to any part of the Derivative Works, in at least one
                  of the following places: within a NOTICE text file distributed
                  as part of the Derivative Works; within the Source form or
                  documentation, if provided along with the Derivative Works; or,
                  within a display generated by the Derivative Works, if and
                  wherever such third-party notices normally appear. The contents
                  of the NOTICE file are for informational purposes only and
                  do not modify the License. You may add Your own attribution
                  notices within Derivative Works that You distribute, alongside
                  or as an addendum to the NOTICE text from the Work, provided
                  that such additional attribution notices cannot be construed
                  as modifying the License.
        
              You may add Your own copyright statement to Your modifications and
              may provide additional or different license terms and conditions
              for use, reproduction, or distribution of Your modifications, or
              for any such Derivative Works as a whole, provided Your use,
              reproduction, and distribution of the Work otherwise complies with
              the conditions stated in this License.
        
           5. Submission of Contributions. Unless You explicitly state otherwise,
              any Contribution intentionally submitted for inclusion in the Work
              by You to the Licensor shall be under the terms and conditions of
              this License, without any additional terms or conditions.
              Notwithstanding the above, nothing herein shall supersede or modify
              the terms of any separate license agreement you may have executed
              with Licensor regarding such Contributions.
        
           6. Trademarks. This License does not grant permission to use the trade
              names, trademarks, service marks, or product names of the Licensor,
              except as required for reasonable and customary use in describing the
              origin of the Work and reproducing the content of the NOTICE file.
        
           7. Disclaimer of Warranty. Unless required by applicable law or
              agreed to in writing, Licensor provides the Work (and each
              Contributor provides its Contributions) on an "AS IS" BASIS,
              WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
              implied, including, without limitation, any warranties or conditions
              of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
              PARTICULAR PURPOSE. You are solely responsible for determining the
              appropriateness of using or redistributing the Work and assume any
              risks associated with Your exercise of permissions under this License.
        
           8. Limitation of Liability. In no event and under no legal theory,
              whether in tort (including negligence), contract, or otherwise,
              unless required by applicable law (such as deliberate and grossly
              negligent acts) or agreed to in writing, shall any Contributor be
              liable to You for damages, including any direct, indirect, special,
              incidental, or consequential damages of any character arising as a
              result of this License or out of the use or inability to use the
              Work (including but not limited to damages for loss of goodwill,
              work stoppage, computer failure or malfunction, or any and all
              other commercial damages or losses), even if such Contributor
              has been advised of the possibility of such damages.
        
           9. Accepting Warranty or Additional Liability. While redistributing
              the Work or Derivative Works thereof, You may choose to offer,
              and charge a fee for, acceptance of support, warranty, indemnity,
              or other liability obligations and/or rights consistent with this
              License. However, in accepting such obligations, You may act only
              on Your own behalf and on Your sole responsibility, not on behalf
              of any other Contributor, and only if You agree to indemnify,
              defend, and hold each Contributor harmless for any liability
              incurred by, or claims asserted against, such Contributor by reason
              of your accepting any such warranty or additional liability.
        
           END OF TERMS AND CONDITIONS
        
           APPENDIX: How to apply the Apache License to your work.
        
              To apply the Apache License to your work, attach the following
              boilerplate notice, with the fields enclosed by brackets "[]"
              replaced with your own identifying information. (Don't include
              the brackets!)  The text should be enclosed in the appropriate
              comment syntax for the file format. We also recommend that a
              file or class name and description of purpose be included on the
              same "printed page" as the copyright notice for easier
              identification within third-party archives.
        
           Copyright 2020 MannLabs
        
           Licensed under the Apache License, Version 2.0 (the "License");
           you may not use this file except in compliance with the License.
           You may obtain a copy of the License at
        
               http://www.apache.org/licenses/LICENSE-2.0
        
           Unless required by applicable law or agreed to in writing, software
           distributed under the License is distributed on an "AS IS" BASIS,
           WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
           See the License for the specific language governing permissions and
           limitations under the License.
        
Project-URL: repository, https://github.com/MannLabs/peptdeep
Project-URL: issues, https://github.com/MannLabs/peptdeep/issues
Project-URL: download, https://mannlabs.github.io/peptdeep/releases
Project-URL: homepage, https://www.alphapept.org
Project-URL: Mann Labs Homepage, https://www.biochem.mpg.de/mann
Project-URL: publication, https://doi.org/10.1038/s41467-022-34904-3
Project-URL: documentation, https://alphapeptdeep.readthedocs.io/en/latest/
Keywords: mass spectrometry,proteomics,bioinformatics,AlphaPept,AlphaPept ecosystem,deep learning,alphapept.org
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.8.0
Description-Content-Type: text/markdown
License-File: LICENSE.txt
Requires-Dist: pywin32; sys_platform == "win32"
Requires-Dist: pythonnet; sys_platform == "win32"
Requires-Dist: click
Requires-Dist: pandas
Requires-Dist: numpy<2
Requires-Dist: torch
Requires-Dist: tqdm
Requires-Dist: numba
Requires-Dist: psutil
Requires-Dist: transformers
Requires-Dist: scikit-learn
Requires-Dist: lxml
Requires-Dist: pyteomics
Requires-Dist: alphabase>=1.5.0
Requires-Dist: alpharaw>=0.2.0
Provides-Extra: stable
Requires-Dist: pywin32==308; sys_platform == "win32" and extra == "stable"
Requires-Dist: pythonnet==3.0.4; sys_platform == "win32" and extra == "stable"
Requires-Dist: click==8.1.7; extra == "stable"
Requires-Dist: pandas==2.2.3; extra == "stable"
Requires-Dist: numpy<2; extra == "stable"
Requires-Dist: torch==2.5.1; (sys_platform != "darwin" or platform_machine != "x86_64") and extra == "stable"
Requires-Dist: torch==2.2.2; (sys_platform == "darwin" and platform_machine == "x86_64") and extra == "stable"
Requires-Dist: tqdm==4.67.1; extra == "stable"
Requires-Dist: numba==0.60.0; extra == "stable"
Requires-Dist: psutil==6.1.0; extra == "stable"
Requires-Dist: transformers==4.47.0; extra == "stable"
Requires-Dist: scikit-learn==1.6.0; extra == "stable"
Requires-Dist: lxml==5.3.0; extra == "stable"
Requires-Dist: pyteomics==4.7.5; extra == "stable"
Requires-Dist: alphabase>=1.5.0; extra == "stable"
Requires-Dist: alpharaw>=0.2.0; extra == "stable"
Provides-Extra: gui
Requires-Dist: streamlit>=1.23.0; extra == "gui"
Requires-Dist: streamlit-aggrid; extra == "gui"
Provides-Extra: gui-stable
Requires-Dist: streamlit==1.40.2; extra == "gui-stable"
Requires-Dist: streamlit-aggrid==1.0.5; extra == "gui-stable"
Provides-Extra: hla
Requires-Dist: pydivsufsort; extra == "hla"
Provides-Extra: hla-stable
Requires-Dist: pydivsufsort; extra == "hla-stable"
Provides-Extra: docs
Requires-Dist: autodocsumm; extra == "docs"
Requires-Dist: myst_parser; extra == "docs"
Requires-Dist: sphinx; extra == "docs"
Requires-Dist: nbsphinx; extra == "docs"
Requires-Dist: jinja2; extra == "docs"
Requires-Dist: contextfilter; extra == "docs"
Requires-Dist: furo; extra == "docs"
Provides-Extra: tests
Requires-Dist: pytest; extra == "tests"
Requires-Dist: pre-commit==3.7.0; extra == "tests"
Requires-Dist: nbmake==1.5.3; extra == "tests"
Requires-Dist: seaborn; extra == "tests"
Provides-Extra: development
Requires-Dist: jupyter; extra == "development"
Requires-Dist: ipykernel; extra == "development"
Requires-Dist: autodocsumm; extra == "development"
Requires-Dist: myst_parser; extra == "development"
Requires-Dist: sphinx; extra == "development"
Requires-Dist: nbsphinx; extra == "development"
Requires-Dist: jinja2; extra == "development"
Requires-Dist: contextfilter; extra == "development"
Requires-Dist: furo; extra == "development"
Requires-Dist: pytest; extra == "development"
Requires-Dist: pre-commit==3.7.0; extra == "development"
Requires-Dist: nbmake==1.5.3; extra == "development"
Requires-Dist: seaborn; extra == "development"
Dynamic: license-file


[![Default installation and tests](https://github.com/MannLabs/alphapeptdeep/actions/workflows/pip_installation.yml/badge.svg)](https://github.com/MannLabs/alphapeptdeep/actions/workflows/pip_installation.yml)
[![Publish on PyPi and release on GitHub](https://github.com/MannLabs/alphapeptdeep/actions/workflows/publish_on_pypi.yml/badge.svg)](https://github.com/MannLabs/alphapeptdeep/actions/workflows/publish_and_release.yml)
[![Documentation Status](https://readthedocs.org/projects/alphapeptdeep/badge/?version=latest)](https://alphapeptdeep.readthedocs.io/en/latest/?badge=latest)
[![pypi](https://img.shields.io/pypi/v/peptdeep)](https://pypi.org/project/peptdeep)
[![GitHub release](https://img.shields.io/github/v/release/mannlabs/alphapeptdeep?display_name=tag)](https://github.com/MannLabs/alphapeptdeep/releases)
[![GitHub downloads](https://img.shields.io/github/downloads/mannlabs/alphapeptdeep/total?label=github%20downloads)](https://github.com/MannLabs/alphapeptdeep/releases)
[![Downloads@pre-train-models](https://img.shields.io/github/downloads/mannlabs/alphapeptdeep/pre-trained-models/total)](https://github.com/MannLabs/alphapeptdeep/releases/tag/pre-trained-models)
[![pip downloads](https://img.shields.io/pypi/dm/peptdeep?color=blue&label=pip%20downloads)](https://pypi.org/project/peptdeep)
![Python](https://img.shields.io/pypi/pyversions/peptdeep)

# AlphaPeptDeep (PeptDeep)

------------------------------------------------------------------------

<!-- PROJECT LOGO -->
<br />
<div align="center">
  <img src="peptdeep/webui/logos/peptdeep.png" alt="Logo" width="80" height="80">

  <h3 align="center">PeptDeep</h3>

  <p align="center">
    <a href="https://doi.org/10.1038/s41467-022-34904-3">Publication</a>
    ·
    <a href="https://github.com/Mannlabs/peptdeep/releases/latest">Download</a>
    ·
    <a href="#installation">Installation</a>
    ·
    <a href="#usage">Usage</a>
    ·
    <a href="https://alphapeptdeep.readthedocs.io/en/latest/">Documentation</a>
    ·
    <a href="https://alphapept.org">alphapept.org</a>

  </p>
</div>

![screenshot](misc/screenshot.png)

## About

AlphaPeptDeep (`peptdeep` for short) aims to make it easy to build new deep
learning models for shotgun proteomics studies and to apply transfer
learning to them.

It contains built-in models for predicting the retention time (RT),
collision cross section (CCS), and tandem mass spectrum (MS2) of given
peptides. With these models, a predicted spectral library can easily be
generated from FASTA files.

For details, check out our [publication](#citation), and the [documentation](https://alphapeptdeep.readthedocs.io/en/latest/).

Visit [alphapept.org](https://alphapept.org) for other packages of the AlphaPept ecosystem.

### Subsequent projects of AlphaPeptDeep

- [**peptdeep_hla**](https://github.com/MannLabs/PeptDeep-HLA): a DL model that predicts whether or not a peptide is presented by individual HLA alleles.

### Other pre-trained MS2/RT/CCS models

- [**Dimethyl**](https://github.com/MannLabs/alphapeptdeep/releases/tag/dimethyl-models): the MS2/RT/CCS models for Dimethyl-labeled peptides.

------------------------------------------------------------------------

## Installation

AlphaPeptDeep can be installed and used on all major operating systems
(Windows, macOS and Linux).

Several types of installation are possible:

- [**One-click GUI installation:**](#one-click-gui-installation) Choose this
  installation if you only want the GUI and/or keep things as simple as
  possible.
- [**Pip installation:**](#pip-installation) Choose this installation if you want to use peptdeep as a Python package in an existing Python (recommended Python 3.8 or 3.9) environment (e.g. a Jupyter notebook). If needed, the GUI and CLI
  can be installed with pip as well.
- [**Developer installation:**](#developer-installation) Choose this installation if you
  are familiar with CLI tools, [conda](https://docs.conda.io/en/latest/)
  and Python. This installation allows access to all available features
  of peptdeep and even allows you to modify its source code directly.
  Generally, the developer version of peptdeep outperforms the
  precompiled versions, which makes this the installation of choice for
  high-throughput experiments.
- [**Docker installation:**](#docker-installation) Choose this installation if you want to use peptdeep without any changes to your system.

### One-click GUI installation

The GUI of peptdeep is a completely stand-alone tool that requires no
knowledge of Python or CLI tools.

You can download the latest release of peptdeep [here](https://github.com/Mannlabs/peptdeep/releases/latest).

Note that, as GitHub does not support large release files, these installers do not have GPU support.
To install a version with GPU support [see here](#enable-gpu-support).

#### Windows
Download the latest `peptdeep-X.Y.Z-windows-amd64.exe` build and double-click it to install. If you receive a warning during installation, click *Run anyway*.
Important note: always install peptdeep into a new folder, as the installer will not properly overwrite existing installations.

#### Linux
Download the latest `peptdeep-X.Y.Z-linux-x64.deb` build and install it via `dpkg -i peptdeep-X.Y.Z-linux-x64.deb`.

#### MacOS
Download the latest build suitable for your chip architecture, either
`peptdeep-X.Y.Z-macos-darwin-arm64.pkg` or `peptdeep-X.Y.Z-macos-darwin-x64.pkg`. The architecture can be looked up by clicking on the Apple symbol > *About this Mac* > *Chip* ("M1", "M2", "M3" -> `arm64`, "Intel" -> `x64`).
Open the parent folder of the downloaded file in Finder, right-click the file and select *Open*. If you receive a warning during installation, click *Open*.

In newer MacOS versions, additional steps are required to enable installation of unverified software.
This is indicated by a dialog telling you `“peptdeep. ... .pkg” Not Opened`.
1. Close this dialog by clicking `Done`.
2. Choose `Apple menu` > `System Settings`, then `Privacy & Security` in the sidebar. (You may need to scroll down.)
3. Go to `Security`, locate the line "peptdeep.pkg was blocked to protect your Mac" then click `Open Anyway`.
4. In the dialog window, click `Open Anyway`.


Older releases remain available on the [release
page](https://github.com/MannLabs/alphapeptdeep/releases), but no
backwards compatibility is guaranteed.



### Pip installation

peptdeep can be installed in an existing Python environment with a
single `bash` command. *This `bash` command can also be run directly
from within a Jupyter notebook by prepending it with a `!`*:

```bash
pip install peptdeep
```

Installing peptdeep like this avoids conflicts when integrating it into
other tools, as this does not enforce strict versioning of dependencies.
However, newly released versions of dependencies are not guaranteed to be
fully compatible with peptdeep. This should only occur in rare cases where
dependencies are not backwards compatible.

You can always force peptdeep to use dependency versions
that are known to be compatible:

``` bash
pip install "peptdeep[stable]"
```
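To see which dependency versions actually ended up in your environment, you can use the standard library's `importlib.metadata` (Python 3.8+). Compare its output with the pins of the `stable` extra in the package metadata above (e.g. `pandas==2.2.3`, `numba==0.60.0`). The following is a minimal sketch; the package list in the example is only an illustrative subset:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version (None if missing)."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in the current environment
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Illustrative subset of the dependencies pinned by the "stable" extra
    for name, ver in installed_versions(["peptdeep", "pandas", "numba", "torch"]).items():
        print(f"{name}: {ver or 'not installed'}")
```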

It is also possible to directly install any branch (e.g. `some-branch`) from GitHub with:
``` bash
pip install "peptdeep[stable,development] @ git+https://github.com/MannLabs/alphapeptdeep.git@some-branch"
```

The GUI version can be installed with
``` bash
pip install "peptdeep[gui]"
```
or, with stable (pinned) dependency versions:
``` bash
pip install "peptdeep[stable,gui-stable]"
```

Note: PythonNET must be installed to access Thermo or Sciex raw data.
Raw data access is provided through AlphaRaw, which depends on Mono on macOS and Linux.
A detailed guide to installing AlphaRaw with Mono can be found [here](https://github.com/MannLabs/alpharaw#installation).


### Developer installation

peptdeep can also be installed in "editable" mode. This allows you to fully customize the software and
even modify the source code to your specific needs.

First, clone the peptdeep repository from GitHub to a new directory:
``` bash
mkdir -p ~/alphapeptdeep/project/folder && cd ~/alphapeptdeep/project/folder
git clone https://github.com/MannLabs/alphapeptdeep.git && cd alphapeptdeep
```

Next, it is highly recommended to use a separate
[conda virtual environment](https://docs.conda.io/en/latest/), as
otherwise dependency conflicts can occur with already existing
packages:
``` bash
conda create --name peptdeep python=3.9 -y
conda activate peptdeep
```

Finally, peptdeep and all its [dependencies](requirements) need to be
installed. To take advantage of all features and allow development (with
the `-e` flag), this is best done by also installing the [development
dependencies](requirements/requirements_development_loose.txt) instead of only
the [core dependencies](requirements/requirements_loose.txt):

``` bash
pip install -e ".[development]"
```

By default, this installs 'loose' dependencies (no pinned versions),
although it is also possible to use stable dependencies
(e.g. `pip install -e ".[stable,development]"`).

By using the editable flag `-e`, all modifications to the [peptdeep
source code folder](peptdeep) are directly reflected when running
peptdeep. Note that the peptdeep folder cannot be moved and/or renamed
if an editable version is installed. In case of confusion, you can
always retrieve the location of any Python module with e.g. the command
`import module` followed by `module.__file__`.
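The standard library's `importlib.util.find_spec` offers an alternative to importing the module. For top-level modules it reports the file Python would load without executing that module. This is a minimal sketch using only public `importlib` calls:

```python
import importlib.util
from typing import Optional


def module_location(name: str) -> Optional[str]:
    """Return the file Python would import `name` from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # module is not installed / not importable
    # For a package, origin points to its __init__.py
    return spec.origin


if __name__ == "__main__":
    # With an editable install, this should point into your cloned repository
    print(module_location("peptdeep"))
```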

### Docker installation
The containerized version can be used to run peptdeep without installing it (or its dependencies) on your system.

#### 1. Setting up Docker
Install the latest version of [Docker](https://docs.docker.com/engine/install/).

#### 2. Prepare folder structure
Set up your data to match the expected folder structure:
create a folder, store its path in a variable, and specify a port:
```bash
DATA_FOLDER=/home/username/data; mkdir -p $DATA_FOLDER
PORT=8501
```

#### 3. Start the container
```bash
docker run -v $DATA_FOLDER:/app/data -p $PORT:8501 mannlabs/peptdeep:latest
```
After the initial download of the container, peptdeep will start running immediately
and can be accessed at `http://localhost:$PORT` (e.g. [localhost:8501](http://localhost:8501) with the port set above).

Note: in the app, the local `$DATA_FOLDER` needs to be referred to as "`/app/data`".

#### Alternatively: Build the image yourself
If you want to build the image yourself, you can do so with
```bash
docker build -t peptdeep .
```
and run it with
```bash
docker run -p $PORT:8501 -v $DATA_FOLDER:/app/data -t peptdeep
```

### Enable GPU support

To enable GPU support, use either the [pip installation](#pip-installation)
or the [developer installation](#developer-installation) option,
and install the GPU version of PyTorch:
``` bash
pip install torch --extra-index-url https://download.pytorch.org/whl/cu116 --upgrade
```

Note that the appropriate CUDA version (`cu116` in the command above) depends on your NVIDIA driver version, which can be checked with:
``` bash
nvidia-smi
```

For the latest PyTorch versions and matching install commands, see [pytorch.org](https://pytorch.org/get-started/locally/).
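To check whether the installed PyTorch build actually sees your GPU, you can run a short diagnostic. It relies only on public PyTorch calls (`torch.cuda.is_available`, `torch.cuda.get_device_name`, `torch.version.cuda`, `torch.backends.mps.is_available`). This sketch is not part of peptdeep, and the report keys are made up for illustration:

```python
from typing import Dict


def gpu_report() -> Dict[str, object]:
    """Summarize which accelerators the installed PyTorch build can use."""
    report: Dict[str, object] = {
        "torch_installed": False,
        "cuda_build": None,
        "cuda_available": False,
        "device_name": None,
        "mps_available": False,
    }
    try:
        import torch
    except ImportError:
        return report  # PyTorch is not installed in this environment
    report["torch_installed"] = True
    # None for CPU-only builds, otherwise the CUDA version the wheel was built with
    report["cuda_build"] = torch.version.cuda
    report["cuda_available"] = bool(torch.cuda.is_available())
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    # Apple-silicon Metal backend; absent in older PyTorch versions
    mps = getattr(torch.backends, "mps", None)
    report["mps_available"] = bool(mps is not None and mps.is_available())
    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

If `cuda_build` is `None`, the CPU-only wheel is installed, and the GPU version still needs to be installed as described above.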

------------------------------------------------------------------------

## Usage

There are three ways to use peptdeep:

- [**GUI**](#gui)
- [**CLI**](#cli)
- [**Python**](#python-and-jupyter-notebooks)

NOTE: The first time you use a fresh installation of peptdeep, it is
often quite slow because some functions might still need compilation on
your local operating system and architecture. Subsequent use should be a
lot faster.

### GUI

If the GUI was not installed through a one-click GUI installer, it can
be launched with the following `bash` command:

``` bash
peptdeep gui
```

This command will start a web server and automatically open the default
browser:
![](https://user-images.githubusercontent.com/4646029/189301730-ac1f92cc-0e9d-4ba3-be1d-07c4d66032cd.jpg)

There are several options in the GUI (left panel):

- Server: Start/stop the task server, check tasks in the task queue
- Settings: Configure common settings, load/save current settings
- Model: Configure DL models for prediction or transfer learning
- Transfer: Refine the models
- Library: Predict a library
- Rescore: Perform ML feature extraction and Percolator rescoring

------------------------------------------------------------------------

### CLI

The CLI can be run with the following command (after activating the
`conda` environment with `conda activate peptdeep` or if an alias was
set to the peptdeep executable):

``` bash
peptdeep -h
```

It is possible to get help about each command and its (required)
parameters by using the `-h` flag. AlphaPeptDeep provides several
commands for different tasks:

- [**export-settings**](#export-settings)
- [**cmd-flow**](#cmd-flow)
- [**library**](#library)
- [**transfer**](#transfer)
- [**rescore**](#rescore)
- [**install-models**](#install-models)
- [**gui**](#gui)

To check the usage of a specific command, run:

``` bash
peptdeep $command -h
```

For example:

``` bash
peptdeep library -h
```

#### export-settings

``` bash
peptdeep export-settings /path/to/settings.yaml
```

This command exports the default settings into `settings.yaml` as a
template; users can edit this YAML file and use it to run other commands.

Here is a section of the yaml file which controls global parameters for
different tasks:

```yaml
model_url: "https://github.com/MannLabs/alphapeptdeep/releases/download/pre-trained-models/pretrained_models.zip"

task_type: library
task_type_choices:
  - library
  - train
  - rescore
thread_num: 8
torch_device:
  device_type: gpu
  device_type_choices:
    - gpu
    - mps
    - cpu
  device_ids: []

log_level: info
log_level_choices:
  - debug
  - info
  - warning
  - error
  - critical

common:
  modloss_importance_level: 1.0
  user_defined_modifications: {}
  # For example,
  # user_defined_modifications:
  #   "Dimethyl2@Any_N-term":
  #     composition: "H(2)2H(2)C(2)"
  #     modloss_composition: "H(0)" # can be without if no modloss
  #   "Dimethyl2@K":
  #     composition: "H(2)2H(2)C(2)"
  #   "Dimethyl6@Any_N-term":
  #     composition: "2H(4)13C(2)"
  #   "Dimethyl6@K":
  #     composition: "2H(4)13C(2)"

peak_matching:
  ms2_ppm: True
  ms2_tol_value: 20.0
  ms1_ppm: True
  ms1_tol_value: 20.0

model_mgr:
  default_nce: 30.0
  default_instrument: Lumos
  mask_modloss: True
  model_type: generic
  model_choices:
  - generic
  - phos
  - hla # same as generic
  - digly
  external_ms2_model: ''
  external_rt_model: ''
  external_ccs_model: ''
  instrument_group:
    ThermoTOF: ThermoTOF
    Astral: ThermoTOF
    Lumos: Lumos
    QE: QE
    timsTOF: timsTOF
    SciexTOF: SciexTOF
    Fusion: Lumos
    Eclipse: Lumos
    Velos: Lumos # not important
    Elite: Lumos # not important
    OrbitrapTribrid: Lumos
    ThermoTribrid: Lumos
    QE+: QE
    QEHF: QE
    QEHFX: QE
    Exploris: QE
    Exploris480: QE
  predict:
    batch_size_ms2: 512
    batch_size_rt_ccs: 1024
    verbose: True
    multiprocessing: True
```

The `model_mgr` section in the yaml defines the common settings for
MS2/RT/CCS prediction.
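The `composition` strings in `user_defined_modifications` list each element with its count in parentheses, and isotopes are written with a mass-number prefix (`2H`, `13C`). To sanity-check a custom modification, the following sketch computes its monoisotopic mass shift. It is not peptdeep's or alphabase's own parser. The element table is a small hand-entered subset (an assumption), and elements outside it raise a `KeyError`. For example, regular dimethyl `H(4)C(2)` comes out at about 28.0313 Da:

```python
import re
from typing import Dict

# Monoisotopic masses (Da) for a small, hand-picked subset of elements/isotopes.
# Illustrative assumption only; alphabase maintains its own element table.
MONO_MASS: Dict[str, float] = {
    "H": 1.00782503207,
    "2H": 2.01410177812,
    "C": 12.0,
    "13C": 13.00335483507,
    "N": 14.00307400443,
    "15N": 15.00010889888,
    "O": 15.99491461957,
    "18O": 17.99915961286,
}

# One token: optional isotope mass number, element symbol, signed count in parentheses
TOKEN = re.compile(r"(\d*)([A-Z][a-z]?)\((-?\d+)\)")


def composition_mass(formula: str) -> float:
    """Return the monoisotopic mass of a composition such as 'H(2)2H(2)C(2)'."""
    pos, total = 0, 0.0
    for match in TOKEN.finditer(formula):
        if match.start() != pos:  # unparsable characters between tokens
            raise ValueError(f"cannot parse {formula!r} at position {pos}")
        isotope, element, count = match.groups()
        total += MONO_MASS[isotope + element] * int(count)
        pos = match.end()
    if pos != len(formula):  # trailing (or entirely) unparsable input
        raise ValueError(f"cannot parse {formula!r} at position {pos}")
    return total


if __name__ == "__main__":
    for formula in ["H(4)C(2)", "H(2)2H(2)C(2)", "2H(4)13C(2)"]:
        print(f"{formula}: {composition_mass(formula):.4f} Da")
```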

------------------------------------------------------------------------

### cmd-flow

``` bash
peptdeep cmd-flow ...
```

This command exposes `global_settings` as CLI parameters for CLI users. It supports three workflows, `train`, `library`, or `train library`, selected with the CLI parameter `--task_workflow` (for example, `--task_workflow train library`). All settings in [global_settings](peptdeep/constants/default_settings.yaml) are converted to CLI parameters using `--` as the dict level separator; for example, `global_settings["library"]["var_mods"]` corresponds to `--library--var_mods`. See [test_cmd_flow.sh](tests/test_cmd_flow.sh) for an example.
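
To illustrate the `--` naming rule, here is a hypothetical helper (not peptdeep's own parser) that flattens a nested settings dict into CLI tokens. It is a sketch under the assumption that every nested dict is a settings section, so the three dict-type parameters described below would still need their special syntax.

```python
import shlex


def settings_to_cli_args(settings: dict, prefix: str = "") -> list[str]:
    """Flatten nested settings into `--level1--level2 value ...` tokens.

    Hypothetical helper: every dict is treated as a nested section, so
    dict-type parameters such as labeling_channels are not handled here.
    """
    args: list[str] = []
    for key, value in settings.items():
        flag = f"{prefix}--{key}"
        if isinstance(value, dict):
            # Nested section: extend the flag with another `--key` level.
            args.extend(settings_to_cli_args(value, flag))
        elif isinstance(value, list):
            # List type: one flag followed by space-separated values.
            args.append(flag)
            args.extend(str(v) for v in value)
        else:
            # Value type: one flag followed by a single value.
            args.extend([flag, str(value)])
    return args


overrides = {
    "library": {"var_mods": ["Oxidation@M", "Acetyl@Protein_N-term"], "max_var_mod_num": 2},
    "model_mgr": {"default_nce": 30.0},
}
args = settings_to_cli_args(overrides)
print("peptdeep cmd-flow --task_workflow library " + shlex.join(args))
```

`shlex.join` quotes any value that the shell would otherwise split or interpret.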

There are three kinds of parameter types:

1. value type (int, float, bool, str): the CLI parameter takes a single value, for instance `--model_mgr--default_nce 30.0`.
2. list type (list): the CLI parameter takes a list of values separated by spaces, for instance `--library--var_mods "Oxidation@M" "Acetyl@Protein_N-term"`.
3. dict type (dict): only three parameters are of dict type: `--library--labeling_channels`, `--model_mgr--transfer--psm_modification_mapping`, and `--common--user_defined_modifications`. Examples:
   - `--library--labeling_channels`: labeling channels for the library. Example: `--library--labeling_channels "0:Dimethyl@Any_N-term;Dimethyl@K" "4:xx@Any_N-term;xx@K"`
   - `--model_mgr--transfer--psm_modification_mapping`: maps other search engines' modification names to alphabase modifications for transfer learning. Example: `--model_mgr--transfer--psm_modification_mapping "Dimethyl@Any_N-term:_(Dimethyl-n-0);_(Dimethyl)" "Dimethyl@K:K(Dimethyl-K-0);K(Dimethyl)"`. Note that the `X(UniMod:id)` format is recognized directly by alphabase.
   - `--common--user_defined_modifications`: user-defined modifications. Example: `--common--user_defined_modifications "NewMod1@Any_N-term:H(2)2H(2)C(2)" "NewMod2@K:H(100)O(2)C(2)"`
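
To make the `<channel>:<mod>;<mod>` syntax of `--library--labeling_channels` concrete, here is a hypothetical parser (not peptdeep's own code) that rebuilds the structure of the yaml `labeling_channels` example. It assumes integer channel keys and splits each value at its first `:`; the other two dict-type parameters may need different splitting rules, for example because `X(UniMod:id)` names themselves contain a colon.

```python
def parse_labeling_channels(values: list[str]) -> dict[int, list[str]]:
    """Turn '<channel>:<mod1>;<mod2>' strings into {channel: [mod1, mod2]}.

    Illustrative sketch only; not the parser used by peptdeep.
    """
    channels: dict[int, list[str]] = {}
    for value in values:
        # Split on the first ':' only, so names like 'Dimethyl:2H(2)@K' stay intact.
        channel, mods = value.split(":", 1)
        channels[int(channel)] = [mod for mod in mods.split(";") if mod]
    return channels


print(parse_labeling_channels([
    "0:Dimethyl@Any_N-term;Dimethyl@K",
    "4:Dimethyl:2H(2)@Any_N-term;Dimethyl:2H(2)@K",
]))
```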

#### library

``` bash
peptdeep library settings_yaml
```

This command predicts a spectral library for the given settings_yaml
file (exported by [export-settings](#export-settings)). All the
essential settings are in the `library` section of the settings_yaml
file:

``` yaml
library:
  infile_type: fasta
  infile_type_choices:
  - fasta
  - sequence_table
  - peptide_table # sequence with mods and mod_sites
  - precursor_table # peptide with charge state
  infiles:
  - xxx.fasta
  fasta:
    protease: 'trypsin'
    protease_choices:
    - 'trypsin'
    - '([KR])'
    - 'trypsin_not_P'
    - '([KR](?=[^P]))'
    - 'lys-c'
    - 'K'
    - 'lys-n'
    - '\w(?=K)'
    - 'chymotrypsin'
    - 'asp-n'
    - 'glu-c'
    max_miss_cleave: 2
    add_contaminants: False
  fix_mods:
  - Carbamidomethyl@C
  var_mods:
  - Acetyl@Protein_N-term
  - Oxidation@M
  special_mods: [] # normally for Phospho or GlyGly@K
  special_mods_cannot_modify_pep_n_term: False
  special_mods_cannot_modify_pep_c_term: False
  labeling_channels: {}
  # For example,
  # labeling_channels:
  #   0: ['Dimethyl@Any_N-term','Dimethyl@K']
  #   4: ['Dimethyl:2H(2)@Any_N-term','Dimethyl:2H(2)@K']
  #   8: [...]
  min_var_mod_num: 0
  max_var_mod_num: 2
  min_special_mod_num: 0
  max_special_mod_num: 1
  min_precursor_charge: 2
  max_precursor_charge: 4
  min_peptide_len: 7
  max_peptide_len: 35
  min_precursor_mz: 200.0
  max_precursor_mz: 2000.0
  decoy: pseudo_reverse
  decoy_choices:
  - pseudo_reverse
  - diann
  - None
  max_frag_charge: 2
  frag_types:
  - b
  - y
  rt_to_irt: True
  generate_precursor_isotope: False
  output_folder: "{PEPTDEEP_HOME}/spec_libs"
  output_tsv:
    enabled: False
    min_fragment_mz: 200
    max_fragment_mz: 2000
    min_relative_intensity: 0.001
    keep_higest_k_peaks: 12
    translate_batch_size: 1000000
    translate_mod_to_unimod_id: False
```

peptdeep loads sequence data for library prediction based on
`library:infile_type` and `library:infiles`. `library:infiles` lists the
input files, all of which must be of the type given by
`library:infile_type` (one of `library:infile_type_choices`):

- fasta: Protein fasta files, peptdeep will digest the protein sequences
  into peptide sequences.
- [sequence_table](#sequence_table): Tab/comma-delimited txt/tsv/csv
  (text) files which contain the column `sequence` for peptide
  sequences.
- [peptide_table](#peptide_table): Tab/comma-delimited txt/tsv/csv
  (text) files which contain the columns `sequence`, `mods`, and
  `mod_sites`. peptdeep will not add modifications for peptides of this
  file type.
- [precursor_table](#precursor_table): Tab/comma-delimited txt/tsv/csv
  (text) files which contain the columns `sequence`, `mods`,
  `mod_sites`, and `charge`. peptdeep will not add modifications and
  charge states for peptides of this file type.

The examples below are all built from this DataFrame:

``` python
import pandas as pd
df = pd.DataFrame({
    'sequence': ['ACDEFGHIK','LMNPQRSTVK','WYVSTR'],
    'mods': ['Carbamidomethyl@C','Acetyl@Protein_N-term;Phospho@S',''],
    'mod_sites': ['2','0;7',''],
    'charge': [2,3,1],
})
```

##### sequence_table

``` python
df[['sequence']]
```

|  | sequence |
| --- | --- |
| 0 | ACDEFGHIK |
| 1 | LMNPQRSTVK |
| 2 | WYVSTR |


##### peptide_table

``` python
df[['sequence','mods','mod_sites']]
```

|  | sequence | mods | mod_sites |
| --- | --- | --- | --- |
| 0 | ACDEFGHIK | Carbamidomethyl@C | 2 |
| 1 | LMNPQRSTVK | Acetyl@Protein_N-term;Phospho@S | 0;7 |
| 2 | WYVSTR | | |

##### precursor_table

``` python
df
```

|  | sequence | mods | mod_sites | charge |
| --- | --- | --- | --- | --- |
| 0 | ACDEFGHIK | Carbamidomethyl@C | 2 | 2 |
| 1 | LMNPQRSTVK | Acetyl@Protein_N-term;Phospho@S | 0;7 | 3 |
| 2 | WYVSTR | | | 1 |

> Columns of `proteins` and `genes` are optional for these txt/tsv/csv
> files.
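
A minimal sketch of saving the example tables above as tab-delimited files that could be listed under `library:infiles`. The folder and file names are placeholders, and each file keeps only the columns its `infile_type` requires.

```python
from pathlib import Path

import pandas as pd

# Same example peptides as above.
df = pd.DataFrame({
    'sequence': ['ACDEFGHIK', 'LMNPQRSTVK', 'WYVSTR'],
    'mods': ['Carbamidomethyl@C', 'Acetyl@Protein_N-term;Phospho@S', ''],
    'mod_sites': ['2', '0;7', ''],
    'charge': [2, 3, 1],
})

# Placeholder output folder; any writable location works.
out_dir = Path('peptdeep_inputs')
out_dir.mkdir(exist_ok=True)

# One tab-delimited file per infile_type.
df[['sequence']].to_csv(out_dir / 'sequence_table.tsv', sep='\t', index=False)
df[['sequence', 'mods', 'mod_sites']].to_csv(out_dir / 'peptide_table.tsv', sep='\t', index=False)
df.to_csv(out_dir / 'precursor_table.tsv', sep='\t', index=False)
```

A file written this way would then be listed under `library:infiles`, with `library:infile_type` set to the matching type, for example `precursor_table`.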

peptdeep supports multiple files for library prediction, for example (in
the yaml file):

``` yaml
library:
  ...
  infile_type: fasta
  infiles:
  - /path/to/fasta/human.fasta
  - /path/to/fasta/yeast.fasta
  ...
```

The library in HDF5 (.hdf) format will be saved into
`library:output_folder`. If `library:output_tsv:enabled` is True, a TSV
spectral library that can be processed by DIA-NN and Spectronaut will
also be saved into `library:output_folder`.

------------------------------------------------------------------------

#### transfer

``` bash
peptdeep transfer settings_yaml
```

This command will apply transfer learning to refine RT/CCS/MS2 models
based on `model_mgr:transfer:psm_files` and
`model_mgr:transfer:psm_type`. All yaml settings (exported by
[export-settings](#export-settings)) related to this command are:

``` yaml
model_mgr:
  transfer:
    model_output_folder: "{PEPTDEEP_HOME}/refined_models"
    epoch_ms2: 20
    warmup_epoch_ms2: 10
    batch_size_ms2: 512
    lr_ms2: 0.0001
    epoch_rt_ccs: 40
    warmup_epoch_rt_ccs: 10
    batch_size_rt_ccs: 1024
    lr_rt_ccs: 0.0001
    verbose: False
    grid_nce_search: False
    grid_nce_first: 15.0
    grid_nce_last: 45.0
    grid_nce_step: 3.0
    grid_instrument: ['Lumos']
    psm_type: alphapept
    psm_type_choices:
      - alphapept
      - pfind
      - maxquant
      - diann
      - speclib_tsv
    psm_files: []
    ms_file_type: alphapept_hdf
    ms_file_type_choices:
      - alphapept_hdf
      - thermo_raw
      - mgf
      - mzml
    ms_files: []
    psm_num_to_train_ms2: 100000000
    psm_num_per_mod_to_train_ms2: 50
    psm_num_to_test_ms2: 0
    psm_num_to_train_rt_ccs: 100000000
    psm_num_per_mod_to_train_rt_ccs: 50
    psm_num_to_test_rt_ccs: 0
    top_n_mods_to_train: 10
    psm_modification_mapping: {}
    # alphabase modification to modifications of other search engines
    # For example,
    # psm_modification_mapping:
    #   Dimethyl@Any_N-term:
    #     - _(Dimethyl-n-0)
    #     - _(Dimethyl)
    #   Dimethyl:2H(2)@K:
    #     - K(Dimethyl-K-2)
    #   ...
```

For DDA data, peptdeep can also extract MS2 intensities for all PSMs
from the spectrum files listed in `model_mgr:transfer:ms_files` (of type
`model_mgr:transfer:ms_file_type`). This enables transfer learning of
the MS2 model.

For DIA data, only RT and CCS (if timsTOF) models will be refined.

For example, in the settings yaml:

``` yaml
model_mgr:
  transfer:
    ...
    psm_type: pfind
    psm_files:
    - /path/to/pFind.spectra
    - /path/to/other/pFind.spectra

    ms_file_type: thermo_raw
    ms_files:
    - /path/to/raw1.raw
    - /path/to/raw2.raw
    ...
```

The refined models will be saved in
`model_mgr:transfer:model_output_folder`. After transfer learning, users
can apply the new models by setting `model_mgr:external_ms2_model`,
`model_mgr:external_rt_model` and `model_mgr:external_ccs_model` to
the saved `ms2.pth`, `rt.pth` and `ccs.pth` files in
`model_mgr:transfer:model_output_folder`. This is useful for
sample-specific library prediction.
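
The substitution described above can be sketched on a settings dict loaded from the yaml. `use_refined_models` is a hypothetical helper, not part of peptdeep; it assumes the three `.pth` files exist in the given folder and that the folder path is already concrete (it does not expand `{PEPTDEEP_HOME}`).

```python
import copy
import os


def use_refined_models(settings: dict, model_folder: str) -> dict:
    """Return a copy of `settings` whose external_*_model entries point at refined models."""
    new_settings = copy.deepcopy(settings)  # keep the original settings unchanged
    model_mgr = new_settings["model_mgr"]
    for kind in ("ms2", "rt", "ccs"):
        # e.g. external_ms2_model -> <model_folder>/ms2.pth
        model_mgr[f"external_{kind}_model"] = os.path.join(model_folder, f"{kind}.pth")
    return new_settings


settings = {
    "model_mgr": {
        "external_ms2_model": "",
        "external_rt_model": "",
        "external_ccs_model": "",
        "transfer": {"model_output_folder": "/path/to/refined_models"},
    }
}
refined = use_refined_models(settings, settings["model_mgr"]["transfer"]["model_output_folder"])
```

Working on a copy makes it easy to keep one settings dict for the pretrained models and another for the sample-specific ones.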

------------------------------------------------------------------------

#### rescore

This command applies Percolator to rescore the DDA PSMs in
`percolator:input_files:psm_files`, whose format is given by
`percolator:input_files:psm_type`. All yaml settings (exported by
[export-settings](#export-settings)) related to this command are:

``` yaml
percolator:
  require_model_tuning: True
  raw_num_to_tune: 8

  require_raw_specific_tuning: True
  raw_specific_ms2_tuning: False
  psm_num_per_raw_to_tune: 200
  epoch_per_raw_to_tune: 5

  multiprocessing: True

  top_k_frags_to_calc_spc: 10
  calibrate_frag_mass_error: False
  max_perc_train_sample: 1000000
  min_perc_train_sample: 100

  percolator_backend: sklearn
  percolator_backend_choices:
    - sklearn
    - pytorch
  percolator_model: linear
  percolator_model_choices:
    pytorch_as_backend:
      - linear # not fully tested, performance may be unstable
      - mlp # not implemented yet
    sklearn_as_backend:
      - linear # logistic regression
      - random_forest
  lr_percolator_torch_model: 0.1 # learning rate, only used when percolator_backend==pytorch
  percolator_iter_num: 5 # percolator iteration number
  cv_fold: 1
  fdr: 0.01
  fdr_level: psm
  fdr_level_choices:
    - psm
    - precursor
    - peptide
    - sequence
  use_fdr_for_each_raw: False
  frag_types: ['b_z1','b_z2','y_z1','y_z2']
  input_files:
    psm_type: alphapept
    psm_type_choices:
      - alphapept
      - pfind
    psm_files: []
    ms_file_type: alphapept_hdf
    ms_file_type_choices:
      - alphapept_hdf
      - thermo_raw # if alpharaw is installed
      - mgf
      - mzml
    ms_files: []
    other_score_column_mapping:
      alphapept: {}
      pfind:
        raw_score: Raw_Score
      msfragger:
        hyperscore: hyperscore
        nextscore: nextscore
      maxquant: {}
  output_folder: "{PEPTDEEP_HOME}/rescore"
```
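
The rescoring `frag_types` use `<type>_z<charge>` names, whereas the `library` section lists base types (`b`, `y`) together with `max_frag_charge: 2`. Assuming the charged names are each base type combined with charges 1 through `max_frag_charge`, the two settings relate as in this hypothetical helper (not a peptdeep API):

```python
def charged_frag_types(frag_types: list[str], max_frag_charge: int) -> list[str]:
    """Combine each base fragment type with charges 1..max_frag_charge."""
    return [f"{ft}_z{z}" for ft in frag_types for z in range(1, max_frag_charge + 1)]


# library: frag_types [b, y] with max_frag_charge 2
print(charged_frag_types(["b", "y"], 2))  # ['b_z1', 'b_z2', 'y_z1', 'y_z2']
```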

Transfer learning will be applied when rescoring if `percolator:require_model_tuning`
is True.

The corresponding MS files (`percolator:input_files:ms_files` and
`percolator:input_files:ms_file_type`) must be provided to extract
experimental fragment intensities.
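
The `fdr` and `fdr_level` settings above presumably set the FDR threshold for the rescored results and the level at which it is applied. As a generic illustration of target-decoy filtering at `fdr_level: psm` (a textbook sketch, not peptdeep's implementation), q-values can be computed from a score column and a decoy flag; the column names `score` and `decoy` are assumptions.

```python
import pandas as pd


def add_q_values(psms: pd.DataFrame, score_col: str = "score", decoy_col: str = "decoy") -> pd.DataFrame:
    """Return PSMs sorted by score with a target-decoy `q_value` column.

    Assumes higher scores are better and `decoy_col` is 1 for decoys, 0 for targets.
    """
    out = psms.sort_values(score_col, ascending=False).reset_index(drop=True)
    n_decoy = out[decoy_col].cumsum()
    n_target = (1 - out[decoy_col]).cumsum()
    # Estimated FDR when accepting everything down to this row.
    fdr = n_decoy / n_target.clip(lower=1)
    # q-value: smallest FDR threshold at which this PSM is still accepted.
    out["q_value"] = fdr.iloc[::-1].cummin().iloc[::-1]
    return out


psms = pd.DataFrame({"score": [10, 9, 8, 7, 6, 5], "decoy": [0, 0, 0, 1, 0, 1]})
scored = add_q_values(psms)
accepted = scored[(scored["q_value"] <= 0.01) & (scored["decoy"] == 0)]
print(accepted["score"].tolist())  # [10, 9, 8]
```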

------------------------------------------------------------------------

#### install-models

``` bash
peptdeep install-models [--model-file url_or_local_model_zip] --overwrite True
```

When run for the first time, peptdeep will download and install the models
defined by `model_url` in the default yaml settings
from [GitHub](https://github.com/MannLabs/alphapeptdeep/releases/tag/pre-trained-models). This command
updates `pretrained_models.zip` from the URL or local zip file given by `--model-file`.

It is also possible to use other models instead of the pretrained models by providing `model_mgr:external_ms2_model`,
`model_mgr:external_rt_model` and `model_mgr:external_ccs_model`.

------------------------------------------------------------------------

### Python and Jupyter notebooks

Using peptdeep from a Python script or notebook provides the most flexible
way to access all features in peptdeep.

We will introduce several usages of peptdeep in Python notebooks:

- [**global_settings**](#global_settings)
- [**Pipeline APIs**](#pipeline-apis)
- [**ModelManager**](#modelmanager)
- [**Library Prediction**](#library-prediction)
- [**DDA Rescoring**](#dda-rescoring)
- [**HLA Peptide Prediction**](#hla-peptide-prediction)

------------------------------------------------------------------------

#### global_settings

Most of the default parameters and attributes of peptdeep functions and
classes are controlled by `peptdeep.settings.global_settings`, which is a
`dict`.

``` python
from peptdeep.settings import global_settings
```

The default values of `global_settings` are defined in
[default_settings.yaml](https://github.com/MannLabs/alphapeptdeep/blob/main/peptdeep/constants/default_settings.yaml).
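
Because `global_settings` is a plain nested `dict`, its defaults can be overridden before calling peptdeep functions. The hypothetical `deep_update` helper below (not a peptdeep API) is a sketch that merges a partial dict section by section, so keys you do not mention keep their default values. The example uses a small stand-in dict rather than the real `global_settings`.

```python
def deep_update(base: dict, overrides: dict) -> dict:
    """Recursively merge `overrides` into `base` in place and return `base`.

    Nested sections are merged key by key; any other value replaces the old one.
    """
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            deep_update(base[key], value)
        else:
            base[key] = value
    return base


defaults = {"model_mgr": {"default_nce": 30.0, "default_instrument": "Lumos"}}
deep_update(defaults, {"model_mgr": {"default_instrument": "timsTOF"}})
print(defaults)  # {'model_mgr': {'default_nce': 30.0, 'default_instrument': 'timsTOF'}}
```

With peptdeep installed, the same call could be applied to `global_settings` directly, for example `deep_update(global_settings, {"model_mgr": {"default_instrument": "timsTOF"}})`.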

#### Pipeline APIs

The pipeline APIs provide the same functionality as the [CLI](#cli),
including [library prediction](#library), [transfer
learning](#transfer), and [rescoring](#rescore).

``` python
from peptdeep.pipeline_api import (
    generate_library,
    transfer_learn,
    rescore,
)
```

All of these functions take a `settings_dict` as input, whose structure is
the same as that of the settings yaml file. See the documentation of `generate_library`, `transfer_learn` and `rescore` at https://alphapeptdeep.readthedocs.io/en/latest/module_pipeline_api.html.
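
A minimal sketch of calling `generate_library` from Python, assuming it accepts the settings dict as its first positional argument as described above. The helpers `make_library_settings` and `run_library` and all file paths are hypothetical. The settings are deep-copied so the shared `global_settings` dict is not modified, and peptdeep is imported lazily so the first helper can be tried without it.

```python
import copy


def make_library_settings(base: dict, fasta_files: list[str], output_folder: str) -> dict:
    """Copy `base` (e.g. peptdeep's global_settings) and point it at new fasta files."""
    settings = copy.deepcopy(base)  # avoid mutating the shared settings dict
    settings["library"]["infile_type"] = "fasta"
    settings["library"]["infiles"] = list(fasta_files)
    settings["library"]["output_folder"] = output_folder
    return settings


def run_library(fasta_files: list[str], output_folder: str) -> None:
    # Imported lazily so make_library_settings can be used without peptdeep installed.
    from peptdeep.pipeline_api import generate_library
    from peptdeep.settings import global_settings

    generate_library(make_library_settings(global_settings, fasta_files, output_folder))


# Example call (requires peptdeep and its models):
# run_library(["/path/to/fasta/human.fasta"], "/path/to/spec_libs")
```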

#### ModelManager

``` python
from peptdeep.pretrained_models import ModelManager
```

The [`ModelManager`](https://alphapeptdeep.readthedocs.io/en/latest/module_pretrained_models.html#peptdeep.pretrained_models.ModelManager) class is the main entry point for accessing the MS2/RT/CCS models. It provides functionality to train/refine the models and then use the new models for prediction.

Check [tutorial_model_manager.ipynb](https://github.com/MannLabs/alphapeptdeep/blob/main/nbs/docs/tutorial_model_manager.ipynb) for details.

#### Library Prediction

``` python
from peptdeep.protein.fasta import PredictSpecLibFasta
```

The [`PredictSpecLibFasta`](https://alphapeptdeep.readthedocs.io/en/latest/protein/fasta.html#peptdeep.protein.fasta.PredictSpecLibFasta) class provides functionality for handling fasta files or protein
sequences and predicting spectral libraries from them.

Check out
[tutorial_speclib_from_fasta.ipynb](https://github.com/MannLabs/alphapeptdeep/blob/main/docs/nbs/tutorial_speclib_from_fasta.ipynb)
for details.

#### DDA Rescoring

``` python
from peptdeep.rescore.percolator import Percolator
```

The `Percolator` class provides functionality to rescore DDA PSMs searched by `pFind` and
`AlphaPept` (and by `MaxQuant` if its output FDR is set to 100%), …

Check out [test_percolator.ipynb](https://github.com/MannLabs/alphapeptdeep/blob/main/nbs_trials/test_percolator.ipynb)
for details.

#### HLA Peptide Prediction

``` python
from peptdeep.model.model_interface import ModelInterface
import peptdeep.model.generic_property_prediction # model shop
```

Building new DL models for peptide property prediction is one of the key features of AlphaPeptDeep. The key functionalities are [`ModelInterface`](https://alphapeptdeep.readthedocs.io/en/latest/model/model_interface.html#peptdeep.model.model_interface.ModelInterface) and the pre-designed models and model interfaces in the model shop (module [`peptdeep.model.generic_property_prediction`](https://alphapeptdeep.readthedocs.io/en/latest/model/generic_property_prediction.html)).

For example, we can build an HLA classifier that distinguishes HLA peptides from non-HLA peptides; see https://github.com/MannLabs/PeptDeep-HLA for details.

------------------------------------------------------------------------

## Troubleshooting

In case of issues, check out the following:

- [Issues](https://github.com/MannLabs/alphapeptdeep/issues). Try a few
  different search terms to find out if a similar problem has been
  encountered before.

- [Discussions](https://github.com/MannLabs/alphapeptdeep/discussions).
  Check if your problem or feature request has been discussed before.

------------------------------------------------------------------------

## How to contribute

If you like this software, you can give us a
[star](https://github.com/MannLabs/alphapeptdeep/stargazers) to boost
our visibility! All direct contributions are also welcome. Feel free to
post a new [issue](https://github.com/MannLabs/alphapeptdeep/issues) or
clone the repository and create a [pull
request](https://github.com/MannLabs/alphapeptdeep/pulls) with a new
branch. For an even more interactive participation, check out the
[discussions](https://github.com/MannLabs/alphapeptdeep/discussions) and
the [Contributors License Agreement](misc/CLA.md).

### Notes for developers

#### Tagging of changes
In order to have release notes automatically generated, changes need to be tagged with labels.
The following labels are used (they should be self-explanatory):
`breaking-change`, `bug`, `enhancement`.

#### Release a new version
This package uses a shared release process defined in the
[alphashared](https://github.com/MannLabs/alphashared) repository. Please see the instructions
[there](https://github.com/MannLabs/alphashared/blob/reusable-release-workflow/.github/workflows/README.md#release-a-new-version).

#### pre-commit hooks
It is highly recommended to use the provided pre-commit hooks, as the CI pipeline requires all of their checks to
pass before a branch can be merged.

The hooks need to be installed once with:
```bash
pre-commit install
```
You can run the checks yourself using:
```bash
pre-commit run --all-files
```
------------------------------------------------------------------------

## Changelog

See the [GitHub releases](https://github.com/MannLabs/alphapeptdeep/releases)
for a full overview of the changes made in each version, and [CHANGELOG.md](CHANGELOG.md) for older versions.

------------------------------------------------------------------------

## Citation

> **AlphaPeptDeep: a modular deep learning framework to predict peptide properties for proteomics.**
> Wen-Feng Zeng, Xie-Xuan Zhou, Sander Willems, Constantin Ammar, Maria Wahle, Isabell Bludau, Eugenia Voytik, Maximillian T. Strauss & Matthias Mann.
> Nat Commun 13, 7238 (2022), doi: https://doi.org/10.1038/s41467-022-34904-3


------------------------------------------------------------------------

## License

AlphaPeptDeep was developed by the [Mann Labs at the Max Planck
Institute of Biochemistry](https://www.biochem.mpg.de/mann) and the
[University of
Copenhagen](https://www.cpr.ku.dk/research/proteomics/mann/) and is
freely available with an [Apache License](LICENSE.txt). External Python
packages (available in the [requirements](requirements) folder) have
their own licenses, which can be consulted on their respective websites.
