MS-GF+

(How to migrate from MS-GFDB to MS-GF+)

ChangeLog

Usage: java -Xmx3500M -jar MSGFPlus.jar
	-s SpectrumFile (*.mzML, *.mzXML, *.mgf, *.ms2, *.pkl or *_dta.txt)
	   Spectra should be centroided. Profile spectra will be ignored.
	-d DatabaseFile (*.fasta or *.fa)
	[-o OutputFile (*.mzid)] (Default: [SpectrumFileName].mzid)
	[-t PrecursorMassTolerance] (e.g. 2.5Da, 20ppm or 0.5Da,2.5Da, Default: 20ppm)
	   Use a comma to set asymmetric values. E.g. "-t 0.5Da,2.5Da" sets a tolerance of 0.5Da on the minus side (expMass<theoMass) and 2.5Da on the plus side (expMass>theoMass).
	[-ti IsotopeErrorRange] (Range of allowed isotope peak errors, Default:0,1)
	   Accounts for the error introduced by choosing a non-monoisotopic peak for fragmentation.
	   On Windows, put the range inside "" (e.g. "0,1").
	   The combination of -t and -ti determines the precursor mass tolerance.
	   E.g. "-t 20ppm -ti -1,2" tests abs(exp-calc-n*1.00335Da)<20ppm for n=-1, 0, 1, 2.
	[-thread NumThreads] (Number of concurrent threads to be executed, Default: Number of available cores)
	[-tasks NumTasks] (Override the number of tasks to use on the threads, Default: (internally calculated based on inputs))
	   More tasks than threads will reduce the memory requirements of the search, but will be slower (how much depends on the inputs).
	   1<=tasks<=numThreads: will create one task per thread, which is the original behavior.
	   tasks=0: use default calculation - minimum of: (threads*3) and (numSpectra/250).
	   tasks<0: multiply number of threads by abs(tasks) to determine number of tasks (i.e., -2 => "2 * numThreads" tasks).
	   One task per thread will use the most memory, but will usually finish the fastest.
	   2-3 tasks per thread will use comparably less memory, but may cause the search to take 1.5 to 2 times as long.
	[-verbose 0/1] (0: report total progress only (Default), 1: report total and per-thread progress/status)
	[-tda 0/1] (0: don't search decoy database (Default), 1: search decoy database)
	[-m FragmentMethodID] (0: As written in the spectrum or CID if no info (Default), 1: CID, 2: ETD, 3: HCD, 4: UVPD)
	[-inst InstrumentID] (0: Low-res LCQ/LTQ (Default), 1: Orbitrap/FTICR, 2: TOF, 3: Q-Exactive)
	[-e EnzymeID] (0: unspecific cleavage, 1: Trypsin (Default), 2: Chymotrypsin, 3: Lys-C, 4: Lys-N, 5: glutamyl endopeptidase, 6: Arg-C, 7: Asp-N, 8: alphaLP, 9: no cleavage)
	[-protocol ProtocolID] (0: Automatic (Default), 1: Phosphorylation, 2: iTRAQ, 3: iTRAQPhospho, 4: TMT, 5: Standard)
	[-ntt 0/1/2] (Number of Tolerable Termini, Default: 2)
	   E.g. For trypsin, 0: non-tryptic, 1: semi-tryptic, 2: fully-tryptic peptides only.
	[-mod ModificationFileName] (Modification file, Default: standard amino acids with fixed C+57)
	[-minLength MinPepLength] (Minimum peptide length to consider, Default: 6)
	[-maxLength MaxPepLength] (Maximum peptide length to consider, Default: 40)
	[-minCharge MinCharge] (Minimum precursor charge to consider if charges are not specified in the spectrum file, Default: 2)
	[-maxCharge MaxCharge] (Maximum precursor charge to consider if charges are not specified in the spectrum file, Default: 3)
	[-n NumMatchesPerSpec] (Number of matches per spectrum to be reported, Default: 1)
	[-addFeatures 0/1] (0: output basic scores only (Default), 1: output additional features)
	[-ccm ChargeCarrierMass] (Mass of charge carrier, Default: mass of proton (1.00727649))
Example (high-precision): java -Xmx3500M -jar MSGFPlus.jar -s test.mzXML -d IPI_human_3.79.fasta -t 20ppm -ti -1,2 -ntt 2 -tda 1 -o testMSGFPlus.mzid
Example (low-precision): java -Xmx3500M -jar MSGFPlus.jar -s test.mzXML -d IPI_human_3.79.fasta -t 0.5Da,2.5Da -ntt 2 -tda 1 -o testMSGFPlus.mzid
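
The way -t and -ti combine in the high-precision example can be illustrated with a short Python sketch. It is not MS-GF+ code: the function name is hypothetical, and taking the ppm error relative to the theoretical mass is an assumption, since the usage text only gives the inequality abs(exp-calc-n*1.00335Da)<20ppm.

```python
ISOTOPE_SPACING_DA = 1.00335  # isotope spacing quoted in the usage text


def matching_isotope_offset(exp_mass, calc_mass, tol_ppm=20.0, isotope_range=(0, 1)):
    """Return the first isotope offset n that passes the tolerance test, else None.

    Hypothetical helper mirroring abs(exp - calc - n*1.00335Da) < tol for each
    n in the inclusive -ti range.
    """
    lowest, highest = isotope_range
    for n in range(lowest, highest + 1):
        error_da = exp_mass - calc_mass - n * ISOTOPE_SPACING_DA
        # Assumption: the ppm error is measured relative to the theoretical mass.
        error_ppm = error_da / calc_mass * 1e6
        if abs(error_ppm) < tol_ppm:
            return n
    return None


# A precursor picked one isotope peak too high only matches once n=1 is allowed.
offset = matching_isotope_offset(2001.01335, 2000.0, tol_ppm=20.0, isotope_range=(-1, 2))
```

With "-ti 0,0" the same precursor would be rejected, because the uncorrected error of about 1 Da is far outside 20ppm.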

Parameters:
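
The -tasks rules listed in the usage text can be read as a small decision function. The Python sketch below is only an interpretation of those rules, not the MS-GF+ implementation: the function name is hypothetical, and the integer division, the floor of one task, and the handling of values larger than the thread count are assumptions.

```python
def resolve_num_tasks(tasks, num_threads, num_spectra):
    """Interpret the -tasks value as described in the usage text (hypothetical helper)."""
    if tasks < 0:
        # tasks<0: multiply the number of threads by abs(tasks).
        return num_threads * abs(tasks)
    if tasks == 0:
        # tasks=0: minimum of (threads*3) and (numSpectra/250).
        # Assumptions: integer division, and never fewer than one task.
        return max(1, min(num_threads * 3, num_spectra // 250))
    if tasks <= num_threads:
        # 1<=tasks<=numThreads: one task per thread.
        return num_threads
    # Assumption: an explicit value above the thread count is used as given.
    return tasks


# "-thread 4 -tasks -2" on a 10,000-spectrum file gives 2 * 4 = 8 tasks.
num_tasks = resolve_num_tasks(-2, num_threads=4, num_spectra=10000)
```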

If multiple MS-GF+ processes access the same database file, it is strongly recommended to index the database prior to the database search by running BuildSA (see below).

If -tda 1 is specified, MS-GF+ automatically creates a combined target/reversed database file (DBFileName.revConcat.fasta). The DatabaseFile given with "-d" must therefore contain only target proteins.
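
To show why the input database should contain only target proteins, here is a minimal Python sketch of a target/reversed concatenation. It is an illustration under stated assumptions, not the MS-GF+ routine: the function names and the "XXX_" decoy prefix are placeholders chosen for the example. Feeding such a step a file that already contains decoys would reverse those entries as well.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def rev_concat(text, decoy_prefix="XXX_"):
    """Return target entries followed by reversed-sequence decoy entries.

    Assumption: decoys are plain sequence reversals tagged with decoy_prefix.
    """
    records = list(parse_fasta(text))
    lines = []
    for header, seq in records:
        lines += [">" + header, seq]
    for header, seq in records:
        lines += [">" + decoy_prefix + header, seq[::-1]]
    return "\n".join(lines) + "\n"


combined = rev_concat(">P1 demo\nMKT\nAAR\n>P2\nPEPTIDEK\n")
```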

The meaning and the default value have changed as of version 8442 (Sept. 2012).

MS-GF+ output

MS-GF+ outputs results as an mzIdentML (version 1.1) file. See http://www.psidev.info/mzidentml/ for details on the mzIdentML format. For every PSM, MS-GF+ reports the following scores:

MS-GF+ output example

Using MzIDToTsv, one can convert MS-GF+ output (*.mzid) into tab-separated (tsv) format:

#SpecFile	SpecID	ScanNum	FragMethod	Precursor	IsotopeError	PrecursorError(ppm)	Charge	Peptide	Protein	DeNovoScore	MSGFScore	SpecEValue	EValue	QValue	PepQValue
test.mgf	index=0	26559	CID	1285.3457	1	-5.049801	3	K.IGAYLFVDMAHVAGLIAAGVYPNPVPHAHVVTSTTHK.T	test	299	244	1.4807088E-31	3.2871733E-29	0.0	0.0
test.mgf	index=0	26559	CID	1285.3457	1	-5.049801	3	K.IGAYLFVDMAHVAGLIAAGVYPNPVPHAHVVTSTTHK.T	test_isoform	299	244	1.4807088E-31	3.2871733E-29	0.0	0.0
test.mgf	index=1	-1	CID	870.11743	0	0.14029178	3	K.NLANPTSVILASIQM+15.995LEYLGMADK.A	test2	156	136	2.2559852E-22	4.4217308E-20	0.0	0.0
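
In the example above, a peptide that maps to several proteins appears once per protein. The following Python sketch shows one possible way to read such a tsv file and collect the proteins for each PSM. It assumes the columns are tab-separated and named as in the header line; the function name and the 0.01 QValue cutoff are illustrative choices, not part of MS-GF+ or MzIDToTsv.

```python
import csv
import io

HEADER = ("#SpecFile SpecID ScanNum FragMethod Precursor IsotopeError "
          "PrecursorError(ppm) Charge Peptide Protein DeNovoScore MSGFScore "
          "SpecEValue EValue QValue PepQValue")
ROWS = [
    "test.mgf index=0 26559 CID 1285.3457 1 -5.049801 3 "
    "K.IGAYLFVDMAHVAGLIAAGVYPNPVPHAHVVTSTTHK.T test 299 244 "
    "1.4807088E-31 3.2871733E-29 0.0 0.0",
    "test.mgf index=0 26559 CID 1285.3457 1 -5.049801 3 "
    "K.IGAYLFVDMAHVAGLIAAGVYPNPVPHAHVVTSTTHK.T test_isoform 299 244 "
    "1.4807088E-31 3.2871733E-29 0.0 0.0",
    "test.mgf index=1 -1 CID 870.11743 0 0.14029178 3 "
    "K.NLANPTSVILASIQM+15.995LEYLGMADK.A test2 156 136 "
    "2.2559852E-22 4.4217308E-20 0.0 0.0",
]
# Rebuild tab-separated text; safe here because no sample field contains a space.
SAMPLE_TSV = "\n".join("\t".join(line.split()) for line in [HEADER] + ROWS) + "\n"


def proteins_per_psm(tsv_text, max_qvalue=0.01):
    """Map (SpecID, Peptide) to the list of proteins reported for that PSM."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    psms = {}
    for row in reader:
        # Illustrative filter: keep rows at or below the chosen QValue cutoff.
        if float(row["QValue"]) > max_qvalue:
            continue
        key = (row["SpecID"], row["Peptide"])
        psms.setdefault(key, []).append(row["Protein"])
    return psms


psms = proteins_per_psm(SAMPLE_TSV)
```

For a real file, the same function could be given the text read from the converted tsv instead of SAMPLE_TSV.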