DFAST Mobile Genetic Element (MGE) / Antimicrobial Resistance (AMR) Annotation Validation Summary

Generated: 2026-06-11  |  DFAST ver. 1.3.9  |  MobileElementFinder 1.1.2 (MGEdb 1.1.1)  |  PlasmidFinder / CARD / VFDB

This summary reports validation results from running DFAST on 6 bacterial genomes with the --mge --amr --no_cdd --no_hmm --debug options. --mge enables MGE detection by MobileElementFinder; --amr enables searches against CARD and VFDB (antimicrobial resistance genes and virulence factors) and PlasmidFinder (plasmid replicon typing). Detected MGEs are written to the GenBank output as features with coordinates.

1. Installation and usage

Installing the tool and reference data

MobileElementFinder (the tool itself) and the MGEdb BLAST index (reference data) can both be prepared with a single command. This procedure applies to non-Docker DFAST installations; it is not needed for the Docker image dfast2026:dev, which already bundles MobileElementFinder. Replace <DB_ROOT> with your DFAST database directory (e.g. dfast_core/db).

$ dfast_file_downloader.py --mefinder -d <DB_ROOT>

This command runs the following steps in order.

Step                    Description
(1) Tool installation   Only if mefinder is not yet installed: fetches the source (bitbucket.org/mhkj/mge_finder), relaxes the biopython upper-bound pin, and runs pip install (so it can coexist with DFAST's biopython 1.87). Skipped if already installed.
(2) Index construction  Builds the blastn index from the bundled MGEdb (v1.1.1) into <DB_ROOT>/mefinder_db (mge_records.nin / .nhr / .nsq, etc.). File existence is verified after completion.

Adding --no_indexing installs the tool only and skips index construction. If an index already exists, DFAST does not rebuild it at run time (the existing one is reused).
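
The existence check in step (2) can be approximated outside DFAST. The following is a minimal sketch, assuming the index files use the mge_records prefix and the three extensions named above; a real BLAST index may contain additional files, and missing_index_files is a hypothetical helper, not part of DFAST.

```python
from pathlib import Path

# Extensions named in the step table; a real BLAST index may hold more files.
REQUIRED_EXTENSIONS = (".nin", ".nhr", ".nsq")


def missing_index_files(db_root, prefix="mge_records"):
    """Return the expected index files that are absent under <DB_ROOT>/mefinder_db."""
    index_dir = Path(db_root) / "mefinder_db"
    expected = [index_dir / f"{prefix}{ext}" for ext in REQUIRED_EXTENSIONS]
    return [str(path) for path in expected if not path.is_file()]
```

An empty list from missing_index_files("dfast_core/db") would indicate that the three files listed above are present.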

Running DFAST

MGE detection is enabled with the --mge option. It can be combined with --amr.

$ dfast --genome your_genome.fna --mge -o OUTPUT_DIR
# To also perform AMR/VFG and plasmid replicon typing
$ dfast --genome your_genome.fna --mge --amr -o OUTPUT_DIR

By default, MobileElementFinder is run with --min-coverage 0.95 (stricter than its own default of 0.9) and uses the reference index at <DB_ROOT>/mefinder_db. Detected MGEs are written to genome.gbk as features with coordinates (mobile_element / misc_feature), and the findings are written as ## lines in amr_summary.tsv (details in the following sections).

2. Detection results per genome

Species / strain                       Accession        mobile_element  misc_feature  AMR/VFG hits  MGE annotation lines (##)
Escherichia coli K-12 MG1655           GCA_000005845.2  52              8             169           60
Klebsiella pneumoniae NTUH-K2044       GCA_000009345.1  78              5             236           84
Klebsiella pneumoniae plasmid pOXA-48  JN626286.1       1               0             1             2
Lactobacillus hokkaidonensis LOOC260   GCA_000829395.1  25              5             0             30
Salmonella enterica Typhimurium DT104  GCA_000493675.1  25              1             316           28
Vibrio cholerae N16961                 GCA_000006745.1  15              0             193           15

DFAST finished normally (exit 0) for all 6 genomes. The absence of AMR/VFG hits for L. hokkaidonensis (food-derived) is plausible. mobile_element / misc_feature features are written to genome.gbk; AMR/VFG hits and MGE annotation lines are written to amr_summary.tsv.
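
The feature counts in the table can be re-derived from genome.gbk. The sketch below is one possible way to do so with the standard library only. It assumes that MGE features can be recognised by a note beginning with "MobileElementFinder:" (the note format described in section 3) and that the file follows the usual GenBank flat-file layout; iter_features and count_mge_features are hypothetical helper names, not DFAST functions.

```python
import re
from collections import Counter

# A feature key starts at column 6; qualifier lines are indented further.
FEATURE_KEY = re.compile(r"^ {5}(\S+)\s+\S")


def iter_features(gbk_text):
    """Yield (feature_key, qualifier_text) pairs from GenBank flat-file text."""
    key, body, in_features = None, [], False
    for line in gbk_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
        elif line.startswith(("ORIGIN", "CONTIG", "//")):
            if key is not None:
                yield key, " ".join(body)
            key, body, in_features = None, [], False
        elif in_features:
            match = FEATURE_KEY.match(line)
            if match:
                if key is not None:
                    yield key, " ".join(body)
                key, body = match.group(1), []
            else:
                body.append(line.strip())
    if key is not None:
        yield key, " ".join(body)


def count_mge_features(gbk_text):
    """Count mobile_element / misc_feature entries that carry a MobileElementFinder note."""
    counts = Counter()
    for key, text in iter_features(gbk_text):
        if key in ("mobile_element", "misc_feature") and "MobileElementFinder:" in text:
            counts[key] += 1
    return counts
```

Filtering on the note prefix keeps misc_feature entries from other sources out of the count.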

3. How MGEs are recorded in GenBank

The MGE types detected by MobileElementFinder are mapped to features according to the INSDC (DDBJ/GenBank) controlled vocabulary as follows.

MGE type                                          GenBank feature key  Main qualifier
insertion sequence (IS)                           mobile_element       /mobile_element_type="insertion sequence:name"
MITE                                              mobile_element       /mobile_element_type="MITE:name"
unit transposon                                   mobile_element       /mobile_element_type="transposon:name"
composite transposon (DB hit)                     mobile_element       /mobile_element_type="transposon:name"
composite transposon (putative = predicted only)  misc_feature         /note="putative composite transposon (predicted by MobileElementFinder)"
ICE / IME / CIME / mobile insertion cassette      misc_feature         /note="type description: name"
integron / superintegron                          Not detected (MGEdb v1.1.1 contains no integron reference sequences; see the note below)

All features additionally carry /note="MobileElementFinder: name; identity:X%, coverage:Y%; similar to accession (MGEdb)" (the "similar to ..." part is omitted when no accession is available). /note="MGE_n" is DFAST's internal feature ID (standard behavior at output verbosity level 3). The < / > in coordinates are added only when the 5'/3' end is incomplete relative to the reference sequence.
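
For downstream processing, the common note can be split back into its fields. The following sketch assumes the note text follows exactly the pattern stated above; the regular expression and the parse_mef_note helper are illustrative and not part of DFAST.

```python
import re

# Pattern for: MobileElementFinder: name; identity:X%, coverage:Y%[; similar to accession (MGEdb)]
MEF_NOTE = re.compile(
    r"MobileElementFinder:\s*(?P<name>[^;]+);\s*"
    r"identity:(?P<identity>[\d.]+)%,\s*coverage:(?P<coverage>[\d.]+)%"
    r"(?:;\s*similar to (?P<accession>\S+) \(MGEdb\))?"
)


def parse_mef_note(note):
    """Parse one MobileElementFinder note; return None if it does not match."""
    # GenBank wraps long qualifier values, so collapse all whitespace first.
    flat = " ".join(note.split())
    match = MEF_NOTE.search(flat)
    if match is None:
        return None
    return {
        "name": match.group("name").strip(),
        "identity": float(match.group("identity")),
        "coverage": float(match.group("coverage")),
        "accession": match.group("accession"),  # None when "similar to" is omitted
    }
```

The accession field is None for notes without the "similar to ..." part, as in the ISLho1 example below.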

mobile_element example 1 — transposon (pOXA-48 / Tn1999)

     mobile_element  complement(2292..8429)
                     /note="MobileElementFinder: Tn1999; identity:100.0%,
                     coverage:100.0%; similar to AY236073 (MGEdb)"
                     /note="MGE_1"
                     /mobile_element_type="transposon:Tn1999"

mobile_element example 2 — insertion sequence (L. hokkaidonensis / ISLho1)

     mobile_element  9186..10732
                     /note="MobileElementFinder: ISLho1; identity:100.0%,
                     coverage:100.0%"
                     /note="MGE_23"
                     /mobile_element_type="insertion sequence:ISLho1"

misc_feature example 3 — ICE (L. hokkaidonensis / Tn6254)

     misc_feature    1799238..1851240
                     /note="MobileElementFinder: Tn6254; identity:100.0%,
                     coverage:99.1%; similar to AP014680 (MGEdb)"
                     /note="integrative conjugative element (ICE): Tn6254"

misc_feature example 4 — putative composite transposon (E. coli K-12)

     misc_feature    complement(257907..279930)
                     /note="MobileElementFinder: cn_22023_ISSen9;
                     identity:94.0%, coverage:100.0%"
                     /note="putative composite transposon (predicted by MobileElementFinder)"

About integrons / superintegrons: MobileElementFinder's reference database MGEdb v1.1.1 does not include integron reference sequences (the 8 registered types are IS, MITE, unit transposon, composite transposon, ICE, IME, CIME, and mobile insertion cassette). Therefore the superintegron of V. cholerae N16961 and the class 1 integron within SGI1 of S. enterica DT104 are not detected as "integrons" (the IS elements within them are detected as insertion sequences). This is an inherent limitation of the alignment-based detection method.

Note: classification criteria

Why ICEs and similar elements are recorded as misc_feature: the INSDC /mobile_element_type qualifier is restricted to a controlled vocabulary (insertion sequence, MITE, transposon, integron, retrotransposon, etc.), and no value corresponds to ICE / IME / CIME / mobile insertion cassette. Rather than forcing them into "other", DFAST records them as a misc_feature with a descriptive /note that preserves the information (this also passes DDBJ validation). IS, MITE, and transposon have matching vocabulary terms, so they are output as mobile_element.
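
The mapping rule can be summarised in a few lines. This is a sketch of the decision logic as described in this section, not DFAST's actual implementation; the type strings used as dictionary keys and the to_feature helper are assumptions made for illustration.

```python
# MGE types that have a matching INSDC /mobile_element_type term (see the table in section 3).
# The key strings are hypothetical; MobileElementFinder's own type identifiers may differ.
MOBILE_ELEMENT_TERMS = {
    "insertion sequence": "insertion sequence",
    "MITE": "MITE",
    "unit transposon": "transposon",
    "composite transposon": "transposon",
}

PUTATIVE_NOTE = "putative composite transposon (predicted by MobileElementFinder)"


def to_feature(mge_type, name, putative=False):
    """Return (feature_key, qualifiers) for one detected MGE."""
    if mge_type == "composite transposon" and putative:
        return "misc_feature", {"note": PUTATIVE_NOTE}
    term = MOBILE_ELEMENT_TERMS.get(mge_type)
    if term is not None:
        return "mobile_element", {"mobile_element_type": f"{term}:{name}"}
    # No vocabulary term (ICE, IME, CIME, mobile insertion cassette): keep a descriptive note.
    return "misc_feature", {"note": f"{mge_type}: {name}"}
```

With these assumptions, to_feature("integrative conjugative element (ICE)", "Tn6254") yields a misc_feature whose note matches example 3.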

Conditions under which a putative composite transposon is output: two copies of the same insertion sequence (IS) occur on the same contig, and all of the following hold: (1) at least one of the copies is a confidently detected IS, (2) the inverted repeats (IR) are conserved (terminal truncation ≤ 20 bp and adjacent HSP ≥ 60 bp), and (3) the full length spanning the 2 IS copies is < 52,452 bp. Because no matching entry exists in MGEdb and the element is inferred solely from the IS arrangement (evidence = PUTATIVE), it is output as a low-confidence misc_feature with a name of the form cn_<full length>_<IS name> (e.g. cn_22023_ISSen9). The cargo content between the two copies is not considered, so false positives are possible.
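
The three criteria can be expressed as a small predicate. The sketch below is an interpretation of the conditions listed above, not MobileElementFinder's code. The ISCopy fields are hypothetical, the IR thresholds are assumed to apply to each copy, and the length is computed as end minus start because that reproduces cn_22023 for 257907..279930 in example 4.

```python
from dataclasses import dataclass

MAX_SPAN = 52_452       # criterion (3): the span must be shorter than this (bp)
MAX_TRUNCATION = 20     # criterion (2): maximum terminal truncation (bp)
MIN_ADJACENT_HSP = 60   # criterion (2): minimum adjacent HSP length (bp)


@dataclass
class ISCopy:
    """One detected IS copy; field names are illustrative assumptions."""
    name: str
    contig: str
    start: int
    end: int
    confident: bool
    terminal_truncation: int
    adjacent_hsp_len: int


def putative_composite_name(a, b):
    """Return cn_<length>_<IS name> if the pair meets the criteria, else None."""
    if a.name != b.name or a.contig != b.contig:
        return None
    if not (a.confident or b.confident):  # criterion (1)
        return None
    for copy in (a, b):  # criterion (2), assumed to apply to both copies
        if copy.terminal_truncation > MAX_TRUNCATION:
            return None
        if copy.adjacent_hsp_len < MIN_ADJACENT_HSP:
            return None
    span = max(a.end, b.end) - min(a.start, b.start)
    if span >= MAX_SPAN:  # criterion (3)
        return None
    return f"cn_{span}_{a.name}"
```

Because the predicate looks only at the two flanking copies, it shares the limitation noted above: nothing about the cargo region is checked.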

4. AMR × MGE context (amr_summary.tsv)

In amr_summary.tsv, the findings from PlasmidFinder and MobileElementFinder are written as ## lines above the antimicrobial resistance / virulence factor hits (tab-separated rows) for each contig. The example below is for pOXA-48: it shows at a glance that the OXA-48 carbapenemase gene (5445..6242) lies within the Tn1999 composite transposon (2292..8429), which is carried on an IncL plasmid.

#locus  location  hit_accession  gene  product  identity  q_cov  s_cov  e_value  note
## sequence1 PlasmidFinder: IncL, Description: Possibly derived from enterobacteriales IncL type plasmid., ...
## sequence1 MobileElementFinder: Tn1999 (composite transposon), identity:100.0%, coverage:100.0%, 2292..8429
LOCUS_060  sequence1:complement(5445..6242)  ARO:3001782:AAP70012.1  OXA-48  OXA beta-lactamase  100.0  100.0  100.0  0.0  similar to CARD:OXA-48 ...
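
The same context can be extracted programmatically. The sketch below assumes the ## line layout and the column order shown in the example above, with real tab separators between columns; it handles only simple start..end locations, and amr_mge_context is a hypothetical helper rather than a DFAST function.

```python
import re

# "## <contig> MobileElementFinder: <name> (<type>), ..., <start>..<end>"
MGE_LINE = re.compile(
    r"^## (?P<contig>\S+) MobileElementFinder: (?P<name>.+?) \((?P<type>.*?)\),"
    r".*?[<>]?(?P<start>\d+)\.\.[<>]?(?P<end>\d+)\s*$"
)
# "<contig>:start..end" or "<contig>:complement(start..end)"; join() locations are skipped.
LOCATION = re.compile(
    r"^(?P<contig>[^:]+):(?:complement\()?[<>]?(?P<start>\d+)\.\.[<>]?(?P<end>\d+)\)?$"
)


def amr_mge_context(tsv_text):
    """Return [(gene, [MGE names whose interval contains the hit])] for each hit row."""
    mges, hits = [], []
    for line in tsv_text.splitlines():
        if line.startswith("##"):
            m = MGE_LINE.match(line)
            if m:
                mges.append((m["contig"], int(m["start"]), int(m["end"]), m["name"]))
        elif line and not line.startswith("#"):
            cols = line.split("\t")
            loc = LOCATION.match(cols[1]) if len(cols) > 3 else None
            if loc:
                hits.append((cols[3], loc["contig"], int(loc["start"]), int(loc["end"])))
    return [
        (gene, [n for c, s, e, n in mges if c == contig and s <= start and end <= e])
        for gene, contig, start, end in hits
    ]
```

Only full containment is reported here; hits that merely overlap or lie near an MGE would need a looser comparison.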

5. Included files

For each genome, the following 2 files are included, named <species_strain_accession> (12 files in total).