Validation results from running DFAST on 6 bacterial genomes with the
--mge --amr --no_cdd --no_hmm --debug options.
--mge enables MGE detection by MobileElementFinder, and --amr enables identification of antimicrobial resistance genes and
virulence factors (CARD/VFDB) as well as plasmid replicon typing (PlasmidFinder).
Detected MGEs are written to the GenBank output as features with coordinates.
MobileElementFinder (the tool itself) and the MGEdb BLAST index (reference data) can both be prepared with a single command.
This is the procedure for non-Docker DFAST installations (it is not needed for the Docker image dfast2026:dev,
which already bundles MobileElementFinder).
Substitute <DB_ROOT> with your DFAST database directory (e.g. dfast_core/db).

    $ dfast_file_downloader.py --mefinder -d <DB_ROOT>

This command runs the following steps in order.
| Step | Description |
|---|---|
| (1) Tool installation | Only if mefinder is not yet installed: fetches the source (bitbucket.org/mhkj/mge_finder), relaxes the biopython upper-bound pin, and runs pip install (so it can coexist with DFAST's biopython 1.87). Skipped if already installed. |
| (2) Index construction | Builds the blastn index from the bundled MGEdb (v1.1.1) into <DB_ROOT>/mefinder_db (mge_records.nin / .nhr / .nsq, etc.). File existence is verified after completion. |
--no_indexing installs the tool only and skips index construction.
If an index already exists, DFAST does not rebuild it at run time (the existing one is reused).
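As a quick sanity check after the download step, the presence of the index files can be confirmed with a few lines of Python. This is a minimal sketch and not part of DFAST: the function name `missing_index_files` is hypothetical, and it checks only the three file extensions named in the table (the table lists them with "etc.", so a real index may contain more files).

```python
from pathlib import Path

# Extensions named in the table above; a real BLAST index may include more files.
EXPECTED_SUFFIXES = (".nin", ".nhr", ".nsq")


def missing_index_files(db_root, prefix="mge_records"):
    """Return the expected index files that are absent under <db_root>/mefinder_db."""
    index_dir = Path(db_root) / "mefinder_db"
    expected = [index_dir / f"{prefix}{suffix}" for suffix in EXPECTED_SUFFIXES]
    return [path for path in expected if not path.is_file()]


# Example: report anything missing under a DFAST database directory.
for path in missing_index_files("dfast_core/db"):
    print(f"missing: {path}")
```

An empty result means all three files were found; it does not prove that the index itself is valid.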
MGE detection is enabled with the --mge option. It can be combined with --amr.

    $ dfast --genome your_genome.fna --mge -o OUTPUT_DIR
    # To also perform AMR/VFG and plasmid replicon typing
    $ dfast --genome your_genome.fna --mge --amr -o OUTPUT_DIR

By default, detection runs with --min-coverage 0.95 (stricter than MobileElementFinder's own default of 0.9) and uses the reference index at
<DB_ROOT>/mefinder_db. Detected MGEs are written to genome.gbk as features with coordinates
(mobile_element / misc_feature), and the same findings are also written as
## lines in amr_summary.tsv (details in the following sections).
| Species / strain | Accession | mobile_element | misc_feature | AMR/VFG hits | MGE annotation lines (##) |
|---|---|---|---|---|---|
| Escherichia coli K-12 MG1655 | GCA_000005845.2 | 52 | 8 | 169 | 60 |
| Klebsiella pneumoniae NTUH-K2044 | GCA_000009345.1 | 78 | 5 | 236 | 84 |
| Klebsiella pneumoniae plasmid pOXA-48 | JN626286.1 | 1 | 0 | 1 | 2 |
| Lactobacillus hokkaidonensis LOOC260 | GCA_000829395.1 | 25 | 5 | 0 | 30 |
| Salmonella enterica Typhimurium DT104 | GCA_000493675.1 | 25 | 1 | 316 | 28 |
| Vibrio cholerae N16961 | GCA_000006745.1 | 15 | 0 | 193 | 15 |
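Feature counts like those in the table can be tallied from a genome.gbk without extra dependencies. The following is a minimal sketch that assumes the standard GenBank flat-file layout (feature keys start at column 6); `count_features` is a hypothetical helper, not a DFAST function. It counts every feature with the given key, whether or not it originates from MobileElementFinder.

```python
import re
from collections import Counter

# In the GenBank flat-file format a feature key starts at column 6
# (five leading spaces); qualifier lines are indented further.
FEATURE_KEY = re.compile(r"^ {5}(\S+)\s+\S")


def count_features(gbk_text, keys=("mobile_element", "misc_feature")):
    """Count occurrences of the given feature keys in GenBank-formatted text."""
    counts = Counter()
    in_features = False
    for line in gbk_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True   # start of a record's feature table
            continue
        if line.startswith(("ORIGIN", "CONTIG", "//")):
            in_features = False  # end of the feature table
            continue
        if in_features:
            match = FEATURE_KEY.match(line)
            if match and match.group(1) in keys:
                counts[match.group(1)] += 1
    return {key: counts[key] for key in keys}
```

To use it, read the file (for example with `open("OUTPUT_DIR/genome.gbk").read()`) and pass the text to `count_features`.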
The MGE types detected by MobileElementFinder are mapped to features according to the INSDC (DDBJ/GenBank) controlled vocabulary as follows.
| MGE type | GenBank feature key | Main qualifier |
|---|---|---|
| insertion sequence (IS) | mobile_element | /mobile_element_type="insertion sequence:name" |
| MITE | mobile_element | /mobile_element_type="MITE:name" |
| unit transposon | mobile_element | /mobile_element_type="transposon:name" |
| composite transposon (DB hit) | mobile_element | /mobile_element_type="transposon:name" |
| composite transposon (putative = predicted only) | misc_feature | /note="putative composite transposon (predicted by MobileElementFinder)" |
| ICE / IME / CIME / mobile insertion cassette | misc_feature | /note="type description: name" |
| integron / superintegron | Not detected (MGEdb v1.1.1 contains no integron reference sequences; see the note below) | |
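The mapping in the table above can be expressed as a small lookup. This is an illustrative sketch only, not DFAST's actual implementation: the function `to_feature` and the type labels used as dictionary keys are hypothetical, and MobileElementFinder's own type strings may differ.

```python
# Types emitted as mobile_element, with the INSDC vocabulary term used as prefix.
MOBILE_ELEMENT_TERMS = {
    "insertion sequence": "insertion sequence",
    "MITE": "MITE",
    "unit transposon": "transposon",
    "composite transposon": "transposon",
}

PUTATIVE_NOTE = "putative composite transposon (predicted by MobileElementFinder)"


def to_feature(mge_type, name, putative=False, description=None):
    """Return (feature_key, qualifier_key, qualifier_value) for one MGE hit."""
    # A composite transposon that is only predicted (no DB hit) becomes a misc_feature.
    if mge_type == "composite transposon" and putative:
        return ("misc_feature", "note", PUTATIVE_NOTE)
    term = MOBILE_ELEMENT_TERMS.get(mge_type)
    if term is not None:
        return ("mobile_element", "mobile_element_type", f"{term}:{name}")
    # ICE / IME / CIME / mobile insertion cassette: no matching vocabulary term.
    return ("misc_feature", "note", f"{description or mge_type}: {name}")


print(to_feature("composite transposon", "Tn1999"))
print(to_feature("ICE", "Tn6254", description="integrative conjugative element (ICE)"))
```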

     mobile_element  complement(2292..8429)
                     /note="MobileElementFinder: Tn1999; identity:100.0%, coverage:100.0%; similar to AY236073 (MGEdb)"
                     /note="MGE_1"
                     /mobile_element_type="transposon:Tn1999"

     mobile_element  9186..10732
                     /note="MobileElementFinder: ISLho1; identity:100.0%, coverage:100.0%"
                     /note="MGE_23"
                     /mobile_element_type="insertion sequence:ISLho1"

     misc_feature    1799238..1851240
                     /note="MobileElementFinder: Tn6254; identity:100.0%, coverage:99.1%; similar to AP014680 (MGEdb)"
                     /note="integrative conjugative element (ICE): Tn6254"

     misc_feature    complement(257907..279930)
                     /note="MobileElementFinder: cn_22023_ISSen9; identity:94.0%, coverage:100.0%"
                     /note="putative composite transposon (predicted by MobileElementFinder)"

Why ICEs and similar elements are recorded as misc_feature:
the INSDC /mobile_element_type qualifier is restricted to a controlled vocabulary (insertion sequence, MITE, transposon, integron, retrotransposon, etc.),
and there is no value corresponding to ICE / IME / CIME / mobile insertion cassette.
Rather than forcing them into the catch-all value "other", they are recorded as a misc_feature with a descriptive /note that preserves the information (this also passes DDBJ validation).
IS, MITE, and transposon have matching vocabulary terms, so they are output as mobile_element.
Conditions under which a putative composite transposon is output:
when the same insertion sequence (IS) occurs in 2 copies on the same contig and all of the following are satisfied:
(1) at least one of them is a confidently detected IS,
(2) the inverted repeats (IR) are conserved (terminal truncation ≤ 20 bp and adjacent HSP ≥ 60 bp), and
(3) the full length spanning the 2 IS copies is < 52,452 bp.
Because such an element has no known entry in MGEdb and is inferred only from the IS arrangement (evidence = PUTATIVE), it is output as a low-confidence misc_feature
with a name of the form cn_<full length>_<IS name> (e.g. cn_22023_ISSen9). The cargo content between the two IS copies is not examined, so false positives are possible.
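The three conditions above can be sketched as a predicate. This is a hedged illustration, not the MobileElementFinder or DFAST code: the `ISCopy` fields, the function names, and the choice to require the IR condition on both copies are assumptions. The span length is passed in by the caller because how it is derived from the coordinates is not specified here.

```python
from dataclasses import dataclass

MAX_TERMINAL_TRUNCATION = 20  # bp, condition (2)
MIN_ADJACENT_HSP = 60         # bp, condition (2)
MAX_SPAN_LENGTH = 52_452      # bp, condition (3): the span must be shorter than this


@dataclass
class ISCopy:
    name: str
    contig: str
    confident: bool           # True if this copy is a confidently detected IS
    terminal_truncation: int  # bp missing at the IS terminus
    adjacent_hsp: int         # length in bp of the HSP adjacent to the terminus


def is_putative_composite(a, b, span_length):
    """Apply conditions (1)-(3) to two IS copies; span_length covers both copies."""
    if a.name != b.name or a.contig != b.contig:
        return False
    if not (a.confident or b.confident):           # condition (1)
        return False
    ir_conserved = all(                            # condition (2), assumed per copy
        c.terminal_truncation <= MAX_TERMINAL_TRUNCATION
        and c.adjacent_hsp >= MIN_ADJACENT_HSP
        for c in (a, b)
    )
    return ir_conserved and span_length < MAX_SPAN_LENGTH  # condition (3)


def putative_name(span_length, is_name):
    """Build a name of the form cn_<full length>_<IS name>."""
    return f"cn_{span_length}_{is_name}"
```

Note that nothing in this check looks at the sequence between the two copies, which is why such calls remain low-confidence.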
In amr_summary.tsv, the findings from PlasmidFinder and MobileElementFinder are written as ## lines
above the antimicrobial resistance / virulence factor hits (tab-separated rows) for each contig.
The example below is for pOXA-48, where it is immediately clear that the OXA-48 carbapenemase gene lies inside the Tn1999 composite transposon on an IncL plasmid.

    #locus	location	hit_accession	gene	product	identity	q_cov	s_cov	e_value	note
    ## sequence1 PlasmidFinder: IncL, Description: Possibly derived from enterobacteriales IncL type plasmid., ...
    ## sequence1 MobileElementFinder: Tn1999 (composite transposon), identity:100.0%, coverage:100.0%, 2292..8429
    LOCUS_060	sequence1:complement(5445..6242)	ARO:3001782:AAP70012.1	OXA-48	OXA beta-lactamase	100.0	100.0	100.0	0.0	similar to CARD:OXA-48
    ...

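For downstream processing, the ## annotation lines and the tab-separated hit rows can be separated with a short parser. This is a minimal sketch based only on the layout shown in the example; `parse_amr_summary` is a hypothetical helper, and the assumption that each ## line has the form "## <contig> <message>" is inferred from that example.

```python
def parse_amr_summary(text):
    """Split amr_summary.tsv text into ## annotation lines and hit rows."""
    header = None
    annotations = []  # (contig, message) tuples from "## <contig> <message>" lines
    hits = []         # one dict per tab-separated hit row, keyed by column name
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("##"):
            parts = line[2:].split(None, 1)
            contig = parts[0] if parts else ""
            message = parts[1] if len(parts) > 1 else ""
            annotations.append((contig, message))
        elif line.startswith("#"):
            # Column header line, e.g. "#locus<TAB>location<TAB>..."
            header = line[1:].split("\t")
        else:
            if header is None:
                raise ValueError("hit row found before the #locus header line")
            hits.append(dict(zip(header, line.split("\t"))))
    return annotations, hits
```

Calling it on the contents of amr_summary.tsv returns the annotations and hits as two lists, so the MGE context of each contig can be looked up alongside its resistance genes.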
For each genome, the following 2 files are included, named <species_strain_accession> (12 files in total).