einverted
einverted finds inverted repeats that include a proportion of mismatches and gaps (bulges in the stem loop).
einverted uses dynamic programming and thus is guaranteed to find the optimal alignment, but is slower than, for example, a self-by-self BLAST. It can find multiple inverted repeats in a sequence.
einverted does not report overlapping matches.
The original "inverted" program was written to annotate the nematode genome. Excluding overlapping repeats saved problems with simple repeat sequences in this genome.
% einverted tembl:d00596
Finds DNA inverted repeats
Gap penalty [12]:
Minimum score threshold [50]:
Match score [3]:
Mismatch score [-4]:
Sanger Centre program inverted output file [d00596.inv]:
File for sequence of regions of inverted repeats. [d00596.fasta]:
Standard (Mandatory) qualifiers:
[-sequence] seqall Nucleotide sequence(s) filename and optional
format, or reference (input USA)
-gap integer [12] Gap penalty (Any integer value)
-threshold integer [50] Minimum score threshold (Any integer
value)
-match integer [3] Match score (Any integer value)
-mismatch integer [-4] Mismatch score (Any integer value)
[-outfile] outfile [*.einverted] Sanger Centre program inverted
output file
[-outseq] seqout [
| Standard (Mandatory) qualifiers | Description | Allowed values | Default |
|---|---|---|---|
| [-sequence] (Parameter 1) | Nucleotide sequence(s) filename and optional format, or reference (input USA) | Readable sequence(s) | Required |
| -gap | Gap penalty | Any integer value | 12 |
| -threshold | Minimum score threshold | Any integer value | 50 |
| -match | Match score | Any integer value | 3 |
| -mismatch | Mismatch score | Any integer value | -4 |
| [-outfile] (Parameter 2) | Sanger Centre program inverted output file | Output file | <*>.einverted |
| [-outseq] (Parameter 3) | The sequence of the inverted repeat regions without gap characters. | Writeable sequence | <*>.format |
| Additional (Optional) qualifiers | Description | Allowed values | Default |
| -maxrepeat | Maximum separation between the start of repeat and the end of the inverted repeat (the default is 2000 bases). | Any integer value | 2000 |
| Advanced (Unprompted) qualifiers | Description | Allowed values | Default |
| (none) | | | |
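The qualifiers in the table can also be supplied on the command line instead of being answered at the prompts. The following Python sketch only assembles such a command line from the qualifier names and defaults listed above. The function name `build_einverted_command` is hypothetical, and it is an assumption that a negative value such as `-4` is accepted directly after `-mismatch`.

```python
import shlex


def build_einverted_command(sequence, outfile, outseq, gap=12, threshold=50,
                            match=3, mismatch=-4, maxrepeat=2000):
    """Return an einverted argument list (hypothetical helper).

    Qualifier names and default values are taken from the table above.
    Every standard qualifier is given explicitly so that no prompt
    should be needed.
    """
    return [
        "einverted",
        "-sequence", sequence,
        "-gap", str(gap),
        "-threshold", str(threshold),
        "-match", str(match),
        # Assumption: a negative value is accepted after the qualifier.
        "-mismatch", str(mismatch),
        "-maxrepeat", str(maxrepeat),
        "-outfile", outfile,
        "-outseq", outseq,
    ]


cmd = build_einverted_command("tembl:d00596", "d00596.inv", "d00596.fasta")
print(shlex.join(cmd))

# To actually run it (requires an EMBOSS installation with einverted on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the list separately from running it makes it easy to inspect the command before handing it to `subprocess.run`.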
ID D00596; SV 1; linear; genomic DNA; STD; HUM; 18596 BP.
XX
AC D00596;
XX
DT 17-JUL-1991 (Rel. 28, Created)
DT 14-NOV-2006 (Rel. 89, Last updated, Version 3)
XX
DE Homo sapiens gene for thymidylate synthase, exons 1, 2, 3, 4, 5, 6, 7,
DE complete cds.
XX
KW thymidylate syntase.
XX
OS Homo sapiens (human)
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae;
OC Homo.
XX
RN [1]
RP 1-18596
RX PUBMED; 2243092.
RA Kaneda S., Nalbantoglu J., Takeishi K., Shimizu K., Gotoh O., Seno T.,
RA Ayusawa D.;
RT "Structural and functional analysis of the human thymidylate synthase
RT gene.";
RL J. Biol. Chem. 265(33):20277-20284(1990).
XX
DR GDB; 163670.
DR GDB; 182340.
XX
CC These data kindly submitted in computer readable form by:
CC Sumiko Kaneda
CC National Institute of Genetics
CC 1111 Yata
CC Mishima 411
CC Japan
CC Phone: +81-559-72-2732
CC Fax: +81-559-71-3651
XX
FH Key Location/Qualifiers
FH
FT source 1..18596
FT /organism="Homo sapiens"
FT /chromosome="18"
FT /map="18p11.32"
FT /mol_type="genomic DNA"
FT /clone="lambdaHTS-1 and lambdaHTS-3"
FT /db_xref="taxon:9606"
FT repeat_unit 1..148
FT /note="Alu sequence"
FT repeat_unit 202..477
[Part of this file has been deleted for brevity]
ttttgttttt agcttcagcg agaacccaga cctttcccaa agctcaggat tcttcgaaaa 15660
gttgagaaaa ttgatgactt caaagctgaa gactttcaga ttgaagggta caatccgcat 15720
ccaactatta aaatggaaat ggctgtttag ggtgctttca aaggagctcg aaggatattg 15780
tcagtcttta ggggttgggc tggatgccga ggtaaaagtt ctttttgctc taaaagaaaa 15840
aggaactagg tcaaaaatct gtccgtgacc tatcagttat taatttttaa ggatgttgcc 15900
actggcaaat gtaactgtgc cagttctttc cataataaaa ggctttgagt taactcactg 15960
agggtatctg acaatgctga ggttatgaac aaagtgagga gaatgaaatg tatgtgctct 16020
tagcaaaaac atgtatgtgc atttcaatcc cacgtactta taaagaaggt tggtgaattt 16080
cacaagctat ttttggaata tttttagaat attttaagaa tttcacaagc tattccctca 16140
aatctgaggg agctgagtaa caccatcgat catgatgtag agtgtggtta tgaactttaa 16200
agttatagtt gttttatatg ttgctataat aaagaagtgt tctgcattcg tccacgcttt 16260
gttcattctg tactgccact tatctgctca gttccttcct aaaatagatt aaagaactct 16320
ccttaagtaa acatgtgctg tattctggtt tggatgctac ttaaaagagt atattttaga 16380
aataatagtg aatatatttt gccctatttt tctcatttta actgcatctt atcctcaaaa 16440
tataatgacc atttaggata gagttttttt tttttttttt taaactttta taaccttaaa 16500
gggttatttt aaaataatct atggactacc attttgccct cattagcttc agcatggtgt 16560
gacttctcta ataatatgct tagattaagc aaggaaaaga tgcaaaacca cttcggggtt 16620
aatcagtgaa atatttttcc cttcgttgca taccagatac ccccggtgtt gcacgactat 16680
ttttattctg ctaatttatg acaagtgtta aacagaacaa ggaattattc caacaagtta 16740
tgcaacatgt tgcttatttt caaattacag tttaatgtct aggtgccagc ccttgatata 16800
gctatttttg taagaacatc ctcctggact ttgggttagt taaatctaaa cttatttaag 16860
gattaagtag gataacgtgc attgatttgc taaaagaatc aagtaataat tacttagctg 16920
attcctgagg gtggtatgac ttctagctga actcatcttg atcggtagga ttttttaaat 16980
ccatttttgt aaaactattt ccaagaaatt ttaagccctt tcacttcaga aagaaaaaag 17040
ttgttggggc tgagcactta attttcttga gcaggaagga gtttcttcca aacttcacca 17100
tctggagact ggtgtttctt tacagattcc tccttcattt ctgttgagta gccgggatcc 17160
tatcaaagac caaaaaaatg agtcctgtta acaaccacct ggaacaaaaa cagattttat 17220
gcatttatgc tgctccaaga aatgctttta cgtctaagcc agaggcaatt aattaatttt 17280
tttttttttg acatggagtc actgtccgtt gcccaggctg cagtgcagtg gcgcaatctt 17340
ggctcactgc aacctccacc tcccaggttc aagtgattct cctgcctcag cctcccatgt 17400
agctgggatc acaggcacct gccaccatgc ccggctaatt ttttgtattt tttgtagaga 17460
cagggtttca ccatgttggc caggctggtc tcaaacacct gacctcaaat gatccacctg 17520
cctcagcctc ccaaagtgtt gggattacag gcgtaagcca ccatgcccag ccctgaatta 17580
atatttttaa aataagtttg gagactgttg gaaataatag ggcagaggaa catattttac 17640
tggctacttg ccagagttag ttaactcatc aaactctttg ataatagttt gacctctgtt 17700
ggtgaaaatg agccatgatc tcttgaacat gatcagaata aatgccccag ccacacaatt 17760
gtagtccaaa ctttttaggt cactaacttg ctagatggtg ccaggttttt ttgcacaagg 17820
agtgcaaatg ttaagatctc cactagtgag gaaaggctag tattacagaa gccttgtcag 17880
aggcaattga acctccaagc cctggccctc aggcctgagg attttgatac agacaaactg 17940
aagaaccgtt tgttagtgga tattgcaaac aaacaggagt caaagcttgg tgctccacag 18000
tctagttcac gagacaggcg tggcagtggc tggcagcatc tcttctcaca ggggccctca 18060
ggcacagctt accttgggag gcatgtagga agcccgctgg atcatcacgg gatacttgaa 18120
atgctcatgc aggtggtcaa catactcaca caccctagga ggagggaatc agatcggggc 18180
aatgatgcct gaagtcagat tattcacgtg gtgctaactt aaagcagaag gagcgagtac 18240
cactcaattg acagtgttgg ccaaggctta gctgtgttac catgcgtttc taggcaagtc 18300
cctaaacctc tgtgcctcag gtccttttct tctaaaatat agcaatgtga ggtggggact 18360
ttgatgacat gaacacacga agtccctctg agaggttttg tggtgccctt taaaagggat 18420
caattcagac tctgtaaata tccagaatta tttgggttcc tctggtcaaa agtcagatga 18480
atagattaaa atcaccacat tttgtgatct atttttcaag aagcgtttgt attttttcat 18540
atggctgcag cagctgccag gggcttgggg tttttttggc aggtagggtt gggagg 18596
//
>D00596_13_142
gctacgcgagaggctgaggcagcagaattacttgaacccaggaggcggaggttgcagtga
gccgagatcgcgccactgcactccagcctgggtgagagagcgagactctgtctcaaaaaa
aaaaaaaaaa
>D00596_199_328
ttttttttttttttttttgggacagtcttgctctgtcgcccaggctggagtacaatggtc
ggatcttggctcactgcaacctctgcctcccaggttcaagcaattcttctgcctcagcct
cccaagtagc
>D00596_12128_12301
agaggatttttttttttttttttttttttgagacagagttttgctctgttgcccaggctg
gaatgcaacggcgtgatcttggctcactgtaacctctgcctcctgggttcgagtgattct
cctgcctcagcctccaagtagctgggattacagcatgtgccaccatgcctggct
>D00596_12573_12749
agccaggtgtggtggctcacacctgtaattccaacaactccagaggccaaggcgagagga
tcatttgaacccacggaatttgaggctgtagtgagtcatgatcacgccattgcactccat
cctgggcaacagagtgagaccctgaatatttaaaaacaacaacaacaacaaaactct
>D00596_12246_12296
ctcctgcctcagcctccaagtagctgggattacagcatgtgccaccatgcc
>D00596_13886_13938
ggtatggtggctcatgcctgtaatcccagcactttggaagactgagacaggag
>D00596_13884_13949
tgggtatggtggctcatgcctgtaatcccagcactttggaagactgagacaggagcaatt
gcttga
>D00596_14628_14692
tcaagcaattcttctgcctcagcctcccaggtagctgggattacaggcacatgccaccac
accca
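In the example sequence output, each ID appears to follow the pattern name_start_end, one entry per arm of a repeat. A minimal Python sketch for recovering the coordinates is shown here. The pattern is inferred from this example only, and `parse_repeat_id` is a hypothetical helper, not part of EMBOSS.

```python
def parse_repeat_id(fasta_id):
    """Split an ID such as 'D00596_13_142' into (name, start, end).

    Assumption: the ID has the form <name>_<start>_<end>, as in the
    example output. rsplit is used so that underscores inside the
    sequence name itself are preserved.
    """
    name, start, end = fasta_id.lstrip(">").rsplit("_", 2)
    return name, int(start), int(end)


for fasta_id in [">D00596_13_142", ">D00596_12128_12301"]:
    name, start, end = parse_repeat_id(fasta_id)
    # end - start + 1 is the arm length in bases (inclusive coordinates)
    print(name, start, end, end - start + 1)
```

For the first entry this gives a length of 130 bases, which agrees with the 130 bases listed under that ID.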
D00596: Score 236: 108/130 ( 83%) matches, 0 gaps
13 gctacgcgagaggctgaggcagcagaattacttgaacccaggaggcggaggttgcagtgagccgagatcgcgccactgcactccagcctgggtgagagagcgagactctgtctcaaaaaaaaaaaaaaaa 142
||||| | ||||||||||||| |||||| |||||||| |||||| |||||||||||||||| ||||| ||| || ||||||||||||| || ||||| ||||| | | ||||||||||||||||
328 cgatgaaccctccgactccgtcttcttaacgaacttggaccctccgtctccaacgtcactcggttctaggctggtaacatgaggtcggacccgctgtctcgttctgacagggtttttttttttttttttt 199
D00596: Score 164: 128/174 ( 73%) matches, 3 gaps
12128 agaggatttttttttttttttttttttttgagacagagttttgctctgttgcccaggctggaatgcaacggcgtgatcttggctcactgtaacctctgcctcc-tgggttcgagtgattctcctgcctcagcctc-caagtagctgggattaca-gcatgtgccaccatgcctggct 12301
|||| || || || || || ||||| | ||| || | |||||||||||||| |||| ||||| ||||||||| || |||||| | |||| ||| ||||||| | |||| ||| |||| ||||| ||| | ||| |||||| | || ||||||| |||||||
12749 tctcaaaacaacaacaacaacaaaaatttataagtcccagagtgagacaacgggtcctacctcacgttaccgcactagtactgagtgatgtcggagtttaaggcacccaagtttactaggagagcggaaccggagacctcaacaaccttaatgtccacactcggtggtgtggaccga 12573
D00596: Score 80: 44/51 ( 86%) matches, 2 gaps
12246 ctcctgcctcag-cctccaagtagctgggattaca-gcatgtgccaccatgcc 12296
|||||| ||||| | ||||| |||||||||||| ||||| |||||||| ||
13938 gaggacagagtcagaaggtttcacgaccctaatgtccgtactcggtggtatgg 13886
D00596: Score 99: 53/65 ( 81%) matches, 1 gaps
13884 tgggtatggtggctcatgcctgtaatcccagcactttggaagactgagacaggagcaattgcttga 13949
||||| ||||||| |||||||||||||||| ||| || ||||| ||| || ||||||||||
14692 acccacaccaccgtacacggacattagggtcgatggaccctccgactccgtcttc-ttaacgaact 14628
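The reported scores are consistent with a simple sum over alignment columns using the default match score (3), mismatch score (-4) and gap penalty (12). The Python sketch below recomputes the score of the third alignment from its two rows. It is an illustration of the scoring arithmetic only, not einverted's search algorithm, and charging the gap penalty once per gap character is an assumption that happens to fit the alignments shown here. `score_alignment` is a hypothetical name.

```python
# Complementary base pairs; the lower row is printed 3' to 5', so each
# column pairs a base with its partner on the other arm of the stem.
PAIRS = {("a", "t"), ("t", "a"), ("c", "g"), ("g", "c")}


def score_alignment(top, bottom, match=3, mismatch=-4, gap=12):
    """Return (score, matches, mismatches, gaps) for two aligned rows."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same number of columns")
    matches = mismatches = gaps = 0
    for x, y in zip(top.lower(), bottom.lower()):
        if x == "-" or y == "-":
            gaps += 1          # assumption: penalty applied per gap character
        elif (x, y) in PAIRS:
            matches += 1
        else:
            mismatches += 1
    score = matches * match + mismatches * mismatch - gaps * gap
    return score, matches, mismatches, gaps


# Rows of the alignment reported as "Score 80: 44/51 ( 86%) matches, 2 gaps"
top = "ctcctgcctcag-cctccaagtagctgggattaca-gcatgtgccaccatgcc"
bottom = "gaggacagagtcagaaggtttcacgaccctaatgtccgtactcggtggtatgg"
print(score_alignment(top, bottom))  # (80, 44, 7, 2)
```

The same arithmetic reproduces the other three scores from their match and gap counts, for example 108 × 3 − 22 × 4 = 236 for the first alignment.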
einverted and palindrome can report different inverted repeats for the same sequence. This is not due to a problem with either program: some of the shortest repeats that palindrome finds with its default parameter values score below einverted's default cutoff. Decrease the 'Minimum score threshold' (-threshold) to see them.
For example, when palindrome is run with 'em:hsfau1', it finds the repeat:
64 aaaactaaggc 74
|||||||||||
98 ttttgattccg 88
einverted will not report this because its score is 33 (11 bases scoring 3 each, no mismatches or gaps), which is below the default score cutoff of 50.
If einverted is run as:
% einverted em:hsfau1 -threshold 33
then it will find it:
Score 33: 11/11 (100%) matches, 0 gaps
64 aaaactaaggc 74
|||||||||||
98 ttttgattccg 88
Anything can be considered to be a repeat if you set the score threshold low enough!
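The relationship between the threshold and the shortest perfect repeat that can be reported is simple arithmetic. The sketch below assumes that a repeat is reported when its score is at least the threshold, which is consistent with the run above where `-threshold 33` reports a repeat scoring 33. `min_perfect_arm_length` is a hypothetical helper.

```python
def min_perfect_arm_length(threshold=50, match=3):
    """Smallest number of perfectly paired bases whose score reaches threshold.

    Assumes score = bases * match for a repeat with no mismatches or gaps,
    and that a score equal to the threshold is still reported.
    """
    return -(-threshold // match)  # ceiling division on integers


print(min_perfect_arm_length())              # 17 (17 * 3 = 51 >= 50)
print(min_perfect_arm_length(threshold=33))  # 11 (11 * 3 = 33 >= 33)
```

With the defaults, a perfect repeat therefore needs at least 17 paired bases, which is why the 11-base repeat found by palindrome is not reported until the threshold is lowered.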
| Program name | Description |
|---|---|
| equicktandem | Finds tandem repeats |
| etandem | Looks for tandem repeats in a nucleotide sequence |
| palindrome | Looks for inverted repeats in a nucleotide sequence |
This application was modified for inclusion in EMBOSS by
Peter Rice (pmr © ebi.ac.uk)
Informatics Division, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK