fasta-grep <re> (-dna | -prot) [options]
fasta-grep displays the non-overlapping occurrences of a Perl regular expression in FASTA sequences.
fasta-grep supports the IUPAC alphabets for amino acids and nucleotides.
Reads FASTA-formatted sequences from standard input.
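To make "non-overlapping occurrences" and IUPAC support concrete, here is a minimal Python sketch. It is not fasta-grep's own code. It assumes that IUPAC nucleotide ambiguity codes in the pattern expand to sets of bases, and the helper names `expand_iupac` and `find_occurrences` are hypothetical.

```python
import re

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC_DNA = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_iupac(pattern):
    """Rewrite ambiguity codes as character classes (hypothetical helper).

    Only literal letters and [...] classes are handled; escape sequences
    such as \\S are outside the scope of this sketch.
    """
    out = []
    in_class = False  # True while inside an existing [...] class
    for ch in pattern:
        if ch == "[":
            in_class = True
            out.append(ch)
        elif ch == "]":
            in_class = False
            out.append(ch)
        elif ch in IUPAC_DNA:
            bases = IUPAC_DNA[ch]
            # Inside a class, add the bare bases; outside, wrap them.
            out.append(bases if in_class else "[" + bases + "]")
        else:
            out.append(ch)
    return "".join(out)

def find_occurrences(pattern, sequence):
    """Return 1-based (start, end) pairs of non-overlapping matches."""
    regex = re.compile(expand_iupac(pattern))
    # finditer resumes after the end of each match, so hits never overlap.
    return [(m.start() + 1, m.end()) for m in regex.finditer(sequence)]

print(expand_iupac("WGATAAN"))
print(find_occurrences("WGATAAN", "CCTGATAAGGAGATAAC"))
```

Because the scan resumes after each match, `find_occurrences("AA", "AAAA")` reports two occurrences rather than three.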
Writes matches to standard output or, optionally, to a file. The output has the form:
FASTA sequence ID line followed by number of matches
[line 1 of sequence]
[match line 1]
[line 2 of sequence]
[match line 2]
...
[last line of sequence]
[last match line]
For proteins, occurrences are marked on the match lines as:
> start of occurrence
< end of occurrence
For DNA, occurrences are marked on the match lines as:
> start of occurrence
< end of occurrence
* start and end of two occurrences
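The marker scheme above can be illustrated with a short Python sketch that builds one match line. It rests on assumptions not stated in this page: columns without a marker are filled with spaces, each occurrence lies within a single sequence line, and `*` is used wherever a start and an end fall on the same column. The function name `match_line` is hypothetical.

```python
def match_line(line_length, occurrences):
    """Build a marker line for one line of sequence.

    occurrences: 0-based (start, end) column pairs, end inclusive,
    all assumed to lie within this line.
    """
    marks = [" "] * line_length  # assumed filler character
    for start, end in occurrences:
        for pos, symbol in ((start, ">"), (end, "<")):
            # Keep the symbol if the column is free or already holds it;
            # otherwise a start and an end coincide, so mark it with '*'.
            marks[pos] = symbol if marks[pos] in (" ", symbol) else "*"
    return "".join(marks)

sequence = "ACGTACGTAC"
print(sequence)
print(match_line(len(sequence), [(1, 4), (4, 7)]))
```

Printing the sequence line followed by its match line reproduces the paired-line layout described above, with `>` under column 2, `*` under column 5, and `<` under column 8.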
| Option | Parameter | Description | Default Behavior |
|---|---|---|---|
| General Options | | | |
| -dna | | Sequence is DNA. | |
| -prot | | Sequence is protein. | |
| -s | | Print whole matching sequences only. | |
| -p | | Print positions only, not sequence; positions are 1-based (relative to the input strand if DNA; see below). | |
| -m | | Print IDs of matching sequences only. | |
| -x | | Print IDs of non-matching sequences only. | |
| -o | | Print occurrences only, in "raw" format. | |
| -f | | Print occurrences only, in FASTA format. 1-based positions (relative to the strand of the match if DNA) are appended to the sequence ID. | |
| -a | | Print all occurrences (even overlapping ones); ignored unless -o or -f is given. | |
| -norc | | Only print matches to the given strand. | Print matches for both DNA strands. |
| -prosite | | <re> is in PROSITE format. | <re> is a Perl regular expression. |
| -erase | | Replace occurrences with 'N's; DNA only. | |
| -h | | Show usage message. | |
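Several of these options turn on strand handling and overlap, which a small Python sketch may make easier to follow. It illustrates the ideas behind -norc, -a, -p/-f position reporting, and -erase under stated assumptions; it is not fasta-grep's implementation. All function names are hypothetical, and the `erase` helper only treats the forward strand.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def scan(regex, seq, both_strands=True, overlapping=False):
    """Yield (strand, start, end, text); positions are 1-based and
    relative to the strand on which the match was found."""
    # A zero-width lookahead lets the scan restart one base after each
    # hit, so overlapping occurrences are reported too.
    wrapped = "(?=(" + regex + "))" if overlapping else "(" + regex + ")"
    compiled = re.compile(wrapped)
    strands = [("+", seq)]
    if both_strands:  # pass both_strands=False to mimic a single-strand scan
        strands.append(("-", reverse_complement(seq)))
    for strand, text in strands:
        for m in compiled.finditer(text):
            yield strand, m.start(1) + 1, m.end(1), m.group(1)

def to_input_strand(start, end, length):
    """Convert reverse-strand 1-based positions to input-strand positions."""
    return length - end + 1, length - start + 1

def erase(regex, seq):
    """Replace each forward-strand occurrence with N's of equal length."""
    return re.sub(regex, lambda m: "N" * len(m.group(0)), seq)

seq = "ATATCGGG"
for strand, start, end, text in scan("GATA", seq):
    print(strand, start, end, text, to_input_strand(start, end, len(seq)))
```

The same match therefore has two valid coordinate pairs on the reverse strand, which is why the table distinguishes positions relative to the input strand from positions relative to the strand of the match.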
fasta-grep WGATAAN -dna < ~/crp0.s
fasta-grep 'A[AT]G' -dna < ~/crp0.fasta
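The examples above pass Perl-style patterns. With -prosite the pattern is written in PROSITE syntax instead, and the Python sketch below shows roughly how such a pattern maps onto a regular expression. It is an illustration only, not the conversion fasta-grep performs. The function name `prosite_to_regex` is hypothetical, and rarer constructs such as an anchor inside square brackets are not handled.

```python
def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a regular expression."""
    p = pattern.rstrip(".").replace("-", "")      # drop terminator and separators
    p = p.replace("{", "[^").replace("}", "]")    # {AM}   -> [^AM]
    p = p.replace("(", "{").replace(")", "}")     # x(2,4) -> x{2,4}
    p = p.replace("x", ".")                       # x      -> any residue
    p = p.replace("<", "^").replace(">", "$")     # N- and C-terminal anchors
    return p

print(prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H."))
```

The order of the replacements matters: braces become negated classes before parentheses become repeat counts, so the two uses of `{` never collide.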