pepwindow
% pepwindow tsw:hba_human
Displays protein hydropathy
Graph type [x11]: cps
Created pepwindow.ps
Standard (Mandatory) qualifiers:
[-sequence]    sequence   Protein sequence filename and optional
                          format, or reference (input USA)
-graph         xygraph    [$EMBOSS_GRAPHICS value, or x11] Graph type
                          (ps, hpgl, hp7470, hp7580, meta, cps, x11,
                          tekt, tek, none, data, xterm, png, gif)

Additional (Optional) qualifiers:
-datafile      datafile   [Enakai.dat] AAINDEX entry data file
-length        integer    [7] Window size (Integer from 1 to 200)

Advanced (Unprompted) qualifiers: (none)

Associated qualifiers:

"-sequence" associated qualifiers
-sbegin1       integer    Start of the sequence to be used
-send1         integer    End of the sequence to be used
-sreverse1     boolean    Reverse (if DNA)
-sask1         boolean    Ask for begin/end/reverse
-snucleotide1  boolean    Sequence is nucleotide
-sprotein1     boolean    Sequence is protein
-slower1       boolean    Make lower case
-supper1       boolean    Make upper case
-sformat1      string     Input sequence format
-sdbname1      string     Database name
-sid1          string     Entryname
-ufo1          string     UFO features
-fformat1      string     Features format
-fopenfile1    string     Features file name

"-graph" associated qualifiers
-gprompt       boolean    Graph prompting
-gdesc         string     Graph description
-gtitle        string     Graph title
-gsubtitle     string     Graph subtitle
-gxtitle       string     Graph x axis title
-gytitle       string     Graph y axis title
-goutfile      string     Output file for non interactive displays
-gdirectory    string     Output directory

General qualifiers:
-auto          boolean    Turn off prompts
-stdout        boolean    Write standard output
-filter        boolean    Read standard input, write standard output
-options       boolean    Prompt for standard and additional values
-debug         boolean    Write debug output to program.dbg
-verbose       boolean    Report some/full command line options
-help          boolean    Report command line options. More
                          information on associated and general
                          qualifiers can be found with -help -verbose
-warning       boolean    Report warnings
-error         boolean    Report errors
-fatal         boolean    Report fatal errors
-die           boolean    Report dying program messages
| Qualifier | Description | Allowed values | Default |
|---|---|---|---|
| Standard (Mandatory) qualifiers | | | |
| [-sequence] (Parameter 1) | Protein sequence filename and optional format, or reference (input USA) | Readable sequence | Required |
| -graph | Graph type | EMBOSS has a list of known devices, including ps, hpgl, hp7470, hp7580, meta, cps, x11, tekt, tek, none, data, xterm, png, gif | EMBOSS_GRAPHICS value, or x11 |
| Additional (Optional) qualifiers | | | |
| -datafile | AAINDEX entry data file | Data file | Enakai.dat |
| -length | Window size | Integer from 1 to 200 | 7 |
| Advanced (Unprompted) qualifiers | | | |
| (none) | | | |
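To illustrate what the -length window size controls, here is a minimal Python sketch of a sliding-window hydropathy average. It assumes a plain mean over each full window; how pepwindow itself positions the window and scales the plot is not described on this page, so treat the function name window_hydropathy and its behaviour as illustrative only. The values are the Kyte-Doolittle figures listed in Enakai.dat later in this document.

```python
# Kyte-Doolittle hydropathy values, copied from the Enakai.dat listing
# shown later in this document.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(sequence, length=7):
    """Return the mean hydropathy of every full window of `length` residues.

    Hypothetical helper: a plain moving average, not pepwindow's own code.
    """
    # Same bounds as the documented -length qualifier (1 to 200).
    if not 1 <= length <= 200:
        raise ValueError("window length must be between 1 and 200")
    values = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return [
        sum(values[start:start + length]) / length
        for start in range(len(values) - length + 1)
    ]


# First ten residues of HBA_HUMAN from the example input.
profile = window_hydropathy("MVLSPADKTN", length=7)
print([round(value, 2) for value in profile])
```

A larger window smooths the profile and yields fewer points; a window of 1 simply returns the per-residue values.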
ID HBA_HUMAN Reviewed; 142 AA.
AC P69905; P01922; Q96KF1; Q9NYR7;
DT 21-JUL-1986, integrated into UniProtKB/Swiss-Prot.
DT 23-JAN-2007, sequence version 2.
DT 03-APR-2007, entry version 41.
DE Hemoglobin subunit alpha (Hemoglobin alpha chain) (Alpha-globin).
GN Name=HBA1;
GN and
GN Name=HBA2;
OS Homo sapiens (Human).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
OC Catarrhini; Hominidae; Homo.
OX NCBI_TaxID=9606;
RN [1]
RP NUCLEOTIDE SEQUENCE [GENOMIC DNA] (HBA1).
RX MEDLINE=81088339; PubMed=7448866; DOI=10.1016/0092-8674(80)90347-5;
RA Michelson A.M., Orkin S.H.;
RT "The 3' untranslated regions of the duplicated human alpha-globin
RT genes are unexpectedly divergent.";
RL Cell 22:371-377(1980).
RN [2]
RP NUCLEOTIDE SEQUENCE [MRNA] (HBA2).
RX MEDLINE=80137531; PubMed=6244294;
RA Wilson J.T., Wilson L.B., Reddy V.B., Cavallesco C., Ghosh P.K.,
RA Deriel J.K., Forget B.G., Weissman S.M.;
RT "Nucleotide sequence of the coding portion of human alpha globin
RT messenger RNA.";
RL J. Biol. Chem. 255:2807-2815(1980).
RN [3]
RP NUCLEOTIDE SEQUENCE [GENOMIC DNA] (HBA2).
RX MEDLINE=81175088; PubMed=6452630;
RA Liebhaber S.A., Goossens M.J., Kan Y.W.;
RT "Cloning and complete nucleotide sequence of human 5'-alpha-globin
RT gene.";
RL Proc. Natl. Acad. Sci. U.S.A. 77:7054-7058(1980).
RN [4]
RP NUCLEOTIDE SEQUENCE [GENOMIC DNA].
RX PubMed=6946451;
RA Orkin S.H., Goff S.C., Hechtman R.L.;
RT "Mutation in an intervening sequence splice junction in man.";
RL Proc. Natl. Acad. Sci. U.S.A. 78:5041-5045(1981).
RN [5]
RP NUCLEOTIDE SEQUENCE [GENOMIC DNA], AND VARIANT LYS-32.
RX MEDLINE=21303311; PubMed=11410421;
RA Zhao Y., Xu X.;
RT "Alpha2(CD31 AGG-->AAG, Arg-->Lys) causing non-deletional alpha-
RT thalassemia in a Chinese family with HbH disease.";
RL Haematologica 86:541-542(2001).
RN [6]
[Part of this file has been deleted for brevity]
FT /FTId=VAR_002840.
FT VARIANT 131 131 A -> D (in Yuda; O(2) affinity down).
FT /FTId=VAR_002842.
FT VARIANT 131 131 A -> P (in Sun Prairie; unstable).
FT /FTId=VAR_002841.
FT VARIANT 132 132 S -> P (in Questembert; highly unstable;
FT causes alpha-thalassemia).
FT /FTId=VAR_002843.
FT VARIANT 134 134 S -> R (in Val de Marne; O(2) affinity
FT up).
FT /FTId=VAR_002844.
FT VARIANT 136 136 V -> E (in Pavie).
FT /FTId=VAR_002845.
FT VARIANT 137 137 L -> M (in Chicago).
FT /FTId=VAR_002846.
FT VARIANT 137 137 L -> P (in Bibba; unstable; causes alpha-
FT thalassemia).
FT /FTId=VAR_002847.
FT VARIANT 139 139 S -> P (in Attleboro; O(2) affinity up).
FT /FTId=VAR_002848.
FT VARIANT 140 140 K -> E (in Hanamaki; O(2) affinity up).
FT /FTId=VAR_002849.
FT VARIANT 140 140 K -> T (in Tokoname; O(2) affinity up).
FT /FTId=VAR_002850.
FT VARIANT 141 141 Y -> H (in Rouen; O(2) affinity up).
FT /FTId=VAR_002851.
FT VARIANT 142 142 R -> C (in Nunobiki; O(2) affinity up).
FT /FTId=VAR_002852.
FT VARIANT 142 142 R -> H (in Suresnes; O(2) affinity up).
FT /FTId=VAR_002854.
FT VARIANT 142 142 R -> L (in Legnano; O(2) affinity up).
FT /FTId=VAR_002853.
FT VARIANT 142 142 R -> P (in Singapore).
FT /FTId=VAR_002855.
FT HELIX 4 15
FT HELIX 16 20
FT HELIX 21 35
FT HELIX 37 42
FT HELIX 53 71
FT HELIX 73 75
FT HELIX 76 79
FT HELIX 81 89
FT HELIX 96 112
FT TURN 114 116
FT HELIX 119 136
FT TURN 137 139
SQ SEQUENCE 142 AA; 15258 MW; 15E13666573BBBAE CRC64;
MVLSPADKTN VKAAWGKVGA HAGEYGAEAL ERMFLSFPTT KTYFPHFDLS HGSAQVKGHG
KKVADALTNA VAHVDDMPNA LSALSDLHAH KLRVDPVNFK LLSHCLLVTL AAHLPAEFTP
AVHASLDKFL ASVSTVLTSK YR
//
EMBOSS data files are distributed with the application and stored in the standard EMBOSS data directory, which is defined by the EMBOSS environment variable EMBOSS_DATA.
To see the available EMBOSS data files, run:
% embossdata -showall
To fetch one of the data files (for example 'Exxx.dat') into your current directory for you to inspect or modify, run:
% embossdata -fetch -file Exxx.dat
Users can provide their own data files in their own directories. Project-specific files can be put in the current directory or, for tidier directory listings, in a subdirectory called ".embossdata". Files for all EMBOSS runs can be put in the user's home directory or, again, in a subdirectory called ".embossdata".
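The directories are searched in the following order:
1. The current directory (.)
2. .embossdata under the current directory
3. The home directory (~/)
4. ~/.embossdata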
pepwindow reads the Kyte-Doolittle hydropathy data from the file 'Enakai.dat'.
The EMBOSS data file 'Enakai.dat' contains:
D Hydropathy index (Kyte-Doolittle, 1982)
R 0807099
A Kyte, J. and Doolittle, R.F.
T A simple method for displaying the hydropathic character of a protein
J J. Mol. Biol. 157, 105-132 (1982)
C CHOC760103 0.964 JANJ780102 0.922 DESM900102 0.898
EISD860103 0.897 CHOC760104 0.889 WOLR810101 0.885
RADA880101 0.884 MANP780101 0.881 EISD840101 0.878
PONP800103 0.870 NAKH920108 0.868 JANJ790101 0.867
JANJ790102 0.866 PONP800102 0.861 MEIH800103 0.856
PONP800101 0.851 PONP800108 0.850 WARP780101 0.845
RADA880108 0.842 ROSG850102 0.841 DESM900101 0.837
BIOV880101 0.829 RADA880107 0.828 LIFS790102 0.824
KANM800104 0.824 CIDH920104 0.824 MIYS850101 0.821
RADA880104 0.819 NAKH900111 0.817 NISK800101 0.812
FAUJ830101 0.811 ARGP820103 0.806 NAKH920105 0.803
ARGP820102 0.803 KRIW790101 -0.805 CHOC760102 -0.838
GUYH850101 -0.843 RACS770102 -0.844 JANJ780103 -0.845
ROSM880101 -0.845 PRAM900101 -0.850 JANJ780101 -0.852
GRAR740102 -0.859 MEIH800102 -0.871 ROSM880102 -0.878
OOBM770101 -0.899
I A/L R/K N/M D/F C/P Q/S E/T G/W H/Y I/V
1.8 -4.5 -3.5 -3.5 2.5 -3.5 -3.5 -0.4 -3.2 4.5
3.8 -3.9 1.9 2.8 -1.6 -0.8 -0.7 -0.9 -1.3 4.2
//
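The "I" block at the end of the entry holds the values themselves: each header column such as A/L names two residues, the first taking its value from the first row of numbers and the second from the second row. As a rough illustration, the Python sketch below reads that block into a lookup table. The function name parse_index_block is hypothetical, and the code assumes every value is numeric and that the two rows directly follow the header line.

```python
def parse_index_block(text):
    """Parse the 'I' block of an AAINDEX-style entry into {residue: value}.

    Hypothetical helper; assumes all values are numeric and that the two
    rows of numbers directly follow the 'I' header line.
    """
    lines = text.splitlines()
    for position, line in enumerate(lines):
        if line.startswith("I "):
            # Header columns look like 'A/L': the first letter labels the
            # first row of numbers, the second letter labels the second row.
            pairs = line.split()[1:]
            first_row = [float(v) for v in lines[position + 1].split()]
            second_row = [float(v) for v in lines[position + 2].split()]
            table = {}
            for pair, top, bottom in zip(pairs, first_row, second_row):
                first, second = pair.split("/")
                table[first] = top
                table[second] = bottom
            return table
    raise ValueError("no 'I' block found")


# The 'I' block exactly as listed above.
ENTRY = """\
I A/L R/K N/M D/F C/P Q/S E/T G/W H/Y I/V
1.8 -4.5 -3.5 -3.5 2.5 -3.5 -3.5 -0.4 -3.2 4.5
3.8 -3.9 1.9 2.8 -1.6 -0.8 -0.7 -0.9 -1.3 4.2
//
"""

hydropathy = parse_index_block(ENTRY)
print(hydropathy["I"], hydropathy["R"])
```

Read this way, isoleucine (I) is the most hydrophobic residue on the scale and arginine (R) the most hydrophilic.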
pepwindow can use any entry from the "Nakai et al." database of amino acid parameters. These entries used to be in a database called "NAKAI" but are now in one called "AAINDEX". The EMBOSS program aaindexextract takes data from this database and makes it available to pepwindow.
1. FTP the AAINDEX database from Japan:
ftp://ftp.genome.ad.jp/pub/db/genomenet/aaindex/aaindex1
2. Run aaindexextract with the aaindex1 file as input (or ask whoever installs EMBOSS to run it)
3. Run pepwindow with -datafile specifying the name of whatever "AAINDEX" datafile you wish to use. (Use embossdata -showall to see your available "AAINDEX" data file names.)
Kyte, J. and Doolittle, R.F. (1982) A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157, 105-132.
| Program name | Description |
|---|---|
| backtranambig | Back translate a protein sequence to ambiguous codons |
| backtranseq | Back translate a protein sequence |
| charge | Protein charge plot |
| checktrans | Reports STOP codons and ORF statistics of a protein |
| compseq | Count composition of dimer/trimer/etc words in a sequence |
| emowse | Protein identification by mass spectrometry |
| freak | Residue/base frequency table or plot |
| iep | Calculates the isoelectric point of a protein |
| mwcontam | Shows molwts that match across a set of files |
| mwfilter | Filter noisy molwts from mass spec output |
| octanol | Displays protein hydropathy |
| pepinfo | Plots simple amino acid properties in parallel |
| pepstats | Protein statistics |
| pepwindowall | Displays protein hydropathy of a set of sequences |
Based on the original program by Jack Kyte and Russell F. Doolittle.