{% extends "base.j2.html" %} {% block title %}GEMINI query interface{% endblock %} {% block head %}{% endblock %} {% block body %}
| column_name | type | notes |
|---|---|---|
| chrom | STRING | The chromosome on which the variant resides |
| start | INTEGER | The 0-based start position. |
| end | INTEGER | The 1-based end position. |
| variant_id | INTEGER | PRIMARY_KEY |
| anno_id | INTEGER | Variant transcript number for the most severely affected transcript |
| ref | STRING | Reference allele |
| alt | STRING | Alternate allele for the variant |
| qual | INTEGER | Quality score for the assertion made in ALT |
| filter | STRING | A string of filters passed/failed in variant calling |
| type | STRING | The type of variant. Any of: [snp, indel] |
| sub_type | STRING | The variant sub-type. If type is snp: [ts (transition), tv (transversion)]. If type is indel: [ins (insertion), del (deletion)] |
| call_rate | FLOAT | The fraction of samples with a valid genotype |
| num_hom_ref | INTEGER | The total number of homozygotes for the reference (ref) allele |
| num_het | INTEGER | The total number of heterozygotes observed. |
| num_hom_alt | INTEGER | The total number of homozygotes for the alternate (alt) allele |
| num_unknown | INTEGER | The total number of unknown genotypes |
| aaf | FLOAT | The observed allele frequency for the alternate allele |
| hwe | FLOAT | The Chi-square probability of deviation from HWE (assumes random mating) |
| inbreeding_coeff | FLOAT | The inbreeding coefficient that expresses the likelihood of effects due to inbreeding |
| pi | FLOAT | The computed nucleotide diversity (pi) for the site |
| gts | BLOB | A compressed binary vector of sample genotypes (e.g., “A/A”, “A\|G”, “G/G”). Access a specific sample's genotype with gts.sample_id |
| gt_types | BLOB | A compressed binary vector of numeric genotype “types” (e.g., 0, 1, 2). Access a specific sample's genotype type with gt_types.sample_id |
| gt_phases | BLOB | A compressed binary vector of sample genotype phases (e.g., False, True, False). Access a specific sample's genotype phasing info with gt_phases.sample_id |
| gt_depths | BLOB | A compressed binary vector of the depth of aligned sequence observed for each sample. Access a specific sample's sequence depth info with gt_depths.sample_id |
| gene | STRING | Corresponding gene name of the most severely affected transcript |
| transcript | STRING | The variant transcript that was most severely affected (for two equally affected transcripts, either the first one is selected (VEP) or the protein_coding biotype is considered (snpEff)) |
| is_exonic | BOOL | Does the variant affect an exon for >= 1 transcript? |
| is_coding | BOOL | Does the variant fall in a coding region (excl. 3’ & 5’ UTRs) for >= 1 transcript? |
| is_lof | BOOL | Based on the value of the impact col, is the variant LOF for >= 1 transcript? |
| exon | STRING | Exon information for the severely affected transcript |
| codon_change | STRING | What is the codon change? |
| aa_change | STRING | What is the amino acid change (for an snp)? |
| aa_length | STRING | The length of CDS in terms of number of amino acids |
| biotype | STRING | The ‘type’ of the severely affected transcript (e.g., protein-coding, pseudogene, rRNA, etc.) |
| impact | STRING | The consequence of the most severely affected transcript |
| impact_severity | STRING | Severity of the highest order observed for the variant |
| polyphen_pred | STRING | Polyphen predictions for the snps (only with VEP) for the severely affected transcript |
| polyphen_score | FLOAT | Polyphen scores for the severely affected transcript |
| sift_pred | STRING | SIFT predictions for the SNPs (VEP only) for the most severely affected transcript |
| sift_score | FLOAT | SIFT scores for the predictions |
| pfam_domain | STRING | Pfam protein domain that the variant affects |
| anc_allele | STRING | The reported ancestral allele if there is one. |
| rms_bq | FLOAT | The RMS base quality at this position. |
| cigar | STRING | CIGAR string describing how to align an alternate allele to the reference allele. |
| depth | INTEGER | The number of aligned sequence reads that led to this variant call |
| strand_bias | FLOAT | Strand bias at the variant position |
| rms_map_qual | FLOAT | RMS mapping quality, a measure of variance of quality scores |
| in_hom_run | INTEGER | Homopolymer runs for the variant allele |
| num_mapq_zero | INTEGER | Total counts of reads with mapping quality equal to zero |
| num_alleles | INTEGER | Total number of alleles in called genotypes |
| num_reads_w_dels | FLOAT | Fraction of reads with spanning deletions |
| haplotype_score | FLOAT | Consistency of the site with two segregating haplotypes |
| qual_depth | FLOAT | Variant confidence or quality by depth |
| allele_count | INTEGER | Allele counts in genotypes |
| allele_bal | FLOAT | Allele balance for hets |
| is_somatic | BOOL | Whether the variant is somatically acquired. |
| in_dbsnp | BOOL | Is this variant found in dbSNP (build 135)? 0: absence of the variant in dbSNP; 1: presence of the variant in dbSNP |
| rs_ids | STRING | A comma-separated list of rs ids for variants present in dbSNP |
| in_hm2 | BOOL | Whether the variant was part of HapMap2. |
| in_hm3 | BOOL | Whether the variant was part of HapMap3. |
| in_esp | BOOL | Presence/absence of the variant in the ESP project data |
| in_1kg | BOOL | Presence/absence of the variant in the 1000 Genomes Project data |
| aaf_esp_ea | FLOAT | Minor Allele Frequency of the variant for European Americans in the ESP project |
| aaf_esp_aa | FLOAT | Minor Allele Frequency of the variant for African Americans in the ESP project |
| aaf_esp_all | FLOAT | Minor Allele Frequency of the variant w.r.t both groups in the ESP project |
| aaf_1kg_amr | FLOAT | Allele frequency of the variant for samples in AMR based on AC/AN (1000g project) |
| aaf_1kg_asn | FLOAT | Allele frequency of the variant for samples in ASN based on AC/AN (1000g project) |
| aaf_1kg_afr | FLOAT | Allele frequency of the variant for samples in AFR based on AC/AN (1000g project) |
| aaf_1kg_eur | FLOAT | Allele frequency of the variant for samples in EUR based on AC/AN (1000g project) |
| aaf_1kg_all | FLOAT | Global allele frequency (based on AC/AN) (1000g project) |
| in_omim | BOOL | 0: absence of the variant in the OMIM database; 1: presence of the variant in the OMIM database |
| clinvar_sig | STRING | The clinical significance of the variant according to ClinVar. Any of: unknown, untested, non-pathogenic, probable-non-pathogenic, probable-pathogenic, pathogenic, drug-response, histocompatibility, other |
| clinvar_disease_name | STRING | The name of the disease to which the variant is relevant |
| clinvar_dbsource | STRING | Variant Clinical Channel IDs |
| clinvar_dbsource_id | STRING | The record id in the above database |
| clinvar_origin | STRING | The type of variant. Any of: unknown, germline, somatic, inherited, paternal, maternal, de-novo, biparental, uniparental, not-tested, tested-inconclusive, other |
| clinvar_dsdb | STRING | Variant disease database name |
| clinvar_dsdbid | STRING | Variant disease database ID |
| clinvar_disease_acc | STRING | Variant Accession and Versions |
| clinvar_in_locus_spec_db | BOOL | Submitted from a locus-specific database? |
| clinvar_on_diag_assay | BOOL | Variation is interrogated in a clinical diagnostic assay? |
| exome_chip | BOOL | Whether an SNP is on the Illumina HumanExome Chip |
| cyto_band | STRING | Chromosomal cytobands that a variant overlaps |
| rmsk | STRING | A comma-separated list of RepeatMasker annotations that the variant overlaps. Each hit is of the form: name_class_family |
| in_cpg_island | BOOL | Does the variant overlap a CpG island? Based on UCSC: Regulation > CpG Islands > cpgIslandExt |
| in_segdup | BOOL | Does the variant overlap a segmental duplication? Based on UCSC: Variation&Repeats > Segmental Dups > genomicSuperDups track |
| is_conserved | BOOL | Does the variant overlap a conserved region? Based on the 29-way mammalian conservation study |
| gerp_bp_score | FLOAT | GERP conservation score at base-pair resolution. Higher scores reflect greater conservation. Only populated if the --load-gerp-bp option is used when loading. |
| gerp_element_pval | FLOAT | GERP element P-value. Lower P-values reflect greater conservation. Not at base-pair resolution. |
| recomb_rate | FLOAT | The mean recombination rate at the variant site. Based on the HapMapII_GRCh37 genetic map |
| grc | STRING | Association with patch and fix regions from the Genome Reference Consortium. Identifies potential problem regions associated with variant calls. Built with annotation_provenance/make-ncbi-grc-patches.py |
| gms_illumina | FLOAT | Genome Mappability Scores (GMS) for Illumina error models. Provides low GMS scores (< 25.0 in any technology) from: #Download_GMS_by_Chromosome_and_Sequencing_Technology Input VCF for annotations prepared with: |
| gms_solid | FLOAT | Genome Mappability Scores with SOLiD error models |
| gms_iontorrent | FLOAT | Genome Mappability Scores with IonTorrent error models |
| in_cse | BOOL | Is the variant in an error-prone genomic position, using CSE: Context-Specific Sequencing Errors |
| encode_tfbs | STRING | Comma-separated list of transcription factors that were observed by ENCODE to bind DNA in this region. Each hit in the list is constructed as TF_CELLCOUNT, where TF is the transcription factor name and CELLCOUNT is the number of cells tested that had nonzero signals. Provenance: wgEncodeRegTfbsClusteredV2 UCSC table |
| encode_dnaseI_cell_count | INTEGER | Count of cell types that were observed to have DnaseI hypersensitivity. |
| encode_dnaseI_cell_list | STRING | Comma-separated list of cell types that were observed to have DnaseI hypersensitivity. Provenance: Thurman, et al, Nature, 489, pp. 75-82, 5 Sep. 2012 |
| encode_consensus_gm12878 | STRING | ENCODE consensus segmentation prediction for GM12878. CTCF: CTCF-enriched element; E: Predicted enhancer; PF: Predicted promoter flanking region; R: Predicted repressed or low-activity region; TSS: Predicted promoter region including TSS; T: Predicted transcribed region; WE: Predicted weak enhancer or open chromatin cis-regulatory element; unknown: This region of the genome had no functional prediction. |
| encode_consensus_h1hesc | STRING | ENCODE consensus segmentation prediction for h1HESC. See encode_consensus_gm12878 for details. |
| encode_consensus_helas3 | STRING | ENCODE consensus segmentation prediction for Helas3. See encode_consensus_gm12878 for details. |
| encode_consensus_hepg2 | STRING | ENCODE consensus segmentation prediction for HEPG2. See encode_consensus_gm12878 for details. |
| encode_consensus_huvec | STRING | ENCODE consensus segmentation prediction for HuVEC. See encode_consensus_gm12878 for details. |
| encode_consensus_k562 | STRING | ENCODE consensus segmentation prediction for k562. See encode_consensus_gm12878 for details. |
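
The columns above can be filtered with ordinary SQL. The following is a minimal sketch, assuming the table is reachable through SQLite; it builds a throwaway in-memory table with a handful of the listed columns and invented rows, so the data and the `rare_snps` helper are hypothetical and not part of the interface. It selects SNPs below an allele frequency cutoff and uses the 0-based `start` and 1-based `end` to compute how many reference bases each variant spans.

```python
import sqlite3

# Hypothetical stand-in for a real database: an in-memory table holding
# only a few of the variants columns listed above, with invented rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE variants (
        variant_id INTEGER PRIMARY KEY,
        chrom TEXT,
        start INTEGER,      -- 0-based
        "end" INTEGER,      -- 1-based; quoted because END is an SQL keyword
        type TEXT,
        aaf REAL
    )
    """
)
conn.executemany(
    "INSERT INTO variants VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "chr1", 100, 101, "snp", 0.020),
        (2, "chr1", 200, 203, "indel", 0.300),
        (3, "chr2", 500, 501, "snp", 0.005),
    ],
)


def rare_snps(conn, max_aaf):
    """Return SNPs whose alternate allele frequency is below max_aaf."""
    query = """
        SELECT variant_id, chrom, start, "end", "end" - start AS span
        FROM variants
        WHERE type = 'snp' AND aaf < ?
        ORDER BY variant_id
    """
    return conn.execute(query, (max_aaf,)).fetchall()


result = rare_snps(conn, 0.01)
print(result)
```

Because `start` is 0-based and `end` is 1-based, `end - start` gives the number of reference bases covered without any off-by-one correction.
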

| column_name | type | notes |
|---|---|---|
| variant_id | INTEGER | PRIMARY_KEY (Foreign key to variants table) |
| anno_id | INTEGER | PRIMARY_KEY (Based on variant transcripts) |
| gene | STRING | The gene affected by the variant. |
| transcript | STRING | The transcript affected by the variant. |
| is_exonic | BOOL | Does the variant affect an exon for this transcript? |
| is_coding | BOOL | Does the variant fall in a coding region (excludes 3’ & 5’ UTRs of exons)? |
| is_lof | BOOL | Based on the value of the impact col, is the variant LOF? |
| exon | STRING | Exon information for the variants that are exonic |
| codon_change | STRING | What is the codon change? |
| aa_change | STRING | What is the amino acid change? |
| aa_length | STRING | The length of CDS in terms of number of amino acids |
| biotype | STRING | The type of transcript (e.g., protein-coding, pseudogene, rRNA, etc.) |
| impact | STRING | Impacts due to variation (ref.impact category) |
| impact_severity | STRING | Severity of the impact based on the impact column value (ref.impact category) |
| polyphen_pred | STRING | Impact of the SNP as given by PolyPhen (VEP only). Any of: benign, possibly_damaging, probably_damaging, unknown |
| polyphen_scores | FLOAT | PolyPhen score reflecting severity (the higher the impact, the higher the score) |
| sift_pred | STRING | Impact of the SNP as given by SIFT (VEP only). Any of: neutral, deleterious |
| sift_scores | FLOAT | SIFT probability scores reflecting severity (the higher the impact, the lower the score) |
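
Since `variant_id` in this table is a foreign key to the variants table, per-transcript impacts can be joined back to their variant. The sketch below assumes SQLite and uses a made-up in-memory database with only a few of the listed columns; the gene and transcript names are placeholders. It counts, for each variant, how many transcripts are annotated and how many of those are flagged `is_lof`.

```python
import sqlite3

# Hypothetical miniature of the two tables, with invented rows.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE variants (
        variant_id INTEGER PRIMARY KEY,
        chrom TEXT,
        gene TEXT
    );
    CREATE TABLE variant_impacts (
        variant_id INTEGER,
        anno_id INTEGER,
        gene TEXT,
        transcript TEXT,
        is_lof INTEGER,          -- BOOL stored as 0/1
        PRIMARY KEY (variant_id, anno_id)
    );
    INSERT INTO variants VALUES (1, 'chr1', 'GENE_A');
    INSERT INTO variants VALUES (2, 'chr2', 'GENE_B');
    INSERT INTO variant_impacts VALUES (1, 1, 'GENE_A', 'TX_A1', 1);
    INSERT INTO variant_impacts VALUES (1, 2, 'GENE_A', 'TX_A2', 0);
    INSERT INTO variant_impacts VALUES (2, 1, 'GENE_B', 'TX_B1', 0);
    """
)

# One output row per variant: transcripts annotated and how many are LOF.
per_variant = conn.execute(
    """
    SELECT v.variant_id, COUNT(*) AS n_transcripts, SUM(i.is_lof) AS n_lof
    FROM variants AS v
    JOIN variant_impacts AS i ON i.variant_id = v.variant_id
    GROUP BY v.variant_id
    ORDER BY v.variant_id
    """
).fetchall()
print(per_variant)
```

The variants table keeps only the most severely affected transcript per variant, so a join like this is one way to see every annotated transcript.
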

| column_name | type | notes |
|---|---|---|
| sample_id | INTEGER | PRIMARY_KEY |
| name | STRING | Sample names |
| family_id | INTEGER | Family ids for the samples [User defined, default: NULL] |
| paternal_id | INTEGER | Paternal id for the samples [User defined, default: NULL] |
| maternal_id | INTEGER | Maternal id for the samples [User defined, default: NULL] |
| sex | STRING | Sex of the sample [User defined, default: NULL] |
| phenotype | STRING | The associated sample phenotype [User defined, default: NULL] |
| ethnicity | STRING | The ethnic group to which the sample belongs [User defined, default: NULL] |
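
Most sample columns are user defined and default to NULL, which affects how filters must be written. The sketch below assumes SQLite and a made-up in-memory samples table; the sample names are invented. It shows that `IS NULL` finds samples without a family assignment, whereas comparing with `= NULL` matches nothing.

```python
import sqlite3

# Hypothetical samples table with invented rows; unset fields stay NULL.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE samples (
        sample_id INTEGER PRIMARY KEY,
        name TEXT,
        family_id INTEGER,
        paternal_id INTEGER,
        maternal_id INTEGER,
        sex TEXT,
        phenotype TEXT,
        ethnicity TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO samples (sample_id, name, family_id) VALUES (?, ?, ?)",
    [(1, "father", 1), (2, "child", 1), (3, "singleton", None)],
)

# Correct: IS NULL matches rows where the user never set family_id.
unassigned = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM samples WHERE family_id IS NULL ORDER BY sample_id"
    )
]

# Pitfall: '= NULL' is never true in SQL, so this returns no rows.
naive = conn.execute(
    "SELECT name FROM samples WHERE family_id = NULL"
).fetchall()

print(unassigned, naive)
```
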