In our efforts to conduct human genome-wide analysis, our researchers use a number of powerful bioinformatics tools and programs to annotate, visualize, and interpret next-generation sequencing data. Links to these tools and to publicly available genetic variant catalogs can be found here:
Genetic Variant Catalogs
The ALS Data Browser is a catalog of genetic variants identified from 1,424 Caucasian patients recruited and sequenced for their diagnosis of Amyotrophic Lateral Sclerosis. Approximately 93.5% of these cases are sporadic. The database includes single nucleotide variants (SNVs) and insertion and deletion variants (indels). Funding for this study was provided by Biogen Idec.
The Epi4K Data Browser is a catalog of genetic variants identified from 337 probands ascertained and sequenced for their diagnosis of Epileptic Encephalopathy. The database includes single nucleotide variants (SNVs) and insertion and deletion variants (indels).
Bioinformatics Programs & Tools
ATAV (Analysis Tool for Annotated Variants) is a statistical toolset that is designed to detect complex disease-associated rare genetic variants by performing association analysis on annotated variants derived from whole-genome or whole-exome sequencing data.
DNMFilter is a machine-learning-based tool designed to filter out false positive de novo mutations (DNMs) obtained by any computational or manual approach from next-generation sequencing data. It can be used either as a stand-alone tool to detect DNMs or coupled with other commonly used DNM detection tools (GATK UnifiedGenotyper, polymutt, DeNovoGear, etc.) to improve specificity.
In assessing variants for their role in particular phenotypes, there is often a need to view the location of variants across the gene and within sequences that encode particular protein domains. DV-auto (.zip file) is a convenient command line UNIX program for such a purpose. It has several unique features: 1) It automatically retrieves coding DNA sequences, translates them and annotates domains; 2) It only needs the genomic coordinates of variants as input, and it maps them onto the proteins and classifies them into functional categories such as missense, stop-gain, and splice-change; 3) It provides options to filter variants based on their allele frequencies; 4) It does a statistical test to determine whether variants are distributed randomly in a given protein; and 5) It generates figures that can highlight differences between variants, for example by function, case versus control status, and associated diseases.
DV-auto is a command line UNIX program written in Perl. To install DV-auto, download the package, unzip it, and run DV-install.pl with all required parameters. You also need to download several additional files from other websites, which may be the most time-consuming part. Note that the current version of DV-auto only works for human genes and variants.
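The step in which DV-auto maps genomic coordinates of variants onto proteins can be illustrated with a small sketch. This is not DV-auto's own code (DV-auto is written in Perl); it is a minimal Python illustration that assumes the coding exons of a transcript are already known as 1-based inclusive intervals. The function name `genomic_to_codon` and the example coordinates are hypothetical.

```python
def genomic_to_codon(pos, cds_exons, strand="+"):
    """Map a genomic position to (codon_number, position_in_codon).

    cds_exons: list of (start, end) coding intervals, 1-based inclusive
               (hypothetical input format, not DV-auto's).
    strand:    "+" or "-" for the transcript orientation.
    Returns None if the position is not inside any coding interval.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # Walk the exons in transcript order: ascending on '+', descending on '-'.
    ordered = sorted(cds_exons, reverse=(strand == "-"))
    offset = 0  # coding bases consumed by earlier exons
    for start, end in ordered:
        if start <= pos <= end:
            within = pos - start if strand == "+" else end - pos
            cds_index = offset + within  # 0-based index in the coding sequence
            return cds_index // 3 + 1, cds_index % 3 + 1
        offset += end - start + 1
    return None


# Hypothetical two-exon transcript on the plus strand.
exons = [(100, 104), (200, 206)]
print(genomic_to_codon(201, exons))
```

Once a variant has a codon number, it can be compared against domain boundaries expressed in amino acid coordinates, which is the kind of placement the figures described above rely on.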
ERDS is designed for detection of copy number variants (CNVs) on human genomes from next generation sequence data, utilizing information from read depth of short reads and SNV heterozygosity.
GenePattern provides access to a broad array of computational methods used to analyze genomic data. Its extendable architecture makes it easy for computational biologists to add analysis and visualization modules. This ensures that GenePattern users have access to a continuously growing repository of new computational methods.
GWASpower/QT is statistical power calculation software designed for genome-wide association studies (GWAS) with quantitative traits in natural populations. It allows users to input the effect size as a heritability measure, instead of the phenotype means of each genotype of the genetic marker, which are often unavailable in exploratory experiments such as GWAS. Input parameters are heritability (required), type 1 error rate (required), total sample size (required), linkage disequilibrium (optional) and other covariates (optional). The software returns the statistical power and a plot of a family of power curves. Documentation.
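To show how heritability, type 1 error rate, and sample size can combine into a power estimate, here is a minimal sketch using a standard large-sample approximation for a 1-degree-of-freedom test. It is not GWASpower/QT's own algorithm, which may differ. The function name `approx_qt_power`, the non-centrality formula N·h²/(1−h²), and the treatment of linkage disequilibrium as a simple r² scaling factor are all assumptions made for illustration; covariates are ignored.

```python
from statistics import NormalDist


def approx_qt_power(heritability, alpha, n, r2=1.0):
    """Approximate power of a two-sided 1-df association test.

    heritability: fraction of trait variance explained by the causal variant.
    alpha:        type 1 error rate (e.g. 5e-8 for genome-wide significance).
    n:            total sample size.
    r2:           assumed LD (r squared) between marker and causal variant.
    """
    if not 0.0 <= heritability < 1.0:
        raise ValueError("heritability must be in [0, 1)")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    h2_marker = heritability * r2  # variance explained at the genotyped marker
    ncp = n * h2_marker / (1.0 - h2_marker)  # assumed non-centrality parameter
    norm = NormalDist()
    z_crit = norm.inv_cdf(1.0 - alpha / 2.0)
    shift = ncp ** 0.5
    # A 1-df non-central chi-square is the square of a shifted standard normal,
    # so power is the probability that |Z + shift| exceeds the critical value.
    return (1.0 - norm.cdf(z_crit - shift)) + norm.cdf(-z_crit - shift)


# A small family of power values across sample sizes (illustrative inputs).
for size in (1000, 5000, 10000):
    print(size, round(approx_qt_power(0.005, 5e-8, size), 3))
```

Evaluating the function over a grid of sample sizes and heritabilities gives the data for a family of power curves like the one the software plots.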
MetaP performs a meta-analysis and combines the statistical association signals (P values) from independent studies or study populations, taking account of the impacts of sample sizes and effect directions.
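One common way to combine P values while accounting for sample size and effect direction is the sample-size-weighted Z-score (Stouffer) method. The sketch below illustrates that general technique; whether MetaP uses exactly this weighting is not stated here, so treat it as an assumption. The function name `combine_p_values` and the input format are hypothetical.

```python
from statistics import NormalDist


def combine_p_values(studies):
    """Combine two-sided P values with a sample-size-weighted Z-score.

    studies: iterable of (p_value, sample_size, direction), where direction
             is +1 or -1 for the sign of the estimated effect.
    Returns (combined_z, combined_two_sided_p).
    """
    norm = NormalDist()
    weighted_sum = 0.0
    weight_sq_sum = 0.0
    for p, n, direction in studies:
        if not 0.0 < p <= 1.0:
            raise ValueError("P values must be in (0, 1]")
        if n <= 0:
            raise ValueError("sample sizes must be positive")
        # Convert the two-sided P value to a signed Z-score.
        z = norm.inv_cdf(1.0 - p / 2.0)
        z = z if direction >= 0 else -z
        w = n ** 0.5  # weight each study by the square root of its sample size
        weighted_sum += w * z
        weight_sq_sum += w * w
    if weight_sq_sum == 0.0:
        raise ValueError("at least one study is required")
    z_meta = weighted_sum / weight_sq_sum ** 0.5
    p_meta = 2.0 * (1.0 - norm.cdf(abs(z_meta)))
    return z_meta, p_meta


# Illustrative input: two studies agreeing in direction, one disagreeing.
z, p = combine_p_values([(0.01, 2000, +1), (0.03, 1500, +1), (0.40, 500, -1)])
print(z, p)
```

Because the Z-scores are signed, studies with opposite effect directions partly cancel, while studies that agree reinforce each other in proportion to their sample sizes.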
RVIS (Genic Intolerance)
RVIS (Residual Variation Intolerance Score) is a gene-based score intended to help in the interpretation of human sequence data. The intolerance score in its current form is based upon allele frequency data as represented in whole exome sequence data from the NHLBI-ESP6500 data set. The score is designed to rank genes in terms of whether they have more or less common functional genetic variation relative to the genome-wide expectation given the amount of apparently neutral variation the gene has. A gene with a positive score has more common functional variation than expected, and a gene with a negative score has less and is referred to as "intolerant". By convention we rank all genes in order from most intolerant to least. As an example, the gene ATP1A3 has an RVIS score of -1.53 and a percentile of 3.37%, meaning it is among the 3.37% most intolerant human genes. Depending on what disease area you are studying, you may want to consider either intolerant genes (neurodevelopmental disease) or tolerant genes (some immunological diseases) as better candidates.
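The idea of scoring a gene by its departure from the genome-wide expectation can be sketched as a simple regression residual followed by a percentile ranking. This is only an illustration of the concept: it uses plain least-squares residuals on made-up counts, and the published score may standardize residuals and define the variant counts differently. The function name `residual_scores` and the gene labels are hypothetical.

```python
def residual_scores(genes):
    """Score genes by regression residual and rank them into percentiles.

    genes: dict mapping gene name -> (neutral_count, common_functional_count).
    Returns (residuals, percentiles); the lowest percentile corresponds to the
    most negative residual, i.e. the most "intolerant" gene.
    """
    names = list(genes)
    xs = [genes[g][0] for g in names]
    ys = [genes[g][1] for g in names]
    n = len(names)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct neutral counts")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual = observed common functional variation minus the expectation.
    residuals = {g: y - (intercept + slope * x) for g, x, y in zip(names, xs, ys)}
    ranked = sorted(names, key=residuals.get)  # most negative residual first
    percentiles = {g: 100.0 * (i + 1) / n for i, g in enumerate(ranked)}
    return residuals, percentiles


# Made-up counts for four hypothetical genes.
toy = {"GENE_A": (10, 2), "GENE_B": (20, 9), "GENE_C": (30, 11), "GENE_D": (40, 18)}
scores, pct = residual_scores(toy)
print(scores, pct)
```

A gene whose observed count falls below the fitted line gets a negative score and a low percentile, matching the convention of ranking from most intolerant to least.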
SNPExpress is a database interface that we developed to permit interrogation of the effects of common SNPs on exon and transcript level expression. This database enables researchers to input a SNP, gene, or a genomic region to investigate regions of interest for localized effects of SNPs on exon and gene level expression changes.
Given an alignment file of NGS reads and a user-defined target region in the genome, SV-analyzer will: 1) generate figures highlighting abnormally mapped reads so that users can visually check structural variants (SVs); 2) automatically retrieve reads mapped in the target region together with their paired reads; 3) perform a local assembly using the retrieved reads; 4) annotate repetitive sequences within the assembled contigs; and 5) perform a pairwise sequence comparison between the contigs and the reference genomic sequence, and locate overlapping genes. These results are also presented in figures.
WGAViewer is a bioinformatics software tool specifically designed to provide a user-friendly interface to automatically annotate, visualize, and interpret the set of P values emerging from a GWAS. It can be used to highlight possible functional mechanisms, to help create working hypotheses, and to select genomic regions that may need to be resequenced in a search for candidate causal variants.