Dataset Information

Assessing the Pathogenicity of Insertion and Deletion Variants with the Variant Effect Scoring Tool (VEST-Indel).

ABSTRACT: Insertion/deletion variants (indels) alter protein sequence and length, yet are highly prevalent in healthy populations, presenting a challenge to bioinformatics classifiers. Commonly used features--DNA and protein sequence conservation, indel length, and occurrence in repeat regions--are useful for inference of protein damage. However, these features can cause false positives when predicting the impact of indels on disease. Existing methods for indel classification suffer from low specificities, severely limiting clinical utility. Here, we further develop our variant effect scoring tool (VEST) to include the classification of in-frame and frameshift indels (VEST-indel) as pathogenic or benign. We apply 24 features, including a new "PubMed" feature, to estimate a gene's importance in human disease. When compared with four existing indel classifiers, our method achieves a drastically reduced false-positive rate, improving specificity by as much as 90%. This approach of estimating gene importance might be generally applicable to missense and other bioinformatics pathogenicity predictors, which often fail to achieve high specificity. Finally, we tested all possible meta-predictors that can be obtained from combining the four different indel classifiers using Boolean conjunctions and disjunctions, and derived a meta-predictor with improved performance over any individual method.
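
The meta-predictor search described at the end of the abstract can be illustrated with a minimal sketch. It assumes each classifier's output has already been reduced to a binary pathogenic/benign call, and it covers only pure AND and pure OR combinations over subsets of classifiers, which is a simplification of "all possible" Boolean meta-predictors. The classifier names A to D, the toy calls, and the function names are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def sens_spec(preds, labels):
    """Return (sensitivity, specificity) for boolean predictions vs. truth."""
    tp = sum(p and y for p, y in zip(preds, labels))
    tn = sum((not p) and (not y) for p, y in zip(preds, labels))
    pos = sum(labels)
    neg = len(labels) - pos
    return tp / pos, tn / neg


def meta_predictors(calls):
    """Yield (name, predictions) for each single classifier and for the
    AND / OR combination of every subset of two or more classifiers."""
    names = sorted(calls)
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            rows = list(zip(*(calls[n] for n in subset)))
            if k == 1:
                yield subset[0], [r[0] for r in rows]
                continue
            # Conjunction: pathogenic only if every classifier agrees.
            yield " AND ".join(subset), [all(r) for r in rows]
            # Disjunction: pathogenic if any classifier says so.
            yield " OR ".join(subset), [any(r) for r in rows]


# Toy data (hypothetical): True = pathogenic, False = benign.
labels = [True, True, True, True, False, False, False, False]
calls = {
    "A": [True, True, True, False, True, True, False, False],
    "B": [True, True, False, True, True, False, True, False],
    "C": [True, False, True, True, False, True, True, False],
    "D": [True, True, True, True, True, True, True, False],
}

for name, preds in meta_predictors(calls):
    sens, spec = sens_spec(preds, labels)
    print(f"{name}: sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this sketch a conjunction can only keep or raise specificity relative to its members, at the cost of sensitivity, while a disjunction does the reverse. That trade-off is why searching over combinations can produce a meta-predictor that outperforms any single method.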

SUBMITTER: Douville C

PROVIDER: S-EPMC5057310 | biostudies-literature | 2016 Jan

REPOSITORIES: biostudies-literature

Publications

Assessing the Pathogenicity of Insertion and Deletion Variants with the Variant Effect Scoring Tool (VEST-Indel).

Douville C, Masica DL, Stenson PD, Cooper DN, Gygax DM, Kim R, Ryan M, Karchin R

Human Mutation, 2015-10-26, issue 1

PMID: 26442818

Similar Datasets

Project description: Background: In a previous study, we demonstrated that some essential proteins from pathogenic organisms contained sizable insertions/deletions (indels) when aligned to human proteins of high sequence similarity. Such indels may provide sufficient spatial differences between the pathogenic protein and human proteins to allow for selective targeting. In one example, an indel difference was targeted via large scale in-silico screening. This resulted in selective antibodies and small compounds which were capable of binding to the deletion-bearing essential pathogen protein without any cross-reactivity to the highly similar human protein. The objective of the current study was to investigate whether indels were found more frequently in essential than non-essential proteins. Results: We have investigated three species, Bacillus subtilis, Escherichia coli, and Saccharomyces cerevisiae, for which high-quality protein essentiality data is available. Using these data, we demonstrated with t-test calculations that the mean indel frequencies in essential proteins were greater than that of non-essential proteins in the three proteomes. The abundance of indels in both types of proteins was also shown to be accurately modeled by the Weibull distribution. However, Receiver Operator Characteristic (ROC) curves showed that indel frequencies alone could not be used as a marker to accurately discriminate between essential and non-essential proteins in the three proteomes. Finally, we analyzed the protein interaction data available for S. cerevisiae and observed that indel-bearing proteins were involved in more interactions and had greater betweenness values within Protein Interaction Networks (PINs). Conclusion: Overall, our findings demonstrated that indels were not randomly distributed across the studied proteomes and were likely to occur more often in essential proteins and those that were highly connected, indicating a possible role of sequence insertions and deletions in the regulation and modification of protein-protein interactions. Such observations will provide new insights into indel-based drug design using bioinformatics and cheminformatics tools.

Project description: We developed 21,499 genome-wide insertion-deletion (InDel) markers (2- to 54-bp in silico fragment length polymorphism) by comparing the genomic sequences of four (desi, kabuli and wild C. reticulatum) chickpea [Cicer arietinum (L.)] accessions. InDel markers showing 2- to 6-bp fragment length polymorphism among accessions were abundant (76.8%) in the chickpea genome. The physically mapped 7,643 and 13,856 markers on eight chromosomes and unanchored scaffolds, respectively, were structurally and functionally annotated. The 4,506 coding (23% large-effect frameshift mutations) and regulatory InDel markers were identified from 3,228 genes (representing 11.7% of total 27,571 desi genes), suggesting their functional relevance for trait association/genetic mapping. High amplification (97%) and intra-specific polymorphic (60-83%) potential and wider genetic diversity (15-89%) were detected by genome-wide 6,254 InDel markers among desi, kabuli and wild accessions using even a simpler cost-effective agarose gel-based assay. This signifies added advantages of this user-friendly genetic marker system for manifold large-scale genotyping applications in laboratories with limited infrastructure and resources. Utilizing 6,254 InDel markers-based high-density (inter-marker distance: 0.212 cM) inter-specific genetic linkage map (ICC 4958 × ICC 17160) of chickpea as a reference, three major genomic regions harboring six flowering and maturity time robust QTLs (16.4-27.5% phenotypic variation explained, 8.1-11.5 logarithm of odds) were identified. Integration of genetic and physical maps at these target QTL intervals mapped on three chromosomes delineated five InDel markers-containing candidate genes tightly linked to the QTLs governing flowering and maturity time in chickpea. Taken together, our study demonstrated the practical utility of developing and high-throughput genotyping of such beneficial InDel markers at a genome-wide scale to expedite genomics-assisted breeding applications in chickpea.

Project description: Background: Clinical laboratories implement a variety of measures to classify somatic sequence variants and identify clinically significant variants to facilitate the implementation of precision medicine. To standardize the interpretation process, the Association for Molecular Pathology (AMP), American Society of Clinical Oncology (ASCO), and College of American Pathologists (CAP) published guidelines for the interpretation and reporting of sequence variants in cancer in 2017. These guidelines classify somatic variants using a four-tiered system with ten criteria. Even with the standardized guidelines, assessing the clinical impact of somatic variants remains tedious. Additionally, manual implementation of the guidelines may vary among professionals and may lack reproducibility when the supporting evidence is not documented in a consistent manner. Results: We developed a semi-automated tool called "Variant Interpretation for Cancer" (VIC) to accelerate the interpretation process and minimize individual biases. VIC takes pre-annotated files and automatically classifies sequence variants based on several criteria, with the ability for users to integrate additional evidence to optimize the interpretation of clinical impact. We evaluated VIC using several publicly available databases and compared it with several predictive software programs. We found that VIC is time-efficient and conservative in classifying somatic variants under default settings, especially for variants with strong and/or potential clinical significance. We also tested VIC on two cancer-panel sequencing datasets to show its effectiveness in facilitating manual interpretation of somatic variants. Conclusions: Although VIC cannot replace human reviewers, it will accelerate the interpretation of somatic variants. VIC can also be customized by clinical laboratories to fit into their analytical pipelines to facilitate the laborious process of somatic variant interpretation. VIC is freely available at https://github.com/HGLab/VIC/ .
