Dataset Information

Using secondary structure to predict the effects of genetic variants on alternative splicing.


ABSTRACT: Accurate interpretation of genomic variants that alter RNA splicing is critical to precision medicine. We present a computational framework, Prediction of variant Effect on Percent Spliced In (PEPSI), that predicts the splicing impact of coding and noncoding variants for the Fifth Critical Assessment of Genome Interpretation (CAGI5) "Vex-seq" challenge. PEPSI is a random forest regression model trained on multiple layers of features associated with sequence conservation and regulatory sequence elements. Unlike other splicing defect prediction tools from the literature, our framework integrates secondary structure information when predicting variants that disrupt splicing regulatory elements (SREs). We applied our model to classify splice-disrupting variants among 2,094 single-nucleotide polymorphisms from the Exome Aggregation Consortium using model-predicted changes in percent spliced in (ΔPSI) associated with tested variants. Benchmarking our model against widely used state-of-the-art tools, we demonstrate that PEPSI achieves comparable sensitivity and precision. We also show that secondary structure context can help resolve several cases where changes in SRE counts do not correspond with the direction of the ΔPSI measured for tested variants.
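
The quantities named in the abstract can be illustrated with a minimal sketch. It is not the PEPSI model itself: it only shows how a change in percent spliced in (ΔPSI) might be computed from read counts, how a variant could be flagged as splice-disrupting, and how sensitivity and precision are derived from such flags. The read-count definition of PSI, the function names, and the 10-point threshold are assumptions made for illustration and do not come from the record.

```python
def psi(inclusion_reads, exclusion_reads):
    """Percent spliced in (0-100), assuming PSI is estimated from read counts."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        raise ValueError("no reads cover this exon")
    return 100.0 * inclusion_reads / total


def delta_psi(ref_counts, var_counts):
    """Delta PSI = PSI(variant) - PSI(reference); each argument is (inclusion, exclusion)."""
    return psi(*var_counts) - psi(*ref_counts)


def is_splice_disrupting(dpsi, threshold=10.0):
    """Flag a variant when |delta PSI| reaches a cutoff (hypothetical threshold)."""
    return abs(dpsi) >= threshold


def sensitivity_precision(predicted, observed):
    """Sensitivity and precision from paired boolean calls."""
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, o in pairs if p and o)
    fp = sum(1 for p, o in pairs if p and not o)
    fn = sum(1 for p, o in pairs if not p and o)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, precision


# Made-up counts: the reference includes the exon in 80 of 100 reads, the variant in 50 of 100.
example_dpsi = delta_psi((80, 20), (50, 50))
example_call = is_splice_disrupting(example_dpsi)
```

A negative ΔPSI here means the variant reduces exon inclusion relative to the reference. In a benchmark like the one described, the predicted flags would come from model-predicted ΔPSI and the observed flags from measured ΔPSI.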

SUBMITTER: Wang R 

PROVIDER: S-EPMC7288985 | biostudies-literature | 2019 Sep

REPOSITORIES: biostudies-literature

Publications

Using secondary structure to predict the effects of genetic variants on alternative splicing.

Robert Wang, Yaqiong Wang, Zhiqiang Hu

Human Mutation, 2019-06-18, 9


Similar Datasets

| S-EPMC4143934 | biostudies-literature
| S-EPMC3929938 | biostudies-literature
| S-EPMC7919445 | biostudies-literature
| S-EPMC5769772 | biostudies-literature
| S-EPMC8052246 | biostudies-literature
| S-EPMC311065 | biostudies-literature
2018-01-22 | GSE107542 | GEO
| S-EPMC3082633 | biostudies-literature
| S-EPMC8011109 | biostudies-literature
| S-EPMC5457519 | biostudies-literature