
Dataset Information


Detecting transcriptomic structural variants in heterogeneous contexts via the Multiple Compatible Arrangements Problem.


ABSTRACT:
Background: Transcriptomic structural variants (TSVs), which are large-scale transcriptome sequence changes due to structural variation, are common in cancer. Detecting TSVs from high-throughput sequencing data is a computationally challenging problem. Among all the confounding factors, sample heterogeneity, where each sample contains multiple distinct alleles, poses a critical obstacle to accurate TSV prediction.
Results: To improve TSV detection in heterogeneous RNA-seq samples, we introduce the Multiple Compatible Arrangements Problem (MCAP), which seeks k genome arrangements that maximize the number of reads that are concordant with at least one arrangement. This models a heterogeneous or diploid sample. We prove that MCAP is NP-complete and provide a 1/4-approximation algorithm for k = 1 and a 3/4-approximation algorithm for the diploid case (k = 2) assuming an oracle for k = 1. Combining these, we obtain a 3/16-approximation algorithm for MCAP when k = 2 (without an oracle). We also present an integer linear programming formulation for general k. We characterize the conflict structures in the graph that require k > 1 alleles to satisfy read concordance and show that such structures are prevalent.
Conclusions: We show that the solution to MCAP accurately addresses sample heterogeneity during TSV detection. Our algorithms improve performance on TCGA cancer samples and cancer cell line samples compared with the TSV calling tool SQUID. The software is available at https://github.com/Kingsford-Group/diploidsquid.
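The MCAP objective can be made concrete with a small sketch. The following is a minimal Python example under a deliberately simplified, hypothetical concordance model. It is not the diploidsquid implementation, and it is not SQUID's exact definition. In this model, an arrangement is an ordered tuple of (segment, orientation) pairs. A read edge joins one end ("h" or "t") of one segment to one end of another, and it counts as concordant when the two ends face each other. Read-distance limits are ignored. All function names (`concordant`, `mcap_score`, `brute_force_mcap`) are illustrative. The search is exhaustive and therefore exponential, whereas the paper uses approximation algorithms and an ILP.

```python
from itertools import combinations_with_replacement, permutations, product

# Hypothetical toy model (simpler than SQUID's concordance definition):
# an arrangement is a tuple of (segment, orientation) pairs, orientation +1
# (forward) or -1 (reversed). A read edge is (seg_a, end_a, seg_b, end_b, weight)
# where each end is "h" (head) or "t" (tail). Read-distance limits are ignored.


def ends(orientation):
    """Return (left_end, right_end) of a segment placed with this orientation."""
    return ("h", "t") if orientation == 1 else ("t", "h")


def concordant(edge, arrangement):
    """True if the edge joins two facing ends of distinct segments."""
    a, end_a, b, end_b, _weight = edge
    pos = {seg: (i, ori) for i, (seg, ori) in enumerate(arrangement)}
    (ia, oa), (ib, ob) = pos[a], pos[b]
    if ia == ib:
        return False
    if ia > ib:  # relabel so that "a" is the left-hand segment
        (ia, oa, end_a), (ib, ob, end_b) = (ib, ob, end_b), (ia, oa, end_a)
    # The right end of the left segment must meet the left end of the right one.
    return end_a == ends(oa)[1] and end_b == ends(ob)[0]


def mcap_score(edges, arrangements):
    """Total weight of edges concordant with at least one of the arrangements."""
    return sum(e[4] for e in edges if any(concordant(e, arr) for arr in arrangements))


def all_arrangements(segments):
    """Every ordering and orientation of the segments (n! * 2**n of them)."""
    for order in permutations(segments):
        for oris in product((1, -1), repeat=len(segments)):
            yield tuple(zip(order, oris))


def brute_force_mcap(segments, edges, k):
    """Exhaustive search over k-multisets of arrangements; tiny inputs only."""
    arrs = list(all_arrangements(segments))
    best = max(combinations_with_replacement(arrs, k),
               key=lambda combo: mcap_score(edges, combo))
    return mcap_score(edges, best), best


# Two reads that no single arrangement can satisfy together: one supports A and
# B in the same orientation, the other supports B inverted relative to A.
edges = [("A", "t", "B", "h", 10), ("A", "t", "B", "t", 6)]
print(brute_force_mcap(["A", "B", "C"], edges, k=1)[0])  # 10
print(brute_force_mcap(["A", "B", "C"], edges, k=2)[0])  # 16
```

With k = 1, the two reads conflict, so at most one of them can be concordant and the best score is 10. With k = 2, each allele explains one read and the score rises to 16. This is a tiny instance of the kind of conflict structure that requires k > 1 alleles.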

SUBMITTER: Qiu Y 

PROVIDER: S-EPMC7227063 | biostudies-literature | 2020

REPOSITORIES: biostudies-literature


Publications

Detecting transcriptomic structural variants in heterogeneous contexts via the Multiple Compatible Arrangements Problem.

Yutong Qiu, Cong Ma, Han Xie, Carl Kingsford

Algorithms for Molecular Biology: AMB, 2020-05-15



Similar Datasets

| S-EPMC4263227 | biostudies-literature
| S-EPMC4384290 | biostudies-literature
| S-EPMC5773919 | biostudies-literature
| S-EPMC9202106 | biostudies-literature
| S-EPMC10682169 | biostudies-literature
| S-EPMC7400572 | biostudies-literature
| S-EPMC7865037 | biostudies-literature
| S-EPMC8931943 | biostudies-literature
2021-05-17 | PXD025934 | Pride
| S-EPMC7774871 | biostudies-literature