Detecting transcriptomic structural variants in heterogeneous contexts via the Multiple Compatible Arrangements Problem.
Ontology highlight
ABSTRACT: Background:Transcriptomic structural variants (TSVs)-large-scale transcriptome sequence change due to structural variation - are common in cancer. TSV detection from high-throughput sequencing data is a computationally challenging problem. Among all the confounding factors, sample heterogeneity, where each sample contains multiple distinct alleles, poses a critical obstacle to accurate TSV prediction. Results:To improve TSV detection in heterogeneous RNA-seq samples, we introduce the Multiple Compatible Arrangements Problem (MCAP), which seeks k genome arrangements that maximize the number of reads that are concordant with at least one arrangement. This models a heterogeneous or diploid sample. We prove that MCAP is NP-complete and provide a 14 -approximation algorithm for k=1 and a 34 -approximation algorithm for the diploid case ( k=2 ) assuming an oracle for k=1 . Combining these, we obtain a 316 -approximation algorithm for MCAP when k=2 (without an oracle). We also present an integer linear programming formulation for general k. We characterize the conflict structures in the graph that require k>1 alleles to satisfy read concordancy and show that such structures are prevalent. Conclusions:We show that the solution to MCAP accurately addresses sample heterogeneity during TSV detection. Our algorithms have improved performance on TCGA cancer samples and cancer cell line samples compared to a TSV calling tool, SQUID. The software is available at https://github.com/Kingsford-Group/diploidsquid.
SUBMITTER: Qiu Y
PROVIDER: S-EPMC7227063 | biostudies-literature | 2020
REPOSITORIES: biostudies-literature
ACCESS DATA