Dataset Information

Statistical algorithms improve accuracy of gene fusion detection.


ABSTRACT: Gene fusions are known to play critical roles in tumor pathogenesis. Yet, sensitive and specific algorithms to detect gene fusions in cancer do not currently exist. In this paper, we present a new statistical algorithm, MACHETE (Mismatched Alignment CHimEra Tracking Engine), which achieves highly sensitive and specific detection of gene fusions from RNA-Seq data, including the highest Positive Predictive Value (PPV) compared to the current state-of-the-art, as assessed in simulated data. We show that the best performing published algorithms either find large numbers of fusions in negative control data or suffer from low sensitivity detecting known driving fusions in gold standard settings, such as EWSR1-FLI1. As proof of principle that MACHETE discovers novel gene fusions with high accuracy in vivo, we mined public data to discover and subsequently PCR validate novel gene fusions missed by other algorithms in the ovarian cancer cell line OVCAR3. These results highlight the gains in accuracy achieved by introducing statistical models into fusion detection, and pave the way for unbiased discovery of potentially driving and druggable gene fusions in primary tumors.

SUBMITTER: Hsieh G 

PROVIDER: S-EPMC5737606 | biostudies-literature | 2017 Jul

REPOSITORIES: biostudies-literature

Publications

Statistical algorithms improve accuracy of gene fusion detection.

Hsieh G, Bierman R, Szabo L, Lee AG, Freeman DE, Watson N, Sweet-Cordero EA, Salzman J

Nucleic Acids Research, 2017-07-01, issue 13


Similar Datasets

S-EPMC4344485 | biostudies-literature
S-EPMC6736430 | biostudies-literature
S-EPMC2703893 | biostudies-literature
S-EPMC7612324 | biostudies-literature
S-EPMC3242814 | biostudies-literature
S-EPMC9106676 | biostudies-literature
S-EPMC6802306 | biostudies-literature
GSE230475 | GEO | 2023-05-23
S-EPMC4981474 | biostudies-literature
S-EPMC6402142 | biostudies-literature