Dataset Information

OUTRIDER: A Statistical Method for Detecting Aberrantly Expressed Genes in RNA Sequencing Data.


ABSTRACT: RNA sequencing (RNA-seq) is gaining popularity as a complementary assay to genome sequencing for precisely identifying the molecular causes of rare disorders. A powerful approach is to identify aberrant gene expression levels as potential pathogenic events. However, existing methods for detecting aberrant read counts in RNA-seq data either lack assessments of statistical significance, so that establishing cutoffs is arbitrary, or rely on subjective manual corrections for confounders. Here, we describe OUTRIDER (Outlier in RNA-Seq Finder), an algorithm developed to address these issues. The algorithm uses an autoencoder to model read-count expectations according to the gene covariation resulting from technical, environmental, or common genetic variations. Given these expectations, the RNA-seq read counts are assumed to follow a negative binomial distribution with a gene-specific dispersion. Outliers are then identified as read counts that significantly deviate from this distribution. The model is automatically fitted to achieve the best recall of artificially corrupted data. Precision-recall analyses using simulated outlier read counts demonstrated the importance of controlling for covariation and significance-based thresholds. OUTRIDER is open source and includes functions for filtering out genes not expressed in a dataset, for identifying outlier samples with too many aberrantly expressed genes, and for detecting aberrant gene expression on the basis of false-discovery-rate-adjusted p values. Overall, OUTRIDER provides an end-to-end solution for identifying aberrantly expressed genes and is suitable for use by rare-disease diagnostic platforms.
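The statistical step described in the abstract (negative binomial read counts with a gene-specific dispersion, followed by false-discovery-rate adjustment) can be illustrated with a minimal Python sketch. This is not the OUTRIDER implementation: the function names are hypothetical, the expected count `mu` is taken as given rather than fitted by an autoencoder, the two-sided p value uses one common convention (twice the smaller tail, capped at 1), and Benjamini-Hochberg stands in for whichever FDR procedure the tool actually applies.

```python
import math

def nb_logpmf(k, mu, theta):
    """Log-probability of count k under a negative binomial with mean mu
    and dispersion theta (variance = mu + mu**2 / theta). Requires mu > 0."""
    return (math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
            + theta * math.log(theta / (theta + mu))
            + k * math.log(mu / (theta + mu)))

def nb_cdf(k, mu, theta):
    """P(X <= k), computed by direct summation (fine for a sketch,
    slow for very large counts)."""
    if k < 0:
        return 0.0
    total = sum(math.exp(nb_logpmf(i, mu, theta)) for i in range(k + 1))
    return min(1.0, total)

def two_sided_pvalue(k, mu, theta):
    """Two-sided p value: twice the smaller tail probability, capped at 1.
    The upper tail is taken as a complement, so it loses precision for
    extremely small probabilities."""
    lower = nb_cdf(k, mu, theta)             # P(X <= k)
    upper = 1.0 - nb_cdf(k - 1, mu, theta)   # P(X >= k)
    return min(1.0, 2.0 * min(lower, upper))

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values (assumed FDR procedure)."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical example: one gene, expected count 100, dispersion 10,
# observed counts across four samples.
observed = [95, 110, 1000, 102]
pvals = [two_sided_pvalue(k, mu=100.0, theta=10.0) for k in observed]
padj = benjamini_hochberg(pvals)
outliers = [k for k, p in zip(observed, padj) if p < 0.05]
```

In this toy input only the count of 1000 is flagged, because it lies far in the upper tail of the assumed distribution while the other counts sit near the expected value.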

SUBMITTER: Brechtmann F 

PROVIDER: S-EPMC6288422 | biostudies-literature | 2018 Dec

REPOSITORIES: biostudies-literature

Publications

OUTRIDER: A Statistical Method for Detecting Aberrantly Expressed Genes in RNA Sequencing Data.

Felix Brechtmann, Christian Mertes, Agnė Matusevičiūtė, Vicente A Yépez, Žiga Avsec, Maximilian Herzog, Daniel M Bader, Holger Prokisch, Julien Gagneur

American Journal of Human Genetics, 2018-11-29, issue 6


Similar Datasets

| S-EPMC5151178 | biostudies-literature
| S-EPMC2464587 | biostudies-literature
| S-EPMC6284200 | biostudies-literature
| S-EPMC7212782 | biostudies-literature
| S-EPMC5617423 | biostudies-literature
| S-EPMC7325161 | biostudies-literature
| S-EPMC5876391 | biostudies-literature
2010-08-25 | GSE23785 | GEO
| S-EPMC7186670 | biostudies-literature
| S-EPMC7514722 | biostudies-literature