Dataset Information

Sequana coverage: detection and characterization of genomic variations using running median and mixture models.

ABSTRACT:

Background: In addition to mapping-quality information, genome coverage contains valuable biological information, such as the presence of repetitive regions, deleted genes, or copy number variations (CNVs). It is essential to take into account atypical regions, trends (e.g., the origin of replication), and known and unknown biases that influence coverage. It is also important that reported events have robust statistics (e.g., a z-score) associated with their detection, as well as a precise location.

Results: We provide a stand-alone application, sequana_coverage, that reports genomic regions of interest (ROIs) that are significantly over- or underrepresented in high-throughput sequencing data. Each event is reported with its significance and with characteristics such as the length of the region. The algorithm first detrends the data using an efficient running median algorithm. It then estimates the distribution of the normalized genome coverage with a Gaussian mixture model. Finally, a z-score statistic is assigned to each base position and used to separate the central distribution from the ROIs (i.e., under- and overcovered regions). A double-threshold mechanism is used to cluster the genomic ROIs. HTML reports provide a summary with interactive visual representations of the genomic ROIs, together with standard plots and metrics. Genomic variations such as single-nucleotide variants or CNVs can be effectively identified at the same time.
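The detection steps summarized in the abstract (running-median detrending, a mixture-model fit of the normalized coverage, per-base z-scores, and double-threshold clustering) can be illustrated with a minimal, standard-library Python sketch. It is not the sequana_coverage implementation, and all function names are hypothetical. Several choices are assumptions made here for illustration only: the mixture has three components fitted by plain EM with means initialized evenly across the data range; the "central" component is taken to be the one with the largest weight; positions whose running median is 0 get a normalized value of 0; and the secondary threshold defaults to half of the primary one, so |z| of at least 4 seeds a region and |z| of at least 2 extends it.

```python
import bisect
import math
import statistics


def running_median(values, window):
    """Centred running median; the window is truncated at both ends.

    A sorted copy of the window is updated with bisect, so each step costs
    O(window) list shifting instead of a full re-sort.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half, n = window // 2, len(values)
    win = sorted(values[: half + 1])
    out = []
    for i in range(n):
        if i > 0:
            if i + half < n:  # value entering on the right
                bisect.insort(win, values[i + half])
            if i - half - 1 >= 0:  # value leaving on the left
                del win[bisect.bisect_left(win, values[i - half - 1])]
        m = len(win)
        out.append(win[m // 2] if m % 2 else (win[m // 2 - 1] + win[m // 2]) / 2)
    return out


def fit_gmm_1d(xs, k=3, n_iter=50):
    """Plain EM for a 1-D Gaussian mixture; returns (weights, means, variances)."""
    n, lo, hi = len(xs), min(xs), max(xs)
    # Assumption: means start evenly spaced between the minimum and maximum.
    mus = [lo + j * (hi - lo) / (k - 1) for j in range(k)] if k > 1 else [statistics.fmean(xs)]
    var = [max(statistics.pvariance(xs), 1e-6)] * k
    pis = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [pis[j] * math.exp(-((x - mus[j]) ** 2) / (2 * var[j]))
                 / math.sqrt(2 * math.pi * var[j]) for j in range(k)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances (variance floored).
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-12
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var[j] = max(sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj, 1e-6)
            pis[j] = nj / n
    return pis, mus, var


def z_scores(coverage, window, k=3):
    """Normalise by the running median, fit a GMM, z-score against the central component."""
    med = running_median(coverage, window)
    # Assumption: positions whose running median is 0 get a normalised value of 0.
    norm = [c / m if m > 0 else 0.0 for c, m in zip(coverage, med)]
    pis, mus, var = fit_gmm_1d(norm, k)
    # Assumption: the central component is the one with the largest weight.
    central = max(range(k), key=lambda j: pis[j])
    mu, sigma = mus[central], math.sqrt(var[central])
    return [(x - mu) / sigma for x in norm]


def double_threshold(z, high=4.0, low=None):
    """Hysteresis clustering: a run of same-sign |z| >= low is kept only if it
    reaches |z| >= high somewhere. Returns (start, end, peak_z), end exclusive."""
    low = high / 2 if low is None else low
    regions, i, n = [], 0, len(z)
    while i < n:
        if abs(z[i]) < low:
            i += 1
            continue
        sign, start, peak = (1 if z[i] > 0 else -1), i, z[i]
        while i < n and sign * z[i] >= low:  # extend while the same sign stays above low
            peak = z[i] if abs(z[i]) > abs(peak) else peak
            i += 1
        if abs(peak) >= high:
            regions.append((start, i, peak))
    return regions
```

For example, `double_threshold(z_scores(coverage, window=401))` returns half-open `(start, end, peak_z)` tuples, where a negative peak marks an undercovered region and a positive peak an overcovered one. The window must be odd and, because a median is only robust while outliers fill less than half of the window, it should be at least about twice as long as the events of interest; otherwise the running median follows the event and the normalization absorbs it.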

SUBMITTER: Desvillechabrol D

PROVIDER: S-EPMC6275460 | biostudies-literature | 2018 Dec

REPOSITORIES: biostudies-literature

Publications

Sequana coverage: detection and characterization of genomic variations using running median and mixture models.

Dimitri Desvillechabrol, Christiane Bouchier, Sean Kennedy, Thomas Cokelaer

GigaScience 20181201 12

PMID: 30192951

Similar Datasets

Detection of copy number variations in rice using array-based comparative genomic hybridization
2012-07-01 | GSE30542 | GEO

Mixture models for distance sampling detection functions.
S-EPMC4368789 | biostudies-literature

Detecting disease-associated genomic outcomes using constrained mixture of Bayesian hierarchical models for paired data.
S-EPMC5373614 | biostudies-literature

Detection of copy number variations in rice using array-based comparative genomic hybridization
2012-06-30 | E-GEOD-30542 | biostudies-arrayexpress

Detection of copy number variations in rice using array-based comparative genomic hybridization.
S-EPMC3156786 | biostudies-literature

Semiparametric Bayesian survival analysis using models with log-linear median.
S-EPMC5557061 | biostudies-other

Analyzing allele specific RNA expression using mixture models.
S-EPMC4521363 | biostudies-literature

Detection and characterization of horizontal transfers in prokaryotes using genomic signature.
S-EPMC546175 | biostudies-literature

Genomic prediction using low-coverage portable Nanopore sequencing.
S-EPMC8673642 | biostudies-literature

Estimating Lion Abundance using N-mixture Models for Social Species.
S-EPMC5082374 | biostudies-other