Dataset Information

MaSC: mappability-sensitive cross-correlation for estimating mean fragment length of single-end short-read sequencing data.

ABSTRACT:

Motivation

Reliable estimation of the mean fragment length for next-generation short-read sequencing data is an important step in next-generation sequencing analysis pipelines, most notably because of its impact on the accuracy of the enriched regions identified by peak-calling algorithms. Although many peak-calling algorithms include a fragment-length estimation subroutine, the problem has not been adequately solved, as demonstrated by the variability of the estimates returned by different algorithms.

Results

In this article, we investigate the use of strand cross-correlation to estimate mean fragment length of single-end data and show that traditional estimation approaches have mixed reliability. We observe that the mappability of different parts of the genome can introduce an artificial bias into cross-correlation computations, resulting in incorrect fragment-length estimates. We propose a new approach, called mappability-sensitive cross-correlation (MaSC), which removes this bias and allows for accurate and reliable fragment-length estimation. We analyze the computational complexity of this approach, and evaluate its performance on a test suite of NGS datasets, demonstrating its superiority to traditional cross-correlation analysis.
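The idea described above can be illustrated with a small sketch. This is not the authors' Perl implementation and does not reproduce the paper's exact formulas. It assumes that strand cross-correlation at a shift d is the Pearson correlation between forward-strand read-start counts at position i and reverse-strand counts at position i + d, and that the mappability-sensitive variant simply restricts that correlation to position pairs where both positions are flagged mappable. The function names, the `mappable` mask, and the toy data are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; 0.0 if undefined."""
    n = len(xs)
    if n == 0:
        return 0.0
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def cross_correlation(fwd, rev, shift, mappable=None):
    """Correlate forward counts at i with reverse counts at i + shift.

    If a mappability mask is given (assumption: one boolean per position),
    only pairs where both positions are mappable contribute.
    """
    xs, ys = [], []
    for i in range(len(fwd) - shift):
        j = i + shift
        if mappable is not None and not (mappable[i] and mappable[j]):
            continue
        xs.append(fwd[i])
        ys.append(rev[j])
    return pearson(xs, ys)


def estimate_fragment_length(fwd, rev, max_shift, mappable=None):
    """Return the shift in 0..max_shift with the highest cross-correlation."""
    scores = [cross_correlation(fwd, rev, d, mappable)
              for d in range(max_shift + 1)]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy data (hypothetical): three fragments, each with its forward read
# start and its reverse read start exactly 10 positions apart.
fwd = [0] * 60
rev = [0] * 60
for start, count in [(5, 4), (20, 3), (40, 5)]:
    fwd[start] = count
    rev[start + 10] = count

mappable = [True] * 60
mappable[20] = False  # pretend one position cannot be uniquely mapped

plain_estimate = estimate_fragment_length(fwd, rev, max_shift=20)
masked_estimate = estimate_fragment_length(fwd, rev, max_shift=20,
                                           mappable=mappable)
```

On real data the counts would come from aligned single-end reads and the mask from a genome mappability track for the relevant read length. The toy example only shows where the mask enters the computation, not the size of the bias it removes.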

Availability

An open-source Perl implementation of our approach is available at http://www.perkinslab.ca/Software.html.

SUBMITTER: Ramachandran P

PROVIDER: S-EPMC3570216 | biostudies-literature | 2013 Feb

REPOSITORIES: biostudies-literature

Publications

MaSC: mappability-sensitive cross-correlation for estimating mean fragment length of single-end short-read sequencing data.

Parameswaran Ramachandran, Gareth A. Palidwor, Christopher J. Porter, Theodore J. Perkins

Bioinformatics (Oxford, England), 2013-01-07, issue 4

PMID: 23300135

Similar Datasets

A sensitive short read homology search tool for paired-end read sequencing data.
| S-EPMC5657049 | biostudies-literature

Estimating mean potential outcome under adaptive treatment length strategies in continuous time.
| S-EPMC9482146 | biostudies-literature

Estimating abundances of retroviral insertion sites from DNA fragment length data.
| S-EPMC3307109 | biostudies-literature

Paired-end mappability of transposable elements in the human genome.
| S-EPMC6617613 | biostudies-literature

Genomic dark matter: the reliability of short read mapping illustrated by the genome mappability score.
| S-EPMC3413383 | biostudies-literature

Metagenomics: read length matters.
| S-EPMC2258652 | biostudies-literature

Sensitive PCR-restriction fragment length polymorphism assay for detection and genotyping of Giardia duodenalis in human feces.
| S-EPMC153413 | biostudies-literature

ESTIMATING MEAN SURVIVAL TIME: WHEN IS IT POSSIBLE?
| S-EPMC4442028 | biostudies-literature

Critical length in long-read resequencing.
| S-EPMC7671308 | biostudies-literature

SV-Bay: structural variant detection in cancer genomes using a Bayesian approach with correction for GC-content and read mappability.
| S-EPMC4896370 | biostudies-literature