Dataset Information

BM-map: Bayesian mapping of multireads for next-generation sequencing data.


ABSTRACT: Next-generation sequencing (NGS) technology generates millions of short reads, which provide valuable information on various aspects of cellular activities and biological functions. A key step in NGS applications (e.g., RNA-Seq) is to map short reads to their correct genomic locations within the source genome. While most reads map to a unique location, a significant proportion of reads align to multiple genomic locations with equal or similar numbers of mismatches; these are called multireads. The ambiguity in mapping multireads may lead to bias in downstream analyses. Currently, most practitioners discard multireads in their analysis, resulting in a loss of valuable information, especially for genes with similar sequences. To refine the read mapping, we develop a Bayesian model that computes the posterior probability of mapping a multiread to each competing location. The probabilities are used for downstream analyses, such as the quantification of gene expression. We show through simulation studies and RNA-Seq analysis of real-life data that the Bayesian method yields better mapping than the current leading methods. We provide a downloadable C++ program, which is being packaged into user-friendly software.
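
To make the idea of a posterior probability over competing locations concrete, the following is a minimal sketch of Bayes' rule applied to one multiread. It is not the BM-map model or its C++ program: the function name `multiread_posteriors`, the independent per-base `error_rate` likelihood, and the `priors` weights (for example, proportional to unique-read coverage near each location) are all assumptions made purely for illustration.

```python
from math import exp, log


def multiread_posteriors(read_length, mismatches, priors, error_rate=0.01):
    """Posterior probability of each candidate location for one multiread.

    Hypothetical simplification (not taken from the paper): every base
    mismatches independently with probability `error_rate`, and `priors`
    holds one positive illustrative weight per candidate location.
    """
    if len(mismatches) != len(priors):
        raise ValueError("need exactly one prior per candidate location")
    if any(p <= 0 for p in priors):
        raise ValueError("priors must be positive")

    # Work in log space: log prior + log likelihood of the observed mismatches.
    log_weights = []
    for m, prior in zip(mismatches, priors):
        log_lik = m * log(error_rate) + (read_length - m) * log(1.0 - error_rate)
        log_weights.append(log(prior) + log_lik)

    # Subtract the maximum before exponentiating for numerical stability,
    # then normalise so the posteriors sum to one.
    top = max(log_weights)
    weights = [exp(w - top) for w in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


# A 36-base read aligning to three locations with 1, 1, and 2 mismatches.
post = multiread_posteriors(36, mismatches=[1, 1, 2], priors=[0.7, 0.2, 0.1])
print(post)
```

Under these assumptions, each posterior could serve as a fractional read count for its location in a downstream step such as expression quantification, rather than discarding the multiread outright.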

SUBMITTER: Ji Y 

PROVIDER: S-EPMC3190637 | biostudies-literature | 2011 Dec

REPOSITORIES: biostudies-literature

Publications

BM-map: Bayesian mapping of multireads for next-generation sequencing data.

Yuan Ji, Yanxun Xu, Qiong Zhang, Kam-Wah Tsui, Yuan Yuan, Clift Norris, Shoudan Liang, Han Liang

Biometrics, 2011-04-22, issue 4


Similar Datasets

| S-EPMC2896157 | biostudies-literature
| S-EPMC3212839 | biostudies-other
| S-EPMC4697941 | biostudies-literature
| S-EPMC5080835 | biostudies-literature
| S-EPMC3356304 | biostudies-literature
| S-EPMC9891242 | biostudies-literature
| S-EPMC2493403 | biostudies-literature
| S-EPMC3694114 | biostudies-literature
| S-EPMC7822856 | biostudies-literature
| S-EPMC6288940 | biostudies-literature