
Dataset Information


Using quality scores and longer reads improves accuracy of Solexa read mapping.


ABSTRACT:

Background

Second-generation sequencing has the potential to revolutionize genomics and impact all areas of biomedical science. New technologies will make re-sequencing widely available for such applications as identifying genome variations or interrogating the oligonucleotide content of a large sample (e.g. ChIP-sequencing). The increase in speed, sensitivity and availability of sequencing technology brings demand for advances in computational technology to perform the associated analysis tasks. The Solexa/Illumina 1G sequencer can produce tens of millions of reads, ranging in length from approximately 25 to 50 nt, in a single experiment. Accurately mapping the reads back to a reference genome is a critical task in almost all applications. Two sources of information are often ignored when mapping reads from the Solexa technology: the 3' ends of longer reads, which contain a much higher frequency of sequencing errors, and the base-call quality scores.

Results

To investigate whether these sources of information can be used to improve accuracy when mapping reads, we developed the RMAP tool, which can map reads having a wide range of lengths and allows base-call quality scores to determine which positions in each read are more important when mapping. We applied RMAP to analyze data re-sequenced from two human BAC regions, varying both the read length and the criteria for using quality scores. RMAP is freely available for downloading at http://rulai.cshl.edu/rmap/.
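The abstract says RMAP lets base-call quality scores decide which read positions matter when mapping. The following is a minimal sketch of one way to do this, assuming a simple cutoff rule: positions whose quality falls below a threshold are treated as wildcards and never counted as mismatches. The function names, the integer quality scale, and the cutoff rule are illustrative assumptions, not RMAP's actual implementation or interface. The exhaustive scan over every reference offset is also far slower than a real mapper.

```python
from typing import List, Optional, Tuple


def count_mismatches(read: str, quals: List[int], ref: str, pos: int, cutoff: int) -> int:
    """Count mismatches between read and ref[pos:pos+len(read)].

    Hypothetical rule: a read position whose quality is below `cutoff`
    is treated as a wildcard and never counts as a mismatch.
    """
    mismatches = 0
    for i, (base, q) in enumerate(zip(read, quals)):
        if q < cutoff:
            continue  # low-quality base: ignored when scoring
        if base != ref[pos + i]:
            mismatches += 1
    return mismatches


def map_read(read: str, quals: List[int], ref: str,
             max_mismatches: int, cutoff: int) -> Optional[Tuple[int, int]]:
    """Return (position, mismatches) for the unique best hit, or None.

    None is returned when no offset has at most `max_mismatches`
    mismatches, or when the best score is shared by several offsets
    (an ambiguously mapped read).
    """
    if len(read) != len(quals):
        raise ValueError("read and quality arrays must have equal length")
    best: Optional[int] = None
    best_count = max_mismatches + 1  # anything at or above this is rejected
    ambiguous = False
    for pos in range(len(ref) - len(read) + 1):
        mm = count_mismatches(read, quals, ref, pos, cutoff)
        if mm < best_count:
            # strictly better hit: replace the best and clear ambiguity
            best, best_count, ambiguous = pos, mm, False
        elif mm == best_count and best is not None:
            ambiguous = True  # tie with the current best hit
    if best is None or ambiguous:
        return None
    return best, best_count
```

For example, with reference `AACCGGTTACGTACGATCGA` and read `ACGTACGT` whose final (3') base has quality 5 and all other bases quality 30, `map_read(read, quals, ref, 0, 20)` returns `(8, 0)` because the erroneous low-quality last base is ignored. If that base is instead given quality 30, the same call finds no hit within zero mismatches.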

Conclusion

Our results indicate that significant gains in Solexa read mapping performance can be achieved by considering the information in 3' ends of longer reads, and appropriately using the base-call quality scores. The RMAP tool we have developed will enable researchers to effectively exploit this information in targeted re-sequencing projects.

SUBMITTER: Smith AD 

PROVIDER: S-EPMC2335322 | biostudies-literature | 2008 Feb

REPOSITORIES: biostudies-literature


Publications

Using quality scores and longer reads improves accuracy of Solexa read mapping.

Smith Andrew D, Xuan Zhenyu, Zhang Michael Q

BMC Bioinformatics, 28 February 2008



Similar Datasets

| S-EPMC2577856 | biostudies-literature
| S-EPMC2853142 | biostudies-literature
2017-07-30 | GSE101815 | GEO
| S-EPMC4816028 | biostudies-literature
| S-EPMC2847217 | biostudies-literature
| S-EPMC6198854 | biostudies-other
| S-EPMC6220446 | biostudies-literature
| S-EPMC7448240 | biostudies-literature