
Dataset Information


RAPSearch: a fast protein similarity search tool for short reads.


ABSTRACT: Next Generation Sequencing (NGS) is producing enormous corpora of short DNA reads, affecting emerging fields like metagenomics. Protein similarity search, a key step in annotating protein-coding genes in these short reads and identifying their biological functions, faces daunting challenges because of the sheer size of short-read datasets. We developed a fast protein similarity search tool, RAPSearch, that uses a reduced amino acid alphabet and a suffix array to detect seeds of flexible length. For the short reads we tested (translated in 6 frames), RAPSearch achieved a ~20-90 times speedup compared to BLASTX. RAPSearch missed only a small fraction (~1.3-3.2%) of BLASTX similarity hits, but it also discovered additional homologous proteins (~0.3-2.1%) that BLASTX missed. By contrast, BLAT, a tool that is even slightly faster than RAPSearch, had a significant loss of sensitivity compared to RAPSearch and BLAST. RAPSearch is implemented as open-source software and is accessible at http://omics.informatics.indiana.edu/mg/RAPSearch. It enables faster protein similarity search. The application of RAPSearch in metagenomics has also been demonstrated.
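The seed-detection idea mentioned in the abstract (a reduced amino acid alphabet combined with a suffix array) can be illustrated with a minimal Python sketch. It is not RAPSearch's implementation: the residue grouping in `GROUPS`, the function names, the `min_len` threshold, and the naive suffix-array construction are all assumptions chosen for illustration, and the abstract does not specify the actual reduced alphabet or seed rules.

```python
# Illustrative sketch only; the grouping below is hypothetical and is
# NOT the reduced alphabet used by RAPSearch.
GROUPS = ["AST", "C", "DE", "FY", "G", "H", "ILMV", "KR", "NQ", "P", "W"]
# Map every residue to the first letter of its group.
REDUCE = {aa: group[0] for group in GROUPS for aa in group}


def reduce_seq(seq):
    """Rewrite a protein sequence in the reduced alphabet ('X' if unknown)."""
    return "".join(REDUCE.get(aa, "X") for aa in seq)


def build_suffix_array(text):
    """Naive suffix array: suffix start positions in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def common_prefix_len(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def longest_seed(text, sa, query, min_len=4):
    """Find the longest prefix of `query` that occurs in `text`.

    Returns (length, position_in_text), or None if the best match is
    shorter than `min_len` (a hypothetical threshold).
    """
    # Binary search for the first suffix that is >= query.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:] < query:
            lo = mid + 1
        else:
            hi = mid
    # The longest shared prefix is at one of the two neighbouring suffixes.
    best = (0, -1)
    for idx in (lo - 1, lo):
        if 0 <= idx < len(sa):
            length = common_prefix_len(text[sa[idx]:], query)
            if length > best[0]:
                best = (length, sa[idx])
    return best if best[0] >= min_len else None


db = reduce_seq("MKVLAAGIVGLS")      # toy database protein
sa = build_suffix_array(db)
query = reduce_seq("RLVTSG")         # toy translated read fragment
print(longest_seed(db, sa, query))   # (6, 1)
```

In this toy case the read fragment `RLVTSG` and the database segment `KVLAAG` share only one identical residue, yet they match exactly over six positions once both are rewritten in the reduced alphabet, which is the kind of similar-but-not-identical seed the abstract alludes to. Because the suffix array is searched for the longest shared prefix rather than a fixed word size, the seed length is flexible.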

SUBMITTER: Ye Y 

PROVIDER: S-EPMC3113943 | biostudies-literature | 2011 May

REPOSITORIES: biostudies-literature


Publications

RAPSearch: a fast protein similarity search tool for short reads.

Yuzhen Ye, Jeong-Hyeon Choi, Haixu Tang

BMC Bioinformatics, 2011-05-15



Similar Datasets

| S-EPMC3035798 | biostudies-literature
| S-EPMC1522020 | biostudies-literature
| S-EPMC3244761 | biostudies-literature
| S-EPMC3591303 | biostudies-literature
| S-EPMC3527383 | biostudies-literature
| S-EPMC1421445 | biostudies-literature
| S-EPMC3125810 | biostudies-literature
| S-EPMC5870704 | biostudies-literature
| S-EPMC2951093 | biostudies-literature
| S-EPMC3198573 | biostudies-literature