
Dataset Information


Misannotations of rRNA can now generate 90% false positive protein matches in metatranscriptomic studies.


ABSTRACT: In the course of analyzing 9,522,746 pyrosequencing reads from 23 stations in the Southwestern Pacific and equatorial Atlantic oceans, we found that misannotation of rRNA as protein is now so widespread that false positive matching of rRNA pyrosequencing reads to the National Center for Biotechnology Information (NCBI) non-redundant protein database approaches 90%. One conserved portion of 23S rRNA was misannotated consistently enough to prompt curators at Pfam to create a spurious protein family. A detailed examination of the annotation history of each seed sequence in this spurious Pfam family (PF10695, 'Cw-hydrolase') uncovered problems in the standard operating procedures and quality assurance programs of major sequencing centers. It also uncovered problems in the curation practices of those managing public databases such as GenBank and SwissProt. We offer recommendations for each of these issues. We also recommend that workers in the field of metatranscriptomics take extra care to keep false positive matches out of their datasets.
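The recommendation to keep false positive matches out of metatranscriptomic datasets can be illustrated with a minimal Python sketch. The sketch makes several assumptions. It assumes each read has been searched separately against a protein database and against an rRNA reference set, with both results saved as BLAST+ tabular output (`-outfmt 6`). The function names, the 1e-5 e-value cutoff, and the toy data are all hypothetical. The sketch drops reads that match rRNA from the protein-hit set. It also reports the share of rRNA-matching reads that had a protein hit, as a rough indicator of contamination from misannotated entries. This is not the authors' pipeline, and the reported share is not necessarily their definition of the false positive rate.

```python
from typing import Iterable, Set, Tuple


def hit_read_ids(blast_tab_lines: Iterable[str], max_evalue: float = 1e-5) -> Set[str]:
    """Return query (read) IDs with at least one hit at or below max_evalue.

    Assumes BLAST+ tabular output (-outfmt 6) with the default 12 columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    ids: Set[str] = set()
    for line in blast_tab_lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank and comment lines
            continue
        fields = line.split("\t")
        if float(fields[10]) <= max_evalue:  # column 11 holds the e-value
            ids.add(fields[0])  # column 1 holds the query (read) ID
    return ids


def drop_rrna_reads(protein_ids: Set[str], rrna_ids: Set[str]) -> Tuple[Set[str], float]:
    """Remove reads that also match rRNA from the protein-hit set.

    Also returns the fraction of rRNA-matching reads that had a protein hit.
    This is an illustrative metric (hypothetical), not necessarily the paper's.
    """
    overlap = protein_ids & rrna_ids
    fraction = len(overlap) / len(rrna_ids) if rrna_ids else 0.0
    return protein_ids - rrna_ids, fraction


# Toy example with made-up read and subject IDs (hypothetical data).
protein_tab = [
    "r1\tprotA\t85.0\t100\t15\t0\t1\t300\t1\t100\t1e-20\t150",
    "r2\tprotB\t70.0\t80\t24\t0\t1\t240\t1\t80\t1e-10\t90",
    "r3\tprotC\t40.0\t30\t18\t0\t1\t90\t1\t30\t0.5\t20",  # fails the e-value cutoff
]
rrna_tab = [
    "r2\trRNA_23S\t98.0\t250\t5\t0\t1\t250\t100\t349\t1e-100\t450",
    "r4\trRNA_16S\t99.0\t250\t2\t0\t1\t250\t10\t259\t1e-120\t480",
]

kept, fraction = drop_rrna_reads(hit_read_ids(protein_tab), hit_read_ids(rrna_tab))
print(sorted(kept), fraction)  # ['r1'] 0.5
```

In this toy run, read r2 matches both a "protein" and 23S rRNA, so it is removed from the protein-hit set. Half of the rRNA-matching reads had a protein hit. Running the rRNA screen before protein annotation, rather than afterwards, keeps such reads from ever being counted as protein-coding transcripts.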

SUBMITTER: Tripp HJ 

PROVIDER: S-EPMC3203614 | biostudies-literature | 2011 Nov

REPOSITORIES: biostudies-literature


Publications

Misannotations of rRNA can now generate 90% false positive protein matches in metatranscriptomic studies.

H. James Tripp, Ian Hewson, Sam Boyarsky, Joshua M. Stuart, Jonathan P. Zehr

Nucleic Acids Research, 2011 Jul 19, issue 20



Similar Datasets

S-EPMC6662999 | biostudies-literature
S-EPMC6695163 | biostudies-literature
S-EPMC4545352 | biostudies-literature
GSE21687 | GEO | 2010-07-19
S-EPMC1941744 | biostudies-literature
S-EPMC5609739 | biostudies-literature
S-EPMC4336944 | biostudies-literature
S-EPMC2638943 | biostudies-literature
S-EPMC7713993 | biostudies-literature
S-EPMC9835706 | biostudies-literature