Dataset Information

A database of metazoan cytochrome c oxidase subunit I gene sequences derived from GenBank with CO-ARBitrator.


ABSTRACT: The Cytochrome C Oxidase subunit I gene ("COI") is the de facto standard for animal DNA barcoding. Organism identification based on COI requires an accurate and extensive annotated database of COI sequences. Such a database can also be of value in reconstructing evolutionary history and in diversity studies. Two COI databases are currently available: BOLD and Midori. BOLD's submissions conform to stringent sequence and metadata requirements; BOLD is specific to COI but makes no attempt to be comprehensive. Midori, derived from GenBank, has more sequences but less stringent standards than BOLD, resulting in higher error rates. To address the need for a comprehensive and accurate COI database, we adapted the ARBitrator algorithm, which classifies based only on sequence properties and has successfully auto-curated bacterial genes mined from GenBank. The adapted algorithm, which we call CO-ARBitrator, built a database of over a million metazoan COI sequences. Its sensitivity and specificity are significantly higher than Midori's. Its specificity is comparable to what BOLD achieves with data quality prerequisites. Results and software are publicly available.

SUBMITTER: Heller P 

PROVIDER: S-EPMC6080493 | biostudies-literature | 2018 Aug

REPOSITORIES: biostudies-literature

Publications

A database of metazoan cytochrome c oxidase subunit I gene sequences derived from GenBank with CO-ARBitrator.

Philip Heller, James Casaletto, Gregory Ruiz, Jonathan Geller

Scientific Data, 2018-08-07


Similar Datasets

| S-EPMC9216479 | biostudies-literature
| PRJEB70388 | ENA
| S-EPMC7426930 | biostudies-literature
| S-EPMC6327197 | biostudies-literature
| PRJNA949520 | ENA
| PRJNA949566 | ENA
| S-EPMC2241769 | biostudies-literature
| S-EPMC2910218 | biostudies-literature
| S-EPMC124958 | biostudies-literature
| S-EPMC3424232 | biostudies-literature