Dataset Information

Two new computational methods for universal DNA barcoding: a benchmark using barcode sequences of bacteria, archaea, animals, fungi, and land plants.

ABSTRACT: Taxonomic identification of biological specimens based on DNA sequence information (a.k.a. DNA barcoding) is becoming increasingly common in biodiversity science. Although several methods have been proposed, many of them are not universally applicable due to the need for prerequisite phylogenetic/machine-learning analyses, the need for huge computational resources, or the lack of a firm theoretical background. Here, we propose two new computational methods of DNA barcoding and show a benchmark for bacterial/archaeal 16S, animal COX1, fungal internal transcribed spacer, and three plant chloroplast (rbcL, matK, and trnH-psbA) barcode loci that can be used to compare the performance of existing and new methods. The benchmark was performed under two alternative situations: query sequences were available in the corresponding reference sequence databases in one, but were not available in the other. In the former situation, the commonly used "1-nearest-neighbor" (1-NN) method, which assigns the taxonomic information of the most similar sequence in a reference database (i.e., the BLAST top-hit reference sequence) to a query, displayed the highest rate and highest precision of successful taxonomic identification. However, in the latter situation, the 1-NN method produced extremely high rates of misidentification for all the barcode loci examined. In contrast, one of our new methods, the query-centric auto-k-nearest-neighbor (QCauto) method, consistently produced low rates of misidentification for all the loci examined in both situations. These results indicate that the 1-NN method is most suitable if the reference sequences of all potentially observable species are available in databases; otherwise, the QCauto method returns the most reliable identification results. The benchmark results also indicated that the taxon coverage of reference sequences is far from complete for genus or species level identification in all the barcode loci examined. Therefore, we need to accelerate the registration of reference barcode sequences to apply high-throughput DNA barcoding to genus or species level identification in biodiversity research.
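
The 1-NN idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract refers to BLAST top hits, whereas the sketch below swaps in Python's `difflib.SequenceMatcher` as a toy similarity measure, and the reference labels, sequences, and function names (`similarity`, `one_nn_assign`, `REFERENCES`) are all hypothetical. It only shows why 1-NN always returns some taxon, even when the query's true taxon is missing from the database.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Toy similarity in [0, 1]; an assumed stand-in for a BLAST score."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def one_nn_assign(query: str, references: list[tuple[str, str]]) -> tuple[str, float]:
    """Return (taxon, similarity) of the single most similar reference sequence."""
    if not references:
        raise ValueError("reference database is empty")
    # Score every reference once, then keep the best-scoring one.
    scored = [(similarity(query, seq), taxon) for taxon, seq in references]
    best_score, best_taxon = max(scored, key=lambda pair: pair[0])
    return best_taxon, best_score


# Hypothetical toy reference database: (taxon label, barcode sequence).
REFERENCES = [
    ("Genus_a species_1", "ACGTACGTACGTAAGG"),
    ("Genus_a species_2", "ACGTACGTTCGTAAGG"),
    ("Genus_b species_3", "TTGCACCGTAGGCTAA"),
]

# Query present in the database: the top hit is an exact match.
present = one_nn_assign("ACGTACGTACGTAAGG", REFERENCES)

# Query absent from the database: 1-NN still assigns the nearest taxon,
# which is the situation the abstract associates with misidentification.
absent = one_nn_assign("TTGCACCGTAGGCTAT", REFERENCES)
```

Because the function has no notion of "too dissimilar to call", any real use would need an acceptance criterion on top of the top hit; the abstract's QCauto method addresses that problem, but its details are not given here and are not reproduced in this sketch.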

SUBMITTER: Tanabe AS

PROVIDER: S-EPMC3799923 | biostudies-literature | 2013

REPOSITORIES: biostudies-literature

Publications

Two new computational methods for universal DNA barcoding: a benchmark using barcode sequences of bacteria, archaea, animals, fungi, and land plants.

Tanabe, Akifumi S.; Toju, Hirokazu

PLoS ONE, 18 October 2013, issue 10

PMID: 24204702

Similar Datasets

Use of ITS2 region as the universal DNA barcode for plants and animals. (S-EPMC2948509, biostudies-literature)

A DNA barcode for land plants. (S-EPMC2722355, biostudies-literature)

ycf1, the most promising plastid DNA barcode of land plants. (S-EPMC4325322, biostudies-literature)

Evolution of protein indels in plants, animals and fungi. (S-EPMC3706215, biostudies-literature)

Nuclear ribosomal internal transcribed spacer (ITS) region as a universal DNA barcode marker for Fungi. (S-EPMC3341068, biostudies-literature)

Universal model for water costs of gas exchange by animals and plants. (S-EPMC2889562, biostudies-literature)

Phylogenetic detection of numerous gene duplications shared by animals, fungi and plants. (S-EPMC2884541, biostudies-literature)

Biosynthesis of long chain base in sphingolipids in animals, plants and fungi. (S-EPMC6920741, biostudies-literature)

Sequential loss of dynein sequences precedes complete loss in land plants. (S-EPMC9237703, biostudies-literature)