
Dataset Information

A new method for species identification via protein-coding and non-coding DNA barcodes by combining machine learning with bioinformatic methods.


ABSTRACT: Species identification via DNA barcodes is contributing greatly to current bioinventory efforts. The initial, and widely accepted, proposal was to use the protein-coding cytochrome c oxidase subunit I (COI) region as the standard barcode for animals, but recently non-coding internal transcribed spacer (ITS) genes have been proposed as candidate barcodes for both animals and plants. However, achieving a robust alignment for non-coding regions can be problematic. Here we propose two new methods (DV-RBF and FJ-RBF) that take advantage of the power of machine learning and bioinformatics to address this issue for species assignment by both coding and non-coding sequences. We demonstrate the value of the new methods with four empirical datasets, two representing typical protein-coding COI barcode datasets (neotropical bats and marine fish) and two representing non-coding ITS barcodes (rust fungi and brown algae). Using two random sub-sampling approaches, we demonstrate that the new methods significantly outperformed existing neighbor-joining (NJ) and maximum likelihood (ML) methods for both coding and non-coding barcodes when there was complete species coverage in the reference dataset. The new methods also outperformed NJ and ML methods for non-coding sequences in circumstances of potentially incomplete species coverage, although in that case the NJ and ML methods performed slightly better than the new methods for protein-coding barcodes. A 100% success rate of species identification was achieved with the two new methods for 4,122 bat queries and 5,134 fish queries using COI barcodes, with 95% confidence intervals (CI) of 99.75-100%. The new methods also obtained a 96.29% success rate (95% CI: 91.62-98.40%) for 484 rust fungi queries and a 98.50% success rate (95% CI: 96.60-99.37%) for 1,094 brown algae queries, both using ITS barcodes.

SUBMITTER: Zhang AB 

PROVIDER: S-EPMC3282726 | biostudies-literature | 2012

REPOSITORIES: biostudies-literature

Publications

A new method for species identification via protein-coding and non-coding DNA barcodes by combining machine learning with bioinformatic methods.

Zhang Ai-bing, Feng Jie, Ward Robert D, Wan Ping, Gao Qiang, Wu Jun, Zhao Wei-zhong

PLoS ONE, 2012-02-20, issue 2


Similar Datasets

| S-EPMC2775153 | biostudies-literature
| S-EPMC6158771 | biostudies-other
| S-EPMC3566972 | biostudies-literature
| S-EPMC3670917 | biostudies-literature
| S-EPMC8748993 | biostudies-literature
| S-EPMC2777894 | biostudies-literature
| S-EPMC2990459 | biostudies-literature
| S-EPMC5648192 | biostudies-literature
| S-EPMC8142502 | biostudies-literature
2022-10-20 | PXD022225 | Pride