Dataset Information

A new efficient method for analyzing fungi species using correlations between nucleotides.


ABSTRACT:

Background

In recent years, DNA barcoding has become an important tool for biologists to identify species and understand their natural biodiversity. The complexity of barcode data makes it difficult to analyze quickly and effectively, and manual classification cannot keep up with the rate at which new data become available.

Results

In this study, we propose a new method for DNA barcode classification based on the distribution of nucleotides within the sequence. Adding the covariances between nucleotides to the original natural vector yields an augmented 18-dimensional natural vector that makes good use of the information available in the DNA sequence. The accurate classification results we obtained demonstrate that this 18-dimensional natural vector method, together with the random forest classifier algorithm, can serve as a computationally efficient identification tool for DNA barcodes. We performed phylogenetic analysis on the genus Megacollybia to validate our method. We also studied how effectively our method determines the genetic distance within and between species in our barcoding dataset.
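
To make the 18-dimensional vector more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the usual 12-dimensional natural vector (for each of A, C, G, T: the count, the mean position, and the normalized second central moment of the positions) and appends six pairwise covariance terms. The abstract does not give the covariance formula, so the one used here is an illustrative assumption, as are the function name `natural_vector_18`, the 1-based positions, and the handling of absent nucleotides.

```python
from itertools import combinations

NUCLEOTIDES = "ACGT"


def natural_vector_18(seq):
    """Return an 18-dimensional alignment-free feature vector for a DNA sequence.

    Layout: 4 counts, 4 mean positions, 4 normalized second central moments,
    then 6 pairwise covariance terms (AC, AG, AT, CG, CT, GT).
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must be non-empty")

    # 1-based positions of each nucleotide (assumption: positions start at 1).
    # Characters other than A, C, G, T are ignored here but still count in n.
    positions = {
        k: [i for i, c in enumerate(seq, start=1) if c == k] for k in NUCLEOTIDES
    }

    counts, means, moments = [], [], []
    for k in NUCLEOTIDES:
        pos = positions[k]
        n_k = len(pos)
        # Assumption: an absent nucleotide contributes zeros.
        mu_k = sum(pos) / n_k if n_k else 0.0
        d2_k = sum((p - mu_k) ** 2 for p in pos) / (n_k * n) if n_k else 0.0
        counts.append(float(n_k))
        means.append(mu_k)
        moments.append(d2_k)

    # ASSUMED covariance definition, for illustration only: t_k[i] is the
    # position i where seq[i] == k and 0 elsewhere, and the term is the
    # average product of the deviations of t_k and t_l from the means above.
    mu = dict(zip(NUCLEOTIDES, means))
    covariances = []
    for k, l in combinations(NUCLEOTIDES, 2):
        total = 0.0
        for i, c in enumerate(seq, start=1):
            t_k = i if c == k else 0
            t_l = i if c == l else 0
            total += (t_k - mu[k]) * (t_l - mu[l])
        covariances.append(total / n)

    return counts + means + moments + covariances
```

Each barcode sequence maps to one fixed-length vector regardless of its length, so no alignment is needed; vectors of this kind could then be passed to a classifier such as the random forest mentioned above.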

Conclusions

The classification performs well on the fungi barcode dataset with high and robust accuracy. The reasonable phylogenetic trees we obtained further validate our methods. This method is alignment-free and does not depend on any model assumption, and it will become a powerful tool for classification and evolutionary analysis.

SUBMITTER: Zhao X 

PROVIDER: S-EPMC6307163 | biostudies-literature | 2018 Dec

REPOSITORIES: biostudies-literature

Publications

A new efficient method for analyzing fungi species using correlations between nucleotides.

Zhao Xin, Tian Kun, Yau Stephen S-T

BMC Evolutionary Biology, 2018 Dec 27, issue 1


Similar Datasets

| S-EPMC4311241 | biostudies-other
| S-EPMC3905948 | biostudies-literature
| S-EPMC3583807 | biostudies-other
| S-EPMC6585625 | biostudies-literature
2014-01-17 | E-GEOD-54179 | biostudies-arrayexpress
| S-EPMC4989219 | biostudies-literature
| S-EPMC7029891 | biostudies-literature
2014-01-17 | GSE54179 | GEO
| S-EPMC7524000 | biostudies-literature