Dataset Information

Background Adjusted Alignment-Free Dissimilarity Measures Improve the Detection of Horizontal Gene Transfer.


ABSTRACT: Horizontal gene transfer (HGT) plays an important role in the evolution of microbial organisms including bacteria. Alignment-free methods based on single genome compositional information have been used to detect HGT. Currently, Manhattan and Euclidean distances based on tetranucleotide frequencies are the most commonly used alignment-free dissimilarity measures to detect HGT. By testing on simulated bacterial sequences and real data sets with known horizontally transferred genomic regions, we found that more advanced alignment-free dissimilarity measures, such as CVTree and d2*, that take into account the background Markov sequences can solve HGT detection problems with significantly improved performance. We also studied the influence of different factors such as evolutionary distance between host and donor sequences, size of sliding window, and host genome composition on the performance of alignment-free methods to detect HGT. Our study showed that alignment-free methods can predict HGT accurately when host and donor genomes are in different order levels. Among all methods, CVTree with word length of 3, d2* with word length 3 and Markov order 1, and d2* with word length 4 and Markov order 1 outperform others in terms of their highest F1-score and their robustness under the influence of different factors.
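
The baseline approach named in the abstract (Manhattan distance on tetranucleotide frequencies, computed over a sliding window) can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the function names, the window and step sizes, and the toy genome are all hypothetical, and the background-adjusted measures (CVTree and the Markov-model-based measures) are not implemented here. Each window is compared with the composition of the whole genome, and windows with unusually large distances are candidate transferred regions. The F1-score helper shows how a set of predicted windows could be compared with known transferred windows.

```python
from collections import Counter
from itertools import product


def kmer_freqs(seq, k=4):
    """Relative frequency of every k-mer over ACGT (words with other symbols are ignored)."""
    seq = seq.upper()
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[w] for w in words)
    if total == 0:
        return [0.0] * len(words)
    return [counts[w] / total for w in words]


def manhattan(p, q):
    """Manhattan (L1) distance between two equal-length frequency vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))


def window_scores(genome, window=5000, step=1000, k=4):
    """Score each sliding window by its distance to the whole-genome composition.

    The default window and step sizes are illustrative assumptions only.
    """
    background = kmer_freqs(genome, k)
    scores = []
    for start in range(0, len(genome) - window + 1, step):
        freqs = kmer_freqs(genome[start:start + window], k)
        scores.append((start, manhattan(freqs, background)))
    return scores


def f1_score(predicted, actual):
    """F1-score between a set of predicted items and a set of true items."""
    true_pos = len(predicted & actual)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(actual)
    return 2 * precision * recall / (precision + recall)


# Toy example (hypothetical data): an AT-rich "host" with a GC-rich insert at 4000..4399.
toy_genome = "AATT" * 1000 + "GGCC" * 100 + "AATT" * 1000
toy_scores = window_scores(toy_genome, window=400, step=400)
most_atypical_start = max(toy_scores, key=lambda s: s[1])[0]
```

In practice a threshold on the window score would decide which windows are reported, and that threshold, like the window size, affects the resulting F1-score.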

SUBMITTER: Tang K 

PROVIDER: S-EPMC5911508 | biostudies-literature | 2018

REPOSITORIES: biostudies-literature

Publications

Background Adjusted Alignment-Free Dissimilarity Measures Improve the Detection of Horizontal Gene Transfer.

Kujin Tang, Yang Young Lu, Fengzhu Sun

Frontiers in Microbiology, 2018-04-16


Similar Datasets

| S-EPMC4918981 | biostudies-literature
| S-EPMC3014950 | biostudies-literature
| S-EPMC4958984 | biostudies-literature
2022-11-21 | GSE208001 | GEO
| S-EPMC8273350 | biostudies-literature
| S-EPMC8136488 | biostudies-literature
| S-EPMC2882956 | biostudies-literature
| S-EPMC3497702 | biostudies-literature
| S-EPMC6976291 | biostudies-literature