
Dataset Information


Robust Inference of Genetic Exchange Communities from Microbial Genomes Using TF-IDF.


ABSTRACT: Bacteria and archaea can exchange genetic material across lineages through processes of lateral genetic transfer (LGT). Collectively, these exchange relationships can be modeled as a network and analyzed using concepts from graph theory. In particular, densely connected regions within an LGT network have been defined as genetic exchange communities (GECs). However, it has been problematic to construct networks in which edges solely represent LGT. Here we apply term frequency-inverse document frequency (TF-IDF), an alignment-free method originating from document analysis, to infer regions of lateral origin in bacterial genomes. We examine four empirical datasets of different size (number of genomes) and phyletic breadth, varying a key parameter (word length k) within bounds established in previous work. We map the inferred lateral regions to genes in recipient genomes, and construct networks in which the nodes are groups of genomes, and the edges natively represent LGT. We then extract maximum and maximal cliques (i.e., GECs) from these graphs, and identify nodes that belong to GECs across a wide range of k. Most surviving lateral transfer has happened within these GECs. Using Gene Ontology enrichment tests we demonstrate that biological processes associated with metabolism, regulation and transport are often over-represented among the genes affected by LGT within these communities. These enrichments are largely robust to change of k.
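The pipeline outlined in the abstract (score k-length words by TF-IDF, build a graph whose edges represent LGT, then extract maximal and maximum cliques as GECs) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names (kmer_tfidf, build_adjacency, maximal_cliques, maximum_cliques) are hypothetical, the scoring is the generic textbook TF-IDF formula rather than the authors' exact procedure, and the clique search is a standard Bron-Kerbosch enumeration with pivoting, which may differ from the solver used in the study.

```python
import math
from collections import Counter


def kmer_tfidf(groups, k):
    """Generic TF-IDF over k-mers (illustrative; not the paper's exact scoring).

    groups: dict mapping a group name to a list of sequences.
    Returns a dict mapping each group name to {kmer: tf-idf score}.
    """
    counts = {}
    for name, seqs in groups.items():
        c = Counter()
        for s in seqs:
            for i in range(len(s) - k + 1):
                c[s[i:i + k]] += 1
        counts[name] = c

    # Document frequency: in how many groups does each k-mer occur?
    df = Counter()
    for c in counts.values():
        df.update(c.keys())

    n = len(groups)
    scores = {}
    for name, c in counts.items():
        total = sum(c.values())
        scores[name] = {
            w: (cnt / total) * math.log(n / df[w]) for w, cnt in c.items()
        }
    return scores


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def maximal_cliques(adj):
    """Enumerate all maximal cliques (Bron-Kerbosch with pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def maximum_cliques(adj):
    """Return the maximal cliques of largest size."""
    cliques = maximal_cliques(adj)
    if not cliques:
        return []
    best = max(len(c) for c in cliques)
    return [c for c in cliques if len(c) == best]
```

In this sketch a k-mer present in every group receives a score of zero, so only words with a restricted distribution stand out. Repeating the clique extraction on graphs built at several values of k and keeping the nodes that recur would mirror the robustness check described in the abstract.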

SUBMITTER: Cong Y 

PROVIDER: S-EPMC5243798 | biostudies-literature | 2017

REPOSITORIES: biostudies-literature


Publications

Robust Inference of Genetic Exchange Communities from Microbial Genomes Using TF-IDF.

Yingnan Cong, Yao-Ban Chan, Charles A. Phillips, Michael A. Langston, Mark A. Ragan

Frontiers in Microbiology, 2017-01-19



Similar Datasets

| S-EPMC4958990 | biostudies-literature
| S-EPMC7081997 | biostudies-literature
| S-EPMC4958984 | biostudies-literature
| S-EPMC4350174 | biostudies-other
| S-EPMC4034247 | biostudies-literature
| S-EPMC6612863 | biostudies-other
| S-EPMC4423992 | biostudies-literature
2014-12-21 | GSE64376 | GEO
2014-12-21 | E-GEOD-64376 | biostudies-arrayexpress
| S-EPMC5372337 | biostudies-literature