
Dataset Information


Whole-Genome k-mer Topic Modeling Associates Bacterial Families.


ABSTRACT: Alignment-free k-mer-based algorithms in whole genome sequence comparisons remain an ongoing challenge. Here, we explore the possibility of using topic modeling for organism whole-genome comparisons. We analyzed 30 complete genomes from three bacterial families by topic modeling. For this, each genome was considered as a document and 13-mer nucleotide representations as words. Latent Dirichlet allocation was used as the probabilistic model of the corpus. We were able to identify the topic distribution among the analyzed genomes, which is highly consistent with traditional hierarchical classification. It is possible that topic modeling may be applied to establish relationships between genome composition and biological phenomena.
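
As an illustration of the "genome as document, k-mer as word" representation described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes overlapping k-mers on a single strand and skips any k-mer containing a non-ACGT character; the abstract does not state how either case was handled. The function names and the toy sequences are hypothetical, and k is set to 4 only so the output stays readable (the abstract uses 13-mers).

```python
from collections import Counter


def kmer_words(sequence, k=13):
    """Return the overlapping k-mers of a nucleotide sequence as a list of words.

    Assumption: k-mers containing characters other than A, C, G, T are skipped.
    """
    seq = sequence.upper()
    alphabet = set("ACGT")
    words = []
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= alphabet:
            words.append(word)
    return words


def document_term_counts(genomes, k=13):
    """Map each genome name (document) to a Counter of its k-mer words."""
    return {name: Counter(kmer_words(seq, k)) for name, seq in genomes.items()}


# Hypothetical toy sequences, not real genomes.
toy_genomes = {
    "genome_a": "ACGTACGTAC",
    "genome_b": "ACGTNACGTT",
}

counts = document_term_counts(toy_genomes, k=4)

# Shared vocabulary (columns) and one row of word counts per genome.
vocabulary = sorted(set().union(*counts.values()))
matrix = [[counts[name][word] for word in vocabulary] for name in toy_genomes]

print(vocabulary)
for name, row in zip(toy_genomes, matrix):
    print(name, row)
```

A genome-by-k-mer count matrix of this shape is the usual input to a latent Dirichlet allocation implementation; which implementation and parameters the authors used is not stated in this record.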

SUBMITTER: Borrayo-Carbajal E 

PROVIDER: S-EPMC7074292 | biostudies-literature | 2020 Feb

REPOSITORIES: biostudies-literature


Publications

Whole-Genome k-mer Topic Modeling Associates Bacterial Families.

Borrayo-Carbajal Ernesto, May-Canche Isaias, Paredes Omar, Morales J Alejandro, Romo-Vázquez Rebeca, Vélez-Pérez Hugo

Genes 2020-02-14 2



Similar Datasets

2005-02-10 | GSE2247 | GEO
2005-02-09 | E-GEOD-2247 | biostudies-arrayexpress
| S-EPMC5681697 | biostudies-literature
| S-EPMC4801402 | biostudies-literature
| S-EPMC3098182 | biostudies-literature
2012-09-29 | GSE41211 | GEO
2012-09-29 | E-GEOD-41211 | biostudies-arrayexpress
| S-EPMC8568955 | biostudies-literature
| S-EPMC5137707 | biostudies-literature
| S-EPMC6892538 | biostudies-literature