Dataset Information

An efficient strategy using k-mers to analyse 16S rRNA sequences.

ABSTRACT: The use of k-mers has been a successful strategy for improving metagenomic studies, including taxonomic classification and de novo assembly, and k-mers can also be used to retrieve sequences of interest from available databases. The aim of this manuscript was to propose a simple but efficient strategy to generate k-mers and to use them to obtain and analyse in silico 16S rRNA sequence fragments. A total of 513,309 bacterial sequences from the SILVA database were considered for the study, and homemade PHP scripts were used to search for specific nucleotide chains, recover fragments of bacterial sequences, perform calculations and organize information. Consensus sequences matching conserved regions were constructed by aligning most of the primers used in the literature. Sequences of k nucleotides (9- to 15-mers) were extracted from the resulting primer contigs. Frequency analysis revealed that k-mer size was inversely proportional to the occurrence of k-mers in the different conserved regions, suggesting a stringency relationship: short k-mers produced high numbers of duplicate reactions, whereas long k-mers recovered a lower proportion of sequences, and the best results were obtained with 12-mers. Using 12-mers with the proposed method was found to be a reliable approach for obtaining and analysing 16S rRNA sequences, and this strategy may be extendable to other biomarkers. Furthermore, additional applications, such as evaluating the degree of conservation, designing primers and performing other calculations, are proposed as examples.
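The k-mer workflow summarized above (extracting overlapping k-mers from a primer contig, searching database sequences for them, and recovering the fragment between two conserved regions) can be sketched as follows. This is a minimal Python illustration, not the authors' PHP scripts. The function names are hypothetical, and so are two rules chosen for the example: a k-mer that occurs more than once in a sequence is treated as a duplicate hit and the fragment is rejected, and both k-mers are assumed to be written on the same (sense) strand as the database sequences.

```python
from collections import Counter
from typing import Iterable, List, Optional


def extract_kmers(contig: str, k: int) -> List[str]:
    """Return every overlapping k-mer of a contig (sliding window, step 1)."""
    contig = contig.upper()
    return [contig[i:i + k] for i in range(len(contig) - k + 1)]


def occurrences(seq: str, kmer: str) -> List[int]:
    """Start positions of every (possibly overlapping) occurrence of kmer in seq."""
    hits: List[int] = []
    start = seq.find(kmer)
    while start != -1:
        hits.append(start)
        start = seq.find(kmer, start + 1)
    return hits


def recover_fragment(seq: str, fwd_kmer: str, rev_kmer: str) -> Optional[str]:
    """Return seq from the forward k-mer through the end of the reverse k-mer.

    Returns None if either k-mer is absent, occurs more than once
    (counted here as a duplicate hit), or the reverse hit does not lie
    downstream of the forward hit. Both k-mers are assumed to be on the
    same (sense) strand as seq.
    """
    seq, fwd, rev = seq.upper(), fwd_kmer.upper(), rev_kmer.upper()
    f, r = occurrences(seq, fwd), occurrences(seq, rev)
    if len(f) != 1 or len(r) != 1 or r[0] < f[0] + len(fwd):
        return None
    return seq[f[0]:r[0] + len(rev)]


def kmer_frequency(kmers: Iterable[str], sequences: Iterable[str]) -> Counter:
    """For each k-mer, count the sequences that contain it at least once."""
    seqs = [s.upper() for s in sequences]
    return Counter({km: sum(km.upper() in s for s in seqs) for km in kmers})
```

For instance, `extract_kmers(contig, 12)` on a 19-nt contig yields 8 overlapping 12-mers, and looping k over `range(9, 16)` covers the 9- to 15-mer range explored above. Any contig or sequence strings passed to these functions in a quick test would be toy inputs, not SILVA data.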

SUBMITTER: Martinez-Porchas M

PROVIDER: S-EPMC5537200 | biostudies-other | 2017 Jul

REPOSITORIES: biostudies-other


Similar Datasets

Project description: 16S rRNA gene sequences are commonly analyzed for taxonomic and phylogenetic studies because they contain variable regions that can help distinguish different genera. However, intra-genus distinction using variable region homology is often impossible due to the high overall sequence identities among closely related species, even though some residues may be conserved within respective species. Using a computational method that included the allelic diversity within individual genomes, we discovered that certain Escherichia and Shigella species can be distinguished by a multi-allelic 16S rRNA variable region single nucleotide polymorphism (SNP). To evaluate the performance of 16S rRNAs with altered variable regions, we developed an in vivo system that measures the acceptance and distribution of variant 16S rRNAs into a large pool of natural versions supporting normal translation and growth. We found that 16S rRNAs containing evolutionarily disparate variable regions were underpopulated both in ribosomes and in active translation pools, even for an SNP. Overall, this study revealed that variable region sequences can substantially influence the performance of 16S rRNAs and that this biological constraint can be leveraged to justify refining taxonomic assignments of variable region sequence data. IMPORTANCE: This study reevaluates the notion that 16S rRNA gene variable region sequences are uninformative for intra-genus classification and that single nucleotide variations within them have no consequence for the strains that bear them. We demonstrated that the performance of 16S rRNAs in Escherichia coli can be negatively impacted by sequence changes in variable regions, even for single nucleotide changes that are native to closely related Escherichia and Shigella species; thus, biological performance is likely constraining the evolution of variable regions in bacteria.
Further, the native nucleotide variations we tested occur in all strains of their respective species and across their multiple 16S rRNA gene copies, suggesting that these species evolved beyond what would be discerned from a consensus sequence comparison. Therefore, this work also reveals that the multiple 16S rRNA gene alleles found in most bacteria can provide more informative phylogenetic and taxonomic detail than a single reference allele.

Project description: Background: The phylogeny of the genus Methanobrevibacter was established almost 25 years ago on the basis of the similarities of the 16S rRNA oligonucleotide catalogs. Since then, many 16S rRNA gene sequences of newly isolated strains or clones representing the genus Methanobrevibacter have been deposited. We tried to reorganize the 16S rRNA gene sequences of this genus and revise the taxonomic affiliation of the isolates and clones representing the genus Methanobrevibacter. Results: The phylogenetic analysis of the genus, based on a 786 bp aligned region from fifty-four representative sequences of the 120 available sequences for the genus, revealed seven multi-member groups, namely Ruminantium, Smithii, Woesei, Curvatus, Arboriphilicus, Filiformis, and the Termite gut symbionts, along with three separate lineages represented by Mbr. wolinii, Mbr. acididurans, and termite gut flagellate symbiont LHD12. The cophenetic correlation coefficient, a test of the ultrametric properties of the 16S rRNA gene sequences used for the tree, was found to be 0.913, indicating a high degree of goodness of fit of the tree topology. A significant relationship was found between 16S rRNA sequence similarity (S) and the extent of DNA hybridization (D) for the genus, with correlation coefficients (r) of 0.73 for log D versus log S and 0.796 for ln(-ln D) versus ln(-ln S). Our analysis revealed that for this genus, when S = 0.984, D would be <70% at least 99% of the time, and with 70% D as the species "cutoff", any 16S rRNA gene sequence showing <98% sequence similarity can be considered a separate species. In addition, we deduced group-specific signature positions that have remained conserved during the evolution of the genus. Conclusions: A very significant relationship between D and S was found to exist for the genus Methanobrevibacter, implying that it is possible to predict D from S with a known precision for the genus.
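The two correlations reported above compare 16S rRNA similarity (S) with DNA hybridization (D) after either a log transform or a double-log transform, ln(-ln x). The sketch below computes both Pearson coefficients for paired (S, D) values expressed as fractions; it is an illustrative Python example under that assumption, not the authors' analysis, and the function names are hypothetical. Because r is unchanged by the choice of logarithm base, natural logs are used throughout.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log_r(S: Sequence[float], D: Sequence[float]) -> float:
    """r between log D and log S (S and D as fractions greater than 0)."""
    return pearson_r([math.log(d) for d in D], [math.log(s) for s in S])


def double_log_r(S: Sequence[float], D: Sequence[float]) -> float:
    """r between ln(-ln D) and ln(-ln S); requires 0 < S, D < 1."""
    return pearson_r([math.log(-math.log(d)) for d in D],
                     [math.log(-math.log(s)) for s in S])
```

As a sanity check, if D = S**n exactly, then log D is a linear function of log S and ln(-ln D) = ln n + ln(-ln S), so both functions return 1 (up to rounding).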
We propose to include the termite gut flagellate symbiont LHD12, the methanogenic endosymbionts of the ciliate Nyctotherus ovalis, and rat feces isolate RT reported earlier, as separate species of the genus Methanobrevibacter.

Project description: Background: Recent evidence has suggested that human microorganisms participate in important biological activities in the human body. Dysfunction of host-microbiota interactions could lead to complex human disorders, and knowledge of these interactions can provide valuable insights into the pathological mechanisms of diseases. However, it is time-consuming and costly to identify disorder-specific microbes from the biological "haystack" by routine wet-lab experiments alone. With the developments in next-generation sequencing and omics-based trials, it is imperative to develop computational models that predict microbe-disease associations on a large scale. Results: Based on the known microbe-disease associations derived from the Human Microbe-Disease Association Database (HMDAD), the proposed model shows reliable performance, with areas under the ROC curve (AUC) of 0.9456 and 0.8866 in leave-one-out and five-fold cross validation, respectively. In case studies of colorectal carcinoma, 80% of the top-20 predicted microbes have been experimentally confirmed in the published literature. Conclusion: Based on the assumption that functionally similar microbes tend to share similar interaction patterns with human diseases, we propose a group-based computational model of Bayesian disease-oriented ranking to prioritize the microbes most likely to be associated with various human diseases. Based on the sequence information of genes, two computational approaches (BLAST+ and MEGA 7) are leveraged to measure microbe-microbe similarity from different perspectives. Disease-disease similarity is calculated by capturing the hierarchy information from the Medical Subject Headings (MeSH) data. The experimental results illustrate the accuracy and effectiveness of the proposed model.
This work is expected to facilitate the characterization and identification of promising microbial biomarkers.


