Project description:16S rRNA gene sequences are commonly analyzed for taxonomic and phylogenetic studies because they contain variable regions that can help distinguish different genera. However, intra-genus distinction using variable region homology is often impossible due to the high overall sequence identities among closely related species, even though some residues may be conserved within respective species. Using a computational method that included the allelic diversity within individual genomes, we discovered that certain Escherichia and Shigella species can be distinguished by a multi-allelic 16S rRNA variable region single nucleotide polymorphism (SNP). To evaluate the performance of 16S rRNAs with altered variable regions, we developed an in vivo system that measures the acceptance and distribution of variant 16S rRNAs into a large pool of natural versions supporting normal translation and growth. We found that 16S rRNAs containing evolutionarily disparate variable regions were underpopulated both in ribosomes and in active translation pools, even for an SNP. Overall, this study revealed that variable region sequences can substantially influence the performance of 16S rRNAs and that this biological constraint can be leveraged to justify refining taxonomic assignments of variable region sequence data. IMPORTANCE: This study reevaluates the notion that 16S rRNA gene variable region sequences are uninformative for intra-genus classification and that single nucleotide variations within them have no consequence to strains that bear them. We demonstrated that the performance of 16S rRNAs in Escherichia coli can be negatively impacted by sequence changes in variable regions, even for single nucleotide changes that are native to closely related Escherichia and Shigella species; thus, biological performance is likely constraining the evolution of variable regions in bacteria.
Further, the native nucleotide variations we tested occur in all strains of their respective species and across their multiple 16S rRNA gene copies, suggesting that these species evolved beyond what would be discerned from a consensus sequence comparison. Therefore, this work also reveals that the multiple 16S rRNA gene alleles found in most bacteria can provide more informative phylogenetic and taxonomic detail than a single reference allele.
Project description:Comparison of the gut microbiota of young mice (n=10), old mice, Young_yFMT (young mice 14 days after fecal transplantation from young mice, n=10), Young_oFMT (young mice 14 days after fecal transplantation from old mice, n=10), and an antibiotic group (cefazolin, n=8).
Project description:The objective of the present study was to identify the nutrient utilization and the SCFA production potential of gut microbes during the first year of life. The 16S sequencing data represent 100 mother-child pairs, sampled longitudinally for the infants (0, 3, 6, and 12 months) and at 18 weeks of pregnancy for the mothers. We wanted to identify the SCFA composition in pregnant women and their infants through the first year of life, and its correlation with gut bacteria and other influential factors. Metaproteomic data from selected infants were analyzed to look for nutrient sources used by potential SCFA producers.
Project description:Faecal microbiota and cytokine profiles of rural Cambodian infants linked to diet and diarrhoeal episodes (16S rRNA amplicon sequencing)
Project description:We aimed to investigate the microbial community composition in patients with intracerebral hemorrhage (ICH) and its effect on prognosis. The relationship between changes in bacterial flora and the prognosis of spontaneous cerebral hemorrhage was studied in two cohort studies. Fecal samples from healthy volunteers and patients with intracerebral hemorrhage were subjected to 16S rRNA sequencing at three time points: T1 (within 24 hours of admission), T2 (3 days post-surgery), and T3 (7 days post-surgery) using Illumina high-throughput sequencing technology.
Project description:Background: Metagenomics is a rapidly growing field of research aimed at studying assemblages of uncultured organisms using various sequencing technologies, with the hope of understanding the true diversity of microbes, their functions, cooperation and evolution. There are two main approaches to metagenomics: amplicon sequencing, which involves PCR-targeted sequencing of a specific locus, often 16S rRNA, and random shotgun sequencing. Several tools or packages have been developed for analyzing communities using 16S rRNA sequences. Similarly, a number of tools exist for analyzing randomly sequenced DNA reads. Results: We describe an extension of the metagenome analysis tool MEGAN, which allows one to analyze 16S sequences. For the analysis all 16S sequences are blasted against the SILVA database. The result output is imported into MEGAN, using a synonym file that maps the SILVA accession numbers onto the NCBI taxonomy. Conclusions: Environmental samples are often studied using both targeted 16S rRNA sequencing and random shotgun sequencing. Hence tools are needed that allow one to analyze both types of data together, and one such tool is MEGAN. The ideas presented in this paper are implemented in MEGAN 4, which is available from: http://www-ab.informatik.uni-tuebingen.de/software/megan.
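The mapping step described above, from BLAST hits against SILVA to NCBI taxonomy identifiers via a synonym file, can be sketched roughly as follows. This is a minimal illustration and not MEGAN's implementation: the two-column, tab-separated synonym layout, the function names, and the accessions and taxon IDs in the example are all assumptions made for illustration. The sketch also keeps only the best-scoring hit per read, whereas MEGAN places reads using a lowest common ancestor approach over multiple hits.

```python
import csv
import io


def load_synonyms(handle):
    """Read a hypothetical two-column TSV: reference accession -> taxon ID."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) >= 2:
            mapping[row[0]] = int(row[1])
    return mapping


def best_hits(handle):
    """Return the highest-bit-score subject for each query.

    Assumes the standard 12-column BLAST tabular layout, in which
    column 1 is the query ID, column 2 the subject ID, and column 12
    the bit score.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        query, subject, bits = row[0], row[1], float(row[11])
        if query not in best or bits > best[query][1]:
            best[query] = (subject, bits)
    return {query: subject for query, (subject, _) in best.items()}


def assign_taxa(hits, synonyms):
    """Map each query to a taxon ID, or None if the accession is unmapped."""
    return {query: synonyms.get(subject) for query, subject in hits.items()}


# Made-up placeholder accessions and taxon IDs, for illustration only.
synonym_text = "ACC1\t111\nACC2\t222\n"
blast_text = (
    "read1\tACC1\t99.0\t250\t2\t0\t1\t250\t1\t250\t1e-100\t450\n"
    "read1\tACC2\t97.0\t250\t7\t0\t1\t250\t1\t250\t1e-90\t420\n"
    "read2\tACC3\t98.0\t250\t5\t0\t1\t250\t1\t250\t1e-95\t430\n"
)
synonyms = load_synonyms(io.StringIO(synonym_text))
assignments = assign_taxa(best_hits(io.StringIO(blast_text)), synonyms)
```

In this toy input, read2 hits an accession that is absent from the synonym table, so it is reported as unassigned rather than silently dropped.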
Project description:Background: The phylogeny of the genus Methanobrevibacter was established almost 25 years ago on the basis of the similarities of the 16S rRNA oligonucleotide catalogs. Since then, many 16S rRNA gene sequences of newly isolated strains or clones representing the genus Methanobrevibacter have been deposited. We tried to reorganize the 16S rRNA gene sequences of this genus and revise the taxonomic affiliation of the isolates and clones representing the genus Methanobrevibacter. Results: The phylogenetic analysis of the genus, based on a 786 bp aligned region from fifty-four representative sequences of the 120 available sequences for the genus, revealed seven multi-member groups, namely Ruminantium, Smithii, Woesei, Curvatus, Arboriphilicus, Filiformis, and the Termite gut symbionts, along with three separate lineages represented by Mbr. wolinii, Mbr. acididurans, and termite gut flagellate symbiont LHD12. The cophenetic correlation coefficient, a test for the ultrametric properties of the 16S rRNA gene sequences used for the tree, was found to be 0.913, indicating a high degree of goodness of fit of the tree topology. A significant relationship was found between the 16S rRNA sequence similarity (S) and the extent of DNA hybridization (D) for the genus: the correlation coefficients (r) for log D versus log S and for ln(-ln D) versus ln(-ln S) were 0.73 and 0.796, respectively. Our analysis revealed that for this genus, when S = 0.984, D would be <70% at least 99% of the time, and with 70% D as the species "cutoff", any 16S rRNA gene sequence showing <98% sequence similarity can be considered as a separate species. In addition, we deduced group-specific signature positions that have remained conserved in the evolution of the genus. Conclusions: A very significant relationship between D and S was found to exist for the genus Methanobrevibacter, implying that it is possible to predict D from S with a known precision for the genus.
We propose to include the termite gut flagellate symbiont LHD12, the methanogenic endosymbionts of the ciliate Nyctotherus ovalis, and rat feces isolate RT reported earlier, as separate species of the genus Methanobrevibacter.
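The correlation between ln(-ln D) and ln(-ln S) reported in the abstract above can be illustrated with a short sketch. The helper names and the (S, D) pairs below are made up for illustration and are not data from the study; both quantities are assumed to be expressed as fractions strictly between 0 and 1.

```python
import math


def double_log(x):
    """Return ln(-ln x), defined only for 0 < x < 1."""
    if not 0.0 < x < 1.0:
        raise ValueError("value must lie strictly between 0 and 1")
    return math.log(-math.log(x))


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical similarity (S) and hybridization (D) fractions, illustration only.
similarity = [0.99, 0.98, 0.96, 0.93]
hybridization = [0.80, 0.55, 0.30, 0.15]

r = pearson_r(
    [double_log(s) for s in similarity],
    [double_log(d) for d in hybridization],
)
```

The double-log transform stretches values near 1, which is where 16S similarities between related strains tend to cluster, so small differences in S are not compressed before the correlation is computed.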
Project description:Prediction of taxonomy for marker gene sequences such as 16S ribosomal RNA (rRNA) is a fundamental task in microbiology. Most experimentally observed sequences are diverged from reference sequences of authoritatively named organisms, creating a challenge for prediction methods. I assessed the accuracy of several algorithms using cross-validation by identity, a new benchmark strategy which explicitly models the variation in distances between query sequences and the closest entry in a reference database. When the accuracy of genus predictions was averaged over a representative range of identities with the reference database (100%, 99%, 97%, 95% and 90%), all tested methods had ≤50% accuracy on the currently-popular V4 region of 16S rRNA. Accuracy was found to fall rapidly with identity; for example, better methods were found to have V4 genus prediction accuracy of ~100% at 100% identity but ~50% at 97% identity. The relationship between identity and taxonomy was quantified as the probability that a rank is the lowest shared by a pair of sequences with a given pair-wise identity. With the V4 region, 95% identity was found to be a twilight zone where taxonomy is highly ambiguous because the probabilities that the lowest shared rank between pairs of sequences is genus, family, order or class are approximately equal.
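The "lowest shared rank" quantity described above can be sketched as follows. This is only an illustration of the tally, not the benchmark itself: the rank list, the function names, the identity bin width, and the identity values in the example are assumptions. Lineages are assumed to be tuples ordered from domain down to species.

```python
import math
from collections import Counter, defaultdict

RANKS = ("domain", "phylum", "class", "order", "family", "genus", "species")


def lowest_shared_rank(lineage_a, lineage_b):
    """Return the deepest rank at which two lineages still agree, or None."""
    shared = None
    for rank, name_a, name_b in zip(RANKS, lineage_a, lineage_b):
        if name_a != name_b:
            break
        shared = rank
    return shared


def rank_probabilities(pairs, bin_width=1.0):
    """Estimate P(lowest shared rank | identity bin) from labelled pairs.

    pairs: iterable of (percent_identity, lineage_a, lineage_b).
    Returns {bin_start: {rank: fraction of pairs in that bin}}.
    """
    counts = defaultdict(Counter)
    for identity, lineage_a, lineage_b in pairs:
        bin_start = math.floor(identity / bin_width) * bin_width
        counts[bin_start][lowest_shared_rank(lineage_a, lineage_b)] += 1
    return {
        bin_start: {rank: n / sum(tally.values()) for rank, n in tally.items()}
        for bin_start, tally in counts.items()
    }


enterobacteriaceae = ("Bacteria", "Proteobacteria", "Gammaproteobacteria",
                      "Enterobacterales", "Enterobacteriaceae")
e_coli = enterobacteriaceae + ("Escherichia", "Escherichia coli")
e_fergusonii = enterobacteriaceae + ("Escherichia", "Escherichia fergusonii")
s_flexneri = enterobacteriaceae + ("Shigella", "Shigella flexneri")

# The identity values below are invented purely to exercise the tally.
table = rank_probabilities([
    (97.2, e_coli, e_fergusonii),
    (97.8, e_coli, s_flexneri),
])
```

With real pairwise identities, a bin in which several ranks receive similar fractions corresponds to the ambiguity the abstract calls a twilight zone.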
Project description:Recent studies of 16S rRNA sequences through next-generation sequencing have revolutionized our understanding of microbial community composition and structure. One common approach to exploring the genetic diversity of a microbial community with these data is to cluster the 16S rRNA sequences into Operational Taxonomic Units (OTUs) based on sequence similarity. The inferred OTUs can then be used to estimate species diversity, composition, and richness. Although a number of methods have been developed and are commonly used to cluster sequences into OTUs, relatively little guidance is available on their relative performance and on the choice of key parameters for each method. In this study, we conducted a comprehensive evaluation of ten existing OTU inference methods. We found that the appropriate dissimilarity value for defining distinct OTUs depends not only on the specific method but also on the sample complexity. For data sets with low complexity, all the algorithms need a higher dissimilarity threshold to define OTUs. Some methods, such as CROP and SLP, are more robust to the specific choice of the threshold than other methods, especially for shorter reads. For high-complexity data sets, hierarchical clustering methods need a stricter dissimilarity threshold to define OTUs because the commonly used dissimilarity threshold of 3% often leads to an underestimation of the number of OTUs. In general, hierarchical clustering methods perform better at lower dissimilarity thresholds. Our results show that sequence abundance plays an important role in OTU inference. We conclude that care is needed in choosing both a dissimilarity threshold and an abundance threshold for OTU inference.
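To make the role of the dissimilarity threshold and of sequence abundance concrete, here is a minimal greedy, centroid-based clustering sketch. It is a generic illustration and not one of the ten methods evaluated in the study (it is neither CROP nor SLP). It assumes pre-aligned sequences of equal length and uses the fraction of mismatched positions as the dissimilarity; the function names and toy sequences are hypothetical.

```python
def dissimilarity(seq_a, seq_b):
    """Fraction of mismatched positions between two equal-length sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def greedy_otus(abundances, threshold=0.03):
    """Cluster unique sequences into OTUs around abundant centroids.

    abundances: dict mapping each unique sequence to its read count.
    Sequences are visited from most to least abundant; each one joins the
    first centroid within the threshold, otherwise it founds a new OTU.
    Returns {centroid: [member sequences]}.
    """
    otus = {}
    ordered = sorted(abundances.items(), key=lambda item: (-item[1], item[0]))
    for seq, _count in ordered:
        for centroid in otus:
            if dissimilarity(seq, centroid) <= threshold:
                otus[centroid].append(seq)
                break
        else:
            otus[seq] = [seq]  # no centroid close enough: start a new OTU
    return otus


# Toy 20-base "reads" with made-up counts.
reads = {
    "A" * 20: 10,
    "A" * 19 + "C": 3,   # one mismatch (5% dissimilar) from the first read
    "C" * 20: 5,
}
loose = greedy_otus(reads, threshold=0.10)
strict = greedy_otus(reads, threshold=0.01)
```

On this toy input the looser threshold merges the one-mismatch variant into the most abundant sequence, giving two OTUs, while the stricter threshold keeps all three sequences apart, which mirrors how the threshold choice changes the estimated number of OTUs.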
Project description:Naive Bayes classifiers (NBC) have dominated the field of taxonomic classification of amplicon sequences for over a decade. Apart from having runtime requirements that allow them to be trained and used on modest laptops, they have persistently provided class-topping classification accuracy. In this work we compare NBC with random forest classifiers, neural network classifiers, and a perfect classifier that can only fail when different species have identical sequences, and find that in some practical scenarios there is little scope for improving on NBC for taxonomic classification of 16S rRNA gene sequences. Further improvements in taxonomy classification are unlikely to come from novel algorithms alone, and will need to leverage other technological innovations, such as ecological frequency information.
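For readers unfamiliar with the approach, a naive Bayes sequence classifier can be sketched in a few lines. This is a minimal multinomial variant over k-mer counts with add-alpha smoothing, written for illustration only; it is not the classifier benchmarked in the study, and the class name, the defaults (k=4, alpha=1.0), and the toy sequences and labels are assumptions.

```python
import math
from collections import Counter


def kmers(seq, k):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class KmerNaiveBayes:
    """Multinomial naive Bayes over k-mer counts (illustrative sketch)."""

    def __init__(self, k=4, alpha=1.0):
        self.k = k
        self.alpha = alpha            # smoothing pseudo-count
        self.kmer_counts = {}         # label -> Counter of k-mers
        self.label_counts = Counter() # label -> number of training sequences
        self.vocab = set()            # all k-mers seen during training

    def fit(self, sequences, labels):
        for seq, label in zip(sequences, labels):
            self.label_counts[label] += 1
            counts = self.kmer_counts.setdefault(label, Counter())
            for kmer in kmers(seq, self.k):
                counts[kmer] += 1
                self.vocab.add(kmer)
        return self

    def log_scores(self, seq):
        """Unnormalized log posterior for each label."""
        n_sequences = sum(self.label_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, counts in self.kmer_counts.items():
            total = sum(counts.values())
            score = math.log(self.label_counts[label] / n_sequences)
            for kmer in kmers(seq, self.k):
                score += math.log(
                    (counts[kmer] + self.alpha)
                    / (total + self.alpha * vocab_size)
                )
            scores[label] = score
        return scores

    def predict(self, seq):
        scores = self.log_scores(seq)
        return max(scores, key=scores.get)


# Toy training data with placeholder labels.
model = KmerNaiveBayes(k=4).fit(
    ["ACGTACGTACGT", "TTGGCCAATTGGCCAA"],
    ["taxon_A", "taxon_B"],
)
prediction_a = model.predict("ACGTACGA")
prediction_b = model.predict("TTGGCCAA")
```

Training is a single counting pass and prediction is a sum of logarithms per label, which is consistent with the abstract's point that such classifiers can run on modest hardware.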