Project description:The objective of the present study was to identify the nutrient utilization and the short-chain fatty acid (SCFA) production potential of gut microbes during the first year of life. The 16S sequencing data represent 100 mother-child pairs, sampled longitudinally for the infants (0, 3, 6 and 12 months) and at 18 weeks of pregnancy for the mothers. We wanted to identify the SCFA composition in pregnant women and their infants through the first year of life, and its correlation with gut bacteria and other influential factors. Metaproteomic data from selected infants were analyzed to look for nutrient sources used by potential SCFA producers.
Project description:BACKGROUND: Metagenomics is a rapidly growing field of research aimed at studying assemblages of uncultured organisms using various sequencing technologies, with the hope of understanding the true diversity of microbes, their functions, cooperation and evolution. There are two main approaches to metagenomics: amplicon sequencing, which involves PCR-targeted sequencing of a specific locus, often 16S rRNA, and random shotgun sequencing. Several tools or packages have been developed for analyzing communities using 16S rRNA sequences. Similarly, a number of tools exist for analyzing randomly sequenced DNA reads. RESULTS: We describe an extension of the metagenome analysis tool MEGAN, which allows one to analyze 16S sequences. For the analysis all 16S sequences are blasted against the SILVA database. The result output is imported into MEGAN, using a synonym file that maps the SILVA accession numbers onto the NCBI taxonomy. CONCLUSIONS: Environmental samples are often studied using both targeted 16S rRNA sequencing and random shotgun sequencing. Hence tools are needed that allow one to analyze both types of data together, and one such tool is MEGAN. The ideas presented in this paper are implemented in MEGAN 4, which is available from: http://www-ab.informatik.uni-tuebingen.de/software/megan.
Project description:Background: The phylogeny of the genus Methanobrevibacter was established almost 25 years ago on the basis of the similarities of the 16S rRNA oligonucleotide catalogs. Since then, many 16S rRNA gene sequences of newly isolated strains or clones representing the genus Methanobrevibacter have been deposited. We tried to reorganize the 16S rRNA gene sequences of this genus and revise the taxonomic affiliation of the isolates and clones representing the genus Methanobrevibacter. Results: The phylogenetic analysis of the genus, based on a 786 bp aligned region from fifty-four representative sequences of the 120 available sequences for the genus, revealed seven multi-member groups, namely Ruminantium, Smithii, Woesei, Curvatus, Arboriphilicus, Filiformis, and the Termite gut symbionts, along with three separate lineages represented by Mbr. wolinii, Mbr. acididurans, and termite gut flagellate symbiont LHD12. The cophenetic correlation coefficient, a test for the ultrametric properties of the 16S rRNA gene sequences used for the tree, was found to be 0.913, indicating a high degree of goodness of fit of the tree topology. A significant relationship was found between the 16S rRNA sequence similarity (S) and the extent of DNA hybridization (D) for the genus, with the correlation coefficient (r) for logD and logS, and for [ln(-lnD) and ln(-lnS)], being 0.73 and 0.796 respectively. Our analysis revealed that for this genus, when S = 0.984, D would be <70% at least 99% of the time, and with 70% D as the species "cutoff", any 16S rRNA gene sequence showing <98% sequence similarity can be considered as a separate species. In addition, we deduced group-specific signature positions that have remained conserved in the evolution of the genus. Conclusions: A very significant relationship between D and S was found to exist for the genus Methanobrevibacter, implying that it is possible to predict D from S with a known precision for the genus.
We propose to include the termite gut flagellate symbiont LHD12, the methanogenic endosymbionts of the ciliate Nyctotherus ovalis, and rat feces isolate RT reported earlier, as separate species of the genus Methanobrevibacter.
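The cophenetic correlation coefficient mentioned above can be illustrated with a minimal sketch. It assumes average-linkage (UPGMA) clustering, which the description does not specify, and the function names and toy distance matrices are hypothetical. The coefficient is the Pearson correlation between the original pairwise distances and the distances implied by the tree.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def cophenetic_distances(dist):
    """Cluster with average linkage (an assumption) and return, for each
    pair (i, j) with i < j, the height at which i and j are first merged."""
    clusters = {i: [i] for i in range(len(dist))}
    coph = {}
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        # Every pair split across the two merged clusters joins at height d.
        for i in clusters[a]:
            for j in clusters[b]:
                coph[(min(i, j), max(i, j))] = d
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return coph


def cophenetic_correlation(dist):
    """Correlation between original distances and tree-implied distances."""
    coph = cophenetic_distances(dist)
    pairs = sorted(coph)
    original = [dist[i][j] for i, j in pairs]
    tree = [coph[p] for p in pairs]
    return pearson(original, tree)


# Toy, made-up distance matrices (not data from the study).
ultrametric = [[0, 2, 4], [2, 0, 4], [4, 4, 0]]
distorted = [[0, 2, 3], [2, 0, 5], [3, 5, 0]]
```

A perfectly ultrametric matrix gives a coefficient of 1, and the value drops as the tree distorts the original distances, which is why a value such as 0.913 is read as a good fit.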
Project description:Hypervariable regions V3-V5 of bacterial 16S rRNA genes. This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/
Project description:Prediction of taxonomy for marker gene sequences such as 16S ribosomal RNA (rRNA) is a fundamental task in microbiology. Most experimentally observed sequences are diverged from reference sequences of authoritatively named organisms, creating a challenge for prediction methods. I assessed the accuracy of several algorithms using cross-validation by identity, a new benchmark strategy which explicitly models the variation in distances between query sequences and the closest entry in a reference database. When the accuracy of genus predictions was averaged over a representative range of identities with the reference database (100%, 99%, 97%, 95% and 90%), all tested methods had ≤50% accuracy on the currently-popular V4 region of 16S rRNA. Accuracy was found to fall rapidly with identity; for example, better methods were found to have V4 genus prediction accuracy of ~100% at 100% identity but ~50% at 97% identity. The relationship between identity and taxonomy was quantified as the probability that a rank is the lowest shared by a pair of sequences with a given pair-wise identity. With the V4 region, 95% identity was found to be a twilight zone where taxonomy is highly ambiguous because the probabilities that the lowest shared rank between pairs of sequences is genus, family, order or class are approximately equal.
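The "lowest shared rank" probability described above can be sketched as follows. This is an illustration under stated assumptions, not the author's benchmark code: the rank list, function names, placeholder taxon labels and toy pairs are all hypothetical, and lineages are assumed to be given as tuples ordered from highest to lowest rank.

```python
from collections import Counter, defaultdict

# Assumed rank order, highest to lowest.
RANKS = ["domain", "phylum", "class", "order", "family", "genus"]


def lowest_shared_rank(lineage_a, lineage_b):
    """Return the lowest rank at which both lineages still agree,
    or None if they differ even at the highest rank."""
    shared = None
    for rank, a, b in zip(RANKS, lineage_a, lineage_b):
        if a != b:
            break
        shared = rank
    return shared


def rank_probabilities(pairs):
    """pairs: iterable of (identity_percent, lineage_a, lineage_b).
    Returns {identity: {rank: fraction of pairs with that lowest shared rank}}."""
    counts = defaultdict(Counter)
    for identity, a, b in pairs:
        counts[identity][lowest_shared_rank(a, b)] += 1
    return {
        identity: {rank: n / sum(c.values()) for rank, n in c.items()}
        for identity, c in counts.items()
    }


# Placeholder lineages (hypothetical labels, not real taxa).
x = ("d1", "p1", "c1", "o1", "f1", "g1")
y = ("d1", "p1", "c1", "o1", "f1", "g2")  # same family as x, different genus
z = ("d1", "p1", "c1", "o2", "f3", "g4")  # same class as x, different order

toy_pairs = [(95, x, y), (95, x, z), (100, x, x)]
probs = rank_probabilities(toy_pairs)
```

With real data, each identity bin would hold many pairs, and a bin where several ranks have similar probabilities corresponds to the ambiguous "twilight zone" described for 95% identity.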
Project description:Recent studies of 16S rRNA sequences through next-generation sequencing have revolutionized our understanding of microbial community composition and structure. One common approach in using these data to explore the genetic diversity in a microbial community is to cluster the 16S rRNA sequences into Operational Taxonomic Units (OTUs) based on sequence similarities. The inferred OTUs can then be used to estimate species diversity, composition, and richness. Although a number of methods have been developed and are commonly used to cluster the sequences into OTUs, relatively little guidance is available on their relative performance and the choice of key parameters for each method. In this study, we conducted a comprehensive evaluation of ten existing OTU inference methods. We found that the appropriate dissimilarity value for defining distinct OTUs depends not only on the specific method but also on the sample complexity. For data sets with low complexity, all the algorithms need a higher dissimilarity threshold to define OTUs. Some methods, such as CROP and SLP, are more robust to the specific choice of the threshold than other methods, especially for shorter reads. For high-complexity data sets, hierarchical clustering methods need a stricter dissimilarity threshold to define OTUs because the commonly used dissimilarity threshold of 3% often leads to an under-estimation of the number of OTUs. In general, hierarchical clustering methods perform better at lower dissimilarity thresholds. Our results show that sequence abundance plays an important role in OTU inference. We conclude that care is needed in choosing both the dissimilarity threshold and the abundance threshold for OTU inference.
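To make the role of the dissimilarity threshold concrete, here is a minimal sketch of threshold-based OTU clustering. It uses a simple greedy centroid scheme chosen for brevity; it is not one of the ten methods evaluated in the study. It also assumes equal-length, pre-aligned sequences, and the function names and toy reads are hypothetical.

```python
def dissimilarity(a, b):
    """Fraction of mismatched positions between two equal-length,
    pre-aligned sequences (an assumption made to keep the sketch short)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, threshold):
    """Assign each sequence to the first OTU whose centroid lies within
    `threshold`; otherwise start a new OTU with the sequence as centroid.
    The result depends on input order, as in other greedy schemes."""
    centroids = []
    otus = []
    for s in seqs:
        for idx, c in enumerate(centroids):
            if dissimilarity(s, c) <= threshold:
                otus[idx].append(s)
                break
        else:
            centroids.append(s)
            otus.append([s])
    return otus


# Toy reads (made up): 0, 1 and 2 mismatches from the first read, plus an outlier.
reads = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATT", "CCCCCCCCCC"]
strict = greedy_otus(reads, 0.0)
medium = greedy_otus(reads, 0.15)
loose = greedy_otus(reads, 0.25)
```

The same four reads yield a different number of OTUs at each threshold, which is the sensitivity that the evaluation above examines across methods and sample complexities.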
Project description:Naive Bayes classifiers (NBC) have dominated the field of taxonomic classification of amplicon sequences for over a decade. Apart from having runtime requirements that allow them to be trained and used on modest laptops, they have persistently provided class-topping classification accuracy. In this work we compare NBC with random forest classifiers, neural network classifiers, and a perfect classifier that can only fail when different species have identical sequences, and find that in some practical scenarios there is little scope for improving on NBC for taxonomic classification of 16S rRNA gene sequences. Further improvements in taxonomy classification are unlikely to come from novel algorithms alone, and will need to leverage other technological innovations, such as ecological frequency information.
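As an illustration of the general idea behind naive Bayes classification of amplicon sequences, the sketch below scores a query by the k-mers it contains. It is a simplified multinomial model with add-one smoothing, not the implementation compared in the study; the class name, the choice of k = 3, and the toy sequences and labels are assumptions.

```python
from collections import Counter, defaultdict
from math import log


def kmers(seq, k):
    """All overlapping k-mers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class KmerNaiveBayes:
    """Toy multinomial naive Bayes over k-mer counts (hypothetical sketch)."""

    def __init__(self, k=3):
        self.k = k
        self.kmer_counts = defaultdict(Counter)  # label -> k-mer counts
        self.class_counts = Counter()            # label -> training sequences
        self.vocab = set()

    def fit(self, seqs, labels):
        for seq, label in zip(seqs, labels):
            words = kmers(seq, self.k)
            self.kmer_counts[label].update(words)
            self.class_counts[label] += 1
            self.vocab.update(words)
        return self

    def predict(self, seq):
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, float("-inf")
        for label, n_label in self.class_counts.items():
            counts = self.kmer_counts[label]
            denom = sum(counts.values()) + vocab_size
            # Log prior plus smoothed log likelihood of each query k-mer.
            score = log(n_label / total)
            for word in kmers(seq, self.k):
                score += log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training sequences with placeholder labels.
model = KmerNaiveBayes(k=3).fit(
    ["ACGTACGTACGT", "GGGCCCGGGCCC"], ["taxonA", "taxonB"]
)
```

Training only counts k-mers per label and prediction is a sum of logarithms, which is consistent with the modest runtime requirements noted above.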
Project description:The use of k-mers has been a successful strategy for improving metagenomics studies, including taxonomic classification and de novo assembly, and can be used to obtain sequences of interest from the available databases. The aim of this manuscript was to propose a simple but efficient strategy to generate k-mers and to use them to obtain and analyse in silico 16S rRNA sequence fragments. A total of 513,309 bacterial sequences contained in the SILVA database were considered for the study, and homemade PHP scripts were used to search for specific nucleotide chains, recover fragments of bacterial sequences, make calculations and organize information. Consensus sequences matching conserved regions were constructed by aligning most of the primers used in the literature. Sequences of k nucleotides (9- to 15-mers) were extracted from the generated primer contigs. Frequency analysis revealed that k-mer size was inversely proportional to the occurrence of k-mers in the different conserved regions, suggesting a stringency relationship; high numbers of duplicate reactions were observed with short k-mers, and a lower proportion of sequences was obtained with large ones, with the best results obtained using 12-mers. Using 12-mers with the proposed method to obtain and study sequences was found to be a reliable approach for the analysis of 16S rRNA sequences, and this strategy can probably be extended to other biomarkers. Furthermore, additional applications, such as evaluating the degree of conservation, designing primers and other calculations, are proposed as examples.
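The k-mer extraction and frequency analysis described above can be sketched as follows. The study used PHP scripts; this Python version is only an illustration, and the function names, the toy "conserved" string and the toy target sequences are hypothetical rather than real primer or SILVA data.

```python
def extract_kmers(seq, k):
    """Set of all distinct k-mers contained in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_hits(conserved, sequences, sizes=range(9, 16)):
    """For each k (9 to 15 by default), count how many target sequences
    contain at least one k-mer taken from the conserved region."""
    hits = {}
    for k in sizes:
        probes = extract_kmers(conserved, k)
        hits[k] = sum(any(p in s for p in probes) for s in sequences)
    return hits


# Made-up data: a 16-nt "conserved" stretch and three toy targets.
conserved = "ACGTTGCAAGGCTTAC"
targets = [
    "TTT" + conserved + "GGG",        # contains the full conserved stretch
    "CCCC" + conserved[:10] + "TTTT",  # contains only its first 10 nt
    "GGGGGGGGGGGGGGGGGGGG",            # contains none of it
]
hits = kmer_hits(conserved, targets)
```

In this toy case, short k-mers match the partially conserved target as well as the full one, while longer k-mers match only the full one, mirroring the stringency relationship reported above.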