Project description:Livestock, especially cattle, are a main reservoir of Escherichia coli O157:H7. This pathogen is characterized by the production of toxins known as Shiga-like toxin 1 (Stx1) and Stx2. The aim of this work was to analyse novel 16S rRNA gene sequences of strains isolated in this study in order to determine the phylogenetic relationships among these sequences and between them and bacterial sequences available in databanks. The analysis showed that strains KL-48(2) and SM25(1), which originated from human and cattle feces, respectively, are closely related to each other and to E. coli EDL 933, E. coli Sakai, E. coli ATCC 43894, E. coli O111:H-, E. coli O121:H19, E. coli O104:H4, and Shigella sonnei, with similarity values of more than 99%.
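The similarity values quoted above are pairwise comparisons of aligned 16S rRNA sequences. The abstract does not say how they were computed, so the following is only a minimal sketch of one common definition: percent identity over aligned columns, ignoring columns where either sequence has a gap. The function name `percent_identity` and the toy sequences are assumptions for illustration, not data from the study.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are skipped, which is
    one of several possible conventions (an assumption of this sketch).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # gap column: not counted
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, not real 16S rRNA sequences).
seq_a = "ACGTACGTAC"
seq_b = "ACGTACGTAT"
print(percent_identity(seq_a, seq_b))  # 9 of 10 columns match -> 90.0
```

Other conventions (counting gaps as mismatches, or normalizing by the shorter sequence) give slightly different values, so the definition should be stated whenever a threshold such as 99% is reported.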
Project description:Enteropathogenic Escherichia coli (EPEC) pathotype represents a minor proportion of E. coli O103 strains shed in the feces of feedlot cattle. The draft genome sequences of 13 strains of EPEC O103 are reported here. The availability of the genome sequences will help in the assessment of genetic diversity and virulence potential of bovine EPEC O103.
Project description:The enterohemorrhagic pathotype represents a minor proportion of the Escherichia coli O103 strains shed in the feces of cattle. We report here the genome sequences of 43 strains of enterohemorrhagic E. coli (EHEC) O103:H2 isolated from feedlot cattle feces. The genomic analysis will provide information on the genetic diversity and virulence potential of bovine EHEC O103.
Project description:16S rRNA gene sequences are commonly analyzed for taxonomic and phylogenetic studies because they contain variable regions that can help distinguish different genera. However, intra-genus distinction using variable region homology is often impossible due to the high overall sequence identities among closely related species, even though some residues may be conserved within respective species. Using a computational method that included the allelic diversity within individual genomes, we discovered that certain Escherichia and Shigella species can be distinguished by a multi-allelic 16S rRNA variable region single nucleotide polymorphism (SNP). To evaluate the performance of 16S rRNAs with altered variable regions, we developed an in vivo system that measures the acceptance and distribution of variant 16S rRNAs into a large pool of natural versions supporting normal translation and growth. We found that 16S rRNAs containing evolutionarily disparate variable regions were underpopulated both in ribosomes and in active translation pools, even for an SNP. Overall, this study revealed that variable region sequences can substantially influence the performance of 16S rRNAs and that this biological constraint can be leveraged to justify refining taxonomic assignments of variable region sequence data. IMPORTANCE This study reevaluates the notion that 16S rRNA gene variable region sequences are uninformative for intra-genus classification and that single nucleotide variations within them have no consequence to strains that bear them. We demonstrated that the performance of 16S rRNAs in Escherichia coli can be negatively impacted by sequence changes in variable regions, even for single nucleotide changes that are native to closely related Escherichia and Shigella species; thus, biological performance is likely constraining the evolution of variable regions in bacteria. 
Further, the native nucleotide variations we tested occur in all strains of their respective species and across their multiple 16S rRNA gene copies, suggesting that these species evolved beyond what would be discerned from a consensus sequence comparison. Therefore, this work also reveals that the multiple 16S rRNA gene alleles found in most bacteria can provide more informative phylogenetic and taxonomic detail than a single reference allele.
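The idea of a species-distinguishing SNP that holds across all 16S rRNA gene copies can be illustrated with a small sketch. This is not the authors' computational method, which the abstract does not detail. It assumes the alleles of each species are already aligned to a common coordinate system, and it reports columns where every allele of one species carries one base, every allele of the other species carries another, and the two differ. The function name `fixed_differences` and the toy alleles are hypothetical.

```python
def fixed_differences(alleles_a, alleles_b):
    """Return (position, base_a, base_b) for aligned columns where each
    species is invariant across all of its alleles and the species differ."""
    alleles_a = list(alleles_a)
    alleles_b = list(alleles_b)
    if not alleles_a or not alleles_b:
        raise ValueError("each species needs at least one allele")
    length = len(alleles_a[0])
    if any(len(s) != length for s in alleles_a + alleles_b):
        raise ValueError("all alleles must be aligned to the same length")

    result = []
    for i in range(length):
        bases_a = {s[i] for s in alleles_a}
        bases_b = {s[i] for s in alleles_b}
        # Keep the column only if both species are fixed and disagree.
        if len(bases_a) == 1 and len(bases_b) == 1 and bases_a != bases_b:
            result.append((i, next(iter(bases_a)), next(iter(bases_b))))
    return result


# Toy data: three 16S copies per species (hypothetical fragments).
species_a = ["ACGTA", "ACGTA", "ACGTT"]  # position 4 varies within species A
species_b = ["ACCTA", "ACCTA", "ACCTA"]
print(fixed_differences(species_a, species_b))  # [(2, 'G', 'C')]
```

Position 4 is not reported because species A is polymorphic there. A consensus-only comparison would not separate such a within-species polymorphism from a fixed difference, which is why the sketch looks at every allele.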
Project description:Faecal microbiota and cytokine profiles of rural Cambodian infants linked to diet and diarrhoeal episodes (16S rRNA amplicon sequencing)
Project description:The objective of the present study was to identify the nutrient utilization and the SCFA production potential of gut microbes during the first year of life. The 16S sequencing data represent 100 mother-child pairs, sampled longitudinally for the infants (0, 3, 6 and 12 months) and at 18 weeks of pregnancy for the mothers. We wanted to identify the SCFA composition in pregnant women and their infants through the first year of life, and its correlation with gut bacteria and other influential factors. Metaproteomics data from selected infants were analyzed to look for nutrient sources used by potential SCFA producers.
Project description:Background: Metagenomics is a rapidly growing field of research aimed at studying assemblages of uncultured organisms using various sequencing technologies, with the hope of understanding the true diversity of microbes, their functions, cooperation and evolution. There are two main approaches to metagenomics: amplicon sequencing, which involves PCR-targeted sequencing of a specific locus, often 16S rRNA, and random shotgun sequencing. Several tools or packages have been developed for analyzing communities using 16S rRNA sequences. Similarly, a number of tools exist for analyzing randomly sequenced DNA reads. Results: We describe an extension of the metagenome analysis tool MEGAN, which allows one to analyze 16S sequences. For the analysis all 16S sequences are blasted against the SILVA database. The result output is imported into MEGAN, using a synonym file that maps the SILVA accession numbers onto the NCBI taxonomy. Conclusions: Environmental samples are often studied using both targeted 16S rRNA sequencing and random shotgun sequencing. Hence tools are needed that allow one to analyze both types of data together, and one such tool is MEGAN. The ideas presented in this paper are implemented in MEGAN 4, which is available from: http://www-ab.informatik.uni-tuebingen.de/software/megan.
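The mapping step described above (BLAST hits against SILVA, then accession numbers translated to NCBI taxonomy through a synonym file) can be sketched as follows. This does not reproduce MEGAN's file formats or its assignment logic. It assumes a hypothetical two-column, tab-separated synonym table (accession, taxon ID) and BLAST tabular output in the standard 12-column layout, where column 1 is the query, column 2 the subject, and column 12 the bit score. Each read is simply assigned the taxon of its best-scoring mapped hit. All accessions, read names and taxon IDs below are toy values.

```python
from collections import Counter


def load_synonyms(lines):
    """Parse 'accession<TAB>taxon_id' lines into a dict (hypothetical format)."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        accession, taxon_id = line.split("\t")[:2]
        mapping[accession] = int(taxon_id)
    return mapping


def best_hit_taxa(blast_lines, synonyms):
    """Assign each query the taxon of its highest-bit-score hit that has
    an entry in the synonym table. Unmapped subjects are skipped."""
    best = {}
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a standard 12-column tabular line
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        taxon = synonyms.get(subject)
        if taxon is None:
            continue
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, taxon)
    return {query: taxon for query, (_, taxon) in best.items()}


# Toy inputs (all identifiers are made up for illustration).
synonym_lines = ["ACC_1\t562", "ACC_2\t624"]
blast_lines = [
    "read1\tACC_1\t99.0\t250\t2\t0\t1\t250\t1\t250\t1e-120\t450.0",
    "read1\tACC_2\t98.0\t250\t5\t0\t1\t250\t1\t250\t1e-118\t440.0",
    "read2\tACC_2\t97.5\t250\t6\t0\t1\t250\t1\t250\t1e-115\t430.0",
    "read3\tACC_9\t90.0\t250\t25\t0\t1\t250\t1\t250\t1e-80\t300.0",
]

assignments = best_hit_taxa(blast_lines, load_synonyms(synonym_lines))
print(assignments)                    # {'read1': 562, 'read2': 624}
print(Counter(assignments.values()))  # reads per taxon ID
```

A best-hit rule is the simplest possible choice and tends to over-assign reads at low taxonomic ranks when several references score almost equally, so it should be read as a placeholder for whatever assignment strategy the analysis tool actually applies.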
Project description:Advances in high-throughput sequencing have increased the availability of microbiome sequencing data that can be exploited to characterize microbiome community structure in situ. We explore using word and sentence embedding approaches for nucleotide sequences since they may be a suitable numerical representation for downstream machine learning applications (especially deep learning). This work involves first encoding ("embedding") each sequence into a dense, low-dimensional, numeric vector space. Here, we use Skip-Gram word2vec to embed k-mers, obtained from 16S rRNA amplicon surveys, and then leverage an existing sentence embedding technique to embed all sequences belonging to specific body sites or samples. We demonstrate that these representations are meaningful, and hence the embedding space can be exploited as a form of feature extraction for exploratory analysis. We show that sequence embeddings preserve relevant information about the sequencing data such as k-mer context, sequence taxonomy, and sample class. Specifically, the sequence embedding space resolved differences among phyla, as well as differences among genera within the same family. Distances between sequence embeddings had similar qualities to distances between alignment identities, and embedding multiple sequences can be thought of as generating a consensus sequence. In addition, embeddings are versatile features that can be used for many downstream tasks, such as taxonomic and sample classification. Using sample embeddings for body site classification resulted in negligible performance loss compared to using OTU abundance data, and clustering embeddings yielded high fidelity species clusters. Lastly, the k-mer embedding space captured distinct k-mer profiles that mapped to specific regions of the 16S rRNA gene and corresponded with particular body sites. 
Together, our results show that embedding sequences results in meaningful representations that can be used for exploratory analyses or for downstream machine learning applications that require numeric data. Moreover, because the embeddings are trained in an unsupervised manner, unlabeled data can be embedded and used to bolster supervised machine learning tasks.
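The first stage of the embedding pipeline described above, turning a nucleotide sequence into k-mer "words" and the (center, context) pairs a Skip-Gram model trains on, can be sketched in plain Python. The values of `k` and `window` here are arbitrary assumptions, not the settings used in the study. The final helper averages k-mer vectors to get a sequence vector. That is a deliberately simple stand-in, not the sentence embedding technique the authors used, and the toy vectors are hypothetical rather than trained.

```python
def kmers(seq: str, k: int):
    """Overlapping k-mers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def skipgram_pairs(tokens, window: int):
    """(center, context) training pairs for a Skip-Gram model:
    each token is paired with its neighbours within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def mean_embedding(tokens, vectors):
    """Average the vectors of known tokens (simple stand-in for a
    sentence embedding; tokens missing from `vectors` are ignored)."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("no token has a vector")
    dim = len(known[0])
    return [sum(v[d] for v in known) / len(known) for d in range(dim)]


tokens = kmers("ACGTAC", k=3)
print(tokens)  # ['ACG', 'CGT', 'GTA', 'TAC']
print(skipgram_pairs(tokens, window=1))

# Hypothetical 2-dimensional vectors for two of the k-mers.
toy_vectors = {"ACG": [1.0, 0.0], "CGT": [0.0, 1.0]}
print(mean_embedding(tokens, toy_vectors))  # [0.5, 0.5]
```

In practice the pairs would be fed to a word2vec implementation rather than enumerated by hand. The sketch only makes explicit what "k-mer context" means when a read is treated as a sentence.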