Project description: Background: Based on 32 Escherichia coli and Shigella genome sequences, we have developed an E. coli pan-genome microarray. Publicly available genomes were annotated in a consistent manner to define all currently known genes potentially present in the species. The chip design was evaluated by hybridization of DNA from two sequenced E. coli strains, K-12 MG1655 (a commensal) and O157:H7 EDL933 (an enterohemorrhagic E. coli). Dual-channel and single-channel analysis approaches were compared for the comparative genomic hybridization experiments. Moreover, the microarray was used to characterize four unsequenced probiotic E. coli strains currently marketed for beneficial effects on the human gut flora. Results: Based on the genomes included in this study, we grouped together 2,041 genes that were present in all 32 genomes. Furthermore, we predict that the size of the E. coli core genome will approach ~1,560 essential genes, considerably fewer than previous estimates. Although any individual E. coli genome contains between 4,000 and 5,000 genes, we identified more than twice as many (11,872) distinct gene groups in the total gene pool (“pan-genome”) examined for microarray design. Benchmarking of the design with sequenced control strain samples demonstrated high sensitivity and a relatively low false positive rate. Moreover, the array was well suited to investigating the gene content of apathogenic isolates, despite the strong bias towards pathogenic strains among the E. coli genomes sequenced so far. Our analysis of four probiotic E. coli strains demonstrates that they share a gene pool very similar to that of the E. coli K-12 strains but also show significant similarity with enteropathogenic strains. Nonetheless, virulence genes were largely absent. Strain-specific genes found in probiotic E. coli but absent in E. coli K-12 were most frequently phage-related genes, transposases and other genes related to mobile DNA, and metabolic enzymes or factors that may offer colonization fitness, which, together with their asymptomatic nature, may explain their nature. Conclusion: This high-density microarray provides an excellent tool for characterizing either DNA content or gene expression from unknown E. coli strains. Keywords: Comparative genomic hybridizations
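The core and pan-genome figures above come down to set operations: the core is the set of gene groups present in every genome, and the pan-genome is their union. The Python sketch below is a minimal illustration, not the authors' pipeline. It assumes each genome has already been reduced to a set of gene-group identifiers (the annotation and grouping step is out of scope), and all genome and gene names are hypothetical. It only tracks observed core and pan sizes as genomes are added; it does not reproduce the extrapolation to ~1,560 core genes.

```python
from typing import Dict, List, Set, Tuple


def core_and_pan_curve(
    genomes: Dict[str, Set[str]], order: List[str]
) -> List[Tuple[int, int, int]]:
    """Return (n_genomes, core_size, pan_size) after adding each genome in `order`.

    Each genome is a set of (hypothetical) gene-group IDs; the core is the
    intersection of all sets seen so far and the pan-genome is their union.
    """
    curve: List[Tuple[int, int, int]] = []
    core: Set[str] = set()
    pan: Set[str] = set()
    for i, name in enumerate(order):
        groups = genomes[name]
        # The first genome seeds the core; each later genome can only shrink it.
        core = set(groups) if i == 0 else core & groups
        # The pan-genome can only grow as genomes are added.
        pan |= groups
        curve.append((i + 1, len(core), len(pan)))
    return curve


if __name__ == "__main__":
    # Hypothetical toy genomes, not data from the study.
    toy = {
        "genome_A": {"g1", "g2", "g3", "g4"},
        "genome_B": {"g1", "g2", "g3", "g5", "g6"},
        "genome_C": {"g1", "g2", "g4", "g7"},
    }
    for n, core, pan in core_and_pan_curve(toy, list(toy)):
        print(f"{n} genome(s): core={core}, pan={pan}")
```

On the toy input this prints core sizes 4, 3, 2 and pan sizes 4, 6, 7: the same qualitative pattern as a shrinking core genome and a growing pan-genome when more strains are included.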
Factorial design: Each of the four test samples (G 1/2, G3/10, G 4/9, G5) was co-hybridized with the two control strain samples (K-12 MG1655 and O157:H7 EDL933). Additional replicate co-hybridizations of the two control strain samples (O157:H7 EDL933 vs. K-12 MG1655) were also included.
Project description: Motivation: Pangenome graphs provide a complete representation of the mutual alignment of collections of genomes. These models offer the opportunity to study the entire genomic diversity of a population, including structurally complex regions. Nevertheless, analyzing hundreds of gigabase-scale genomes using pangenome graphs is difficult, as it is not well supported by existing tools. Hence, fast and versatile software is required to ask advanced questions of such data efficiently. Results: We wrote Optimized Dynamic Genome/Graph Implementation (ODGI), a novel suite of tools that implements scalable algorithms and has an efficient in-memory representation of DNA pangenome graphs in the form of variation graphs. ODGI supports pre-built graphs in the Graphical Fragment Assembly (GFA) format. ODGI includes tools for detecting complex regions, extracting pangenomic loci, removing artifacts, exploratory analysis, manipulation, validation and visualization. Its fast parallel execution facilitates routine pangenomic tasks, as well as pipelines that can quickly answer complex biological questions about gigabase-scale pangenome graphs. Availability and implementation: ODGI is published as free software under the MIT open source license. Source code can be downloaded from https://github.com/pangenome/odgi and documentation is available at https://odgi.readthedocs.io. ODGI can be installed via Bioconda https://bioconda.github.io/recipes/odgi/README.html or GNU Guix https://github.com/pangenome/odgi/blob/master/guix.scm. Supplementary information: Supplementary data are available at Bioinformatics online.
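ODGI represents pangenomes as variation graphs loaded from GFA files. To make that data model concrete, here is a minimal Python sketch (not ODGI's C++ data structure or API) that reads the three GFA 1.0 record types a variation graph needs: segments (S, nodes carrying sequence), links (L, oriented edges) and paths (P, genomes or haplotypes as walks through the graph). It assumes tab-separated input and blunt-ended links (the overlap field, such as 0M, is ignored); the toy graph and path names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Step = Tuple[str, str]  # (segment name, orientation "+" or "-")
Link = Tuple[str, str, str, str]  # (from, from_orient, to, to_orient)

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse-complement DNA; characters other than ACGTN are copied unchanged."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_gfa(
    lines: Iterable[str],
) -> Tuple[Dict[str, str], Set[Link], Dict[str, List[Step]]]:
    """Collect S, L and P records; other record types (H, W, ...) are skipped."""
    segments: Dict[str, str] = {}
    links: Set[Link] = set()
    paths: Dict[str, List[Step]] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            segments[fields[1]] = fields[2]
        elif fields[0] == "L":
            # The overlap field (fields[5]) is ignored: blunt ends are assumed.
            links.add((fields[1], fields[2], fields[3], fields[4]))
        elif fields[0] == "P":
            # Steps look like "12+,13-": a segment name followed by its orientation.
            paths[fields[1]] = [(s[:-1], s[-1]) for s in fields[2].split(",")]
    return segments, links, paths


def _flip(orient: str) -> str:
    return "-" if orient == "+" else "+"


def is_walk(links: Set[Link], path: List[Step]) -> bool:
    """True if each consecutive step pair is backed by a link, read in either direction."""
    for (a, oa), (b, ob) in zip(path, path[1:]):
        # A link a+ -> b+ is equivalent to b- -> a-, so check both forms.
        if (a, oa, b, ob) not in links and (b, _flip(ob), a, _flip(oa)) not in links:
            return False
    return True


def path_sequence(segments: Dict[str, str], path: List[Step]) -> str:
    """Spell a path by concatenating node sequences, reverse-complementing '-' steps."""
    return "".join(
        segments[name] if orient == "+" else reverse_complement(segments[name])
        for name, orient in path
    )


if __name__ == "__main__":
    # Hypothetical toy graph: a SNP-like bubble (nodes 2/3) shared by two haplotypes.
    toy_gfa = [
        "H\tVN:Z:1.0",
        "S\t1\tACG",
        "S\t2\tT",
        "S\t3\tGA",
        "S\t4\tTTC",
        "L\t1\t+\t2\t+\t0M",
        "L\t1\t+\t3\t+\t0M",
        "L\t2\t+\t4\t+\t0M",
        "L\t3\t+\t4\t-\t0M",
        "P\thapA\t1+,2+,4+\t*",
        "P\thapB\t1+,3+,4-\t*",
    ]
    segs, links, paths = parse_gfa(toy_gfa)
    for name, steps in paths.items():
        print(name, path_sequence(segs, steps), is_walk(links, steps))
```

The point of the example is that sequence lives only on nodes, while each genome is stored as a path of oriented node references; that separation is what lets a pangenome graph share common sequence across many genomes. Production tools such as ODGI use compact, succinct encodings rather than Python dictionaries.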
Project description: Background: A revolutionary shift from classical vaccinology to the reverse vaccinology approach has been observed in the last decade. The ever-increasing genomic and proteomic data have greatly facilitated vaccine design and development. Reverse vaccinology is considered a cost-effective and efficient approach to screening the entire genome of a pathogen. To search for broad-spectrum immunogenic targets and to analyze closely related bacterial species, integrating the pangenome concept into the reverse vaccinology approach is essential. The categories of a species pangenome, namely the core, accessory, and unique gene sets, can be analyzed to identify vaccine candidates through reverse vaccinology. Results: We have designed an integrative computational pipeline, termed "PanRV", that employs both the pangenome and reverse vaccinology approaches. PanRV comprises four functional modules: i) Pangenome Estimation Module (PGM), ii) Reverse Vaccinology Module (RVM), iii) Functional Annotation Module (FAM), and iv) Antibiotic Resistance Association Module (ARM). The pipeline was tested using genomic data from 301 genomes of Staphylococcus aureus, and the results were verified against experimentally known antigenic data. Conclusion: The proposed pipeline has proved to be the first comprehensive automated pipeline that can precisely identify putative vaccine candidates by exploiting the microbial pangenome. PanRV is a Linux-based package developed in Java. An executable installer is provided for ease of installation, along with a user manual (https://sourceforge.net/projects/panrv2/).
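The core, accessory, and unique categories above can be derived by counting how many genomes carry each gene cluster. The Python sketch below illustrates that rule under stated assumptions; it is not PanRV's Pangenome Estimation Module. It assumes gene clusters have already been computed and are supplied as a hypothetical mapping from genome ID to the set of cluster IDs present in that genome, and it uses strict definitions: core means present in all genomes, unique means present in exactly one, and accessory covers everything in between.

```python
from collections import Counter
from typing import Dict, Set


def classify_pangenome(presence: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Split gene clusters into core, accessory and unique sets.

    `presence` maps each genome ID to the set of gene-cluster IDs it carries
    (hypothetical input format; clustering is assumed to be done already).
    """
    n_genomes = len(presence)
    # Count, for every cluster, the number of genomes that contain it.
    counts = Counter(c for clusters in presence.values() for c in clusters)
    result: Dict[str, Set[str]] = {"core": set(), "accessory": set(), "unique": set()}
    for cluster, k in counts.items():
        if k == n_genomes:  # checked first, so a single-genome input is all core
            result["core"].add(cluster)
        elif k == 1:
            result["unique"].add(cluster)
        else:
            result["accessory"].add(cluster)
    return result


if __name__ == "__main__":
    # Hypothetical toy presence/absence data.
    toy = {"A": {"c1", "c2", "c3"}, "B": {"c1", "c2"}, "C": {"c1", "c4"}}
    for category, clusters in classify_pangenome(toy).items():
        print(category, sorted(clusters))
```

Some pangenome tools relax the core threshold (for example, to clusters found in nearly all genomes) to tolerate assembly or annotation gaps; the strict cut-off here is simply the easiest rule to read.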
Project description: We determined nucleosome positions genome-wide in diploid Saccharomyces species undergoing early stages of synchronous meiosis. This study sought to assess whether meiotic DNA double-strand break formation occurs preferentially in promoter nucleosome-depleted regions in other Saccharomyces species, as it does in S. cerevisiae SK1 (Pan et al. 2011 Cell 144:719-731).
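One simple way to frame whether double-strand breaks (DSBs) form preferentially in nucleosome-depleted regions (NDRs) is an observed/expected ratio: the fraction of break sites that fall inside NDRs, divided by the fraction of the genome that NDRs cover. The Python sketch below computes that ratio; it is an illustration under assumptions, not the analysis used in this study. It assumes NDRs are given as half-open [start, end) intervals per chromosome, break sites as single positions in the same coordinate system, and known chromosome sizes. A ratio above 1 suggests enrichment; a real analysis would also need a significance test and a matched background.

```python
import bisect
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Sort and merge overlapping or touching intervals so coverage is not double-counted."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def _contains(merged: List[Interval], starts: List[int], pos: int) -> bool:
    """Binary-search the merged, sorted intervals for one that contains pos."""
    i = bisect.bisect_right(starts, pos) - 1
    return i >= 0 and pos < merged[i][1]


def ndr_enrichment(
    ndrs: Dict[str, List[Interval]],
    breaks: Dict[str, List[int]],
    chrom_sizes: Dict[str, int],
) -> float:
    """Observed/expected ratio of break sites inside NDRs (>1 suggests enrichment)."""
    total_bp = sum(chrom_sizes.values())
    covered_bp = hits = n_breaks = 0
    for chrom in chrom_sizes:
        merged = merge_intervals(ndrs.get(chrom, []))
        starts = [s for s, _ in merged]
        covered_bp += sum(e - s for s, e in merged)
        for pos in breaks.get(chrom, []):
            n_breaks += 1
            if _contains(merged, starts, pos):
                hits += 1
    if n_breaks == 0 or covered_bp == 0:
        raise ValueError("need at least one break site and non-empty NDR coverage")
    observed = hits / n_breaks  # fraction of breaks inside NDRs
    expected = covered_bp / total_bp  # fraction of genome covered by NDRs
    return observed / expected


if __name__ == "__main__":
    # Hypothetical toy data on a single 1 kb chromosome.
    print(ndr_enrichment({"chrI": [(100, 200), (150, 250)]},
                         {"chrI": [120, 240, 500, 900]},
                         {"chrI": 1000}))
```

In the toy run, half of the breaks fall in NDRs that cover 15% of the sequence, so the ratio is roughly 3.33. Merging intervals first keeps overlapping NDR calls from inflating the expected fraction.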