Project description:Whole-genome sequencing is an important way to understand the genetic information, gene function, biological characteristics, and living mechanisms of organisms. Sequencing megabase-scale genomes is now routine. However, we encountered a hard-to-sequence genome, that of Pseudomonas aeruginosa phage PaP1, which the shotgun sequencing method failed to resolve. After 10 years of effort spanning three generations of sequencing technology, we successfully resolved the PaP1 genome, which is 91,715 bp in length. Single-molecule sequencing revealed that this genome contains many modified bases, including 51 N6-methyladenines (m6A) and 152 N4-methylcytosines (m4C). Further investigation revealed a novel bacterial immune mechanism by which the host bacteria recognize and reject, on a large scale, inserts containing modified bases; this caused the failure of the shotgun method in PaP1 genome sequencing. The problem can be resolved by using library-independent sequencing techniques or by using the nfi- mutant of E. coli DH5α as the host for constructing the shotgun library. In conclusion, the present study explains why the phage PaP1 genome was hard to sequence and identifies a new mechanism of bacterial immunity. Methylation profiling of Pseudomonas aeruginosa phage PaP1 using kinetic data generated by single-molecule, real-time (SMRT) sequencing on the PacBio RS.
Project description:Listeria monocytogenes is an opportunistic foodborne pathogen responsible for listeriosis, the third most common foodborne disease. Many different Listeria strains and serotypes exist; however, a proteogenomic resource that would help bridge the gap in molecular understanding between Listeria genotypes and phenotypes via proteotypes is still missing. Here we devised a next-generation proteogenomics strategy that now enables the community to rapidly proteotype Listeria strains and relate the information back to the genotype. Based on sequencing and de novo assembly of the two most commonly used Listeria model strains, EGD-e and ScottA, we established a comprehensive Listeria proteogenomic database. A genome comparison identified core and strain-specific genes with potential relevance for virulence differences. Next, we established a DIA/SWATH-based proteotyping strategy, including a new and robust sample preparation workflow, enabling reproducible, sensitive, relative quantitative measurement of Listeria proteotypes. This reusable DIA/SWATH library and new public resource covers 70% of the potentially expressed ORFs of Listeria and represents the most extensive spectral library for Listeria proteotype analysis to date. We used these two new resources to investigate the Listeria proteotype in three states mimicking passage through the upper gastrointestinal tract. Exposure of Listeria to bile salts at 37 °C, mimicking conditions encountered in the duodenum, caused significant proteotype perturbations, including an increase of FlaA, the structural protein of flagella. Given that Listeria is known to lose its flagella above 30 °C, this was an unexpected finding. The formation of flagella, which might have implications within the infectivity cycle, was validated by parallel reaction monitoring and by light and scanning electron microscopy. qPCR data for flaA transcripts showed no significant differences, suggesting regulation at the post-transcriptional level. Together, we provide a comprehensive proteogenomic resource and toolbox for the Listeria community, enabling the analysis of Listeria genotype-proteotype-phenotype relationships.
Project description:This dataset contains spectral information of protein N-terminal peptides isolated from Listeria monocytogenes EGD-e, a bacterial model organism and human pathogen. When mapped onto the Listeria genome, these peptides indicate the exact location of translation initiation sites (TIS). The large majority of the identified TIS corresponded to start sites of predicted open reading frames (ORFs); however, a significant fraction of the identified TIS deviated from the current genome annotation. These deviations are primarily TIS located inside the sequence of predicted ORFs or TIS that delineate the start position of novel ORFs.
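
The three TIS categories described above (annotated start, start inside a predicted ORF, start of a novel ORF) can be illustrated with a minimal Python sketch. This is not the pipeline used to produce the dataset: the `Orf` record, the `classify_tis` function, the category labels, the coordinate convention (1-based, inclusive, `left` < `right`), and the example ORFs are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Orf:
    """Hypothetical annotation record (1-based, inclusive genome coordinates)."""
    name: str
    left: int    # lowest coordinate covered by the ORF
    right: int   # highest coordinate covered by the ORF
    strand: str  # "+" or "-"


def annotated_start(orf: Orf) -> int:
    """Coordinate of the first base of the annotated start codon."""
    return orf.left if orf.strand == "+" else orf.right


def classify_tis(pos: int, strand: str, orfs: list[Orf]) -> str:
    """Classify one mapped TIS against the annotation (labels are illustrative)."""
    same_strand = [o for o in orfs if o.strand == strand]
    # 1. The TIS coincides with the annotated start of a predicted ORF.
    if any(annotated_start(o) == pos for o in same_strand):
        return "annotated"
    # 2. The TIS lies inside the sequence of a predicted ORF on the same strand.
    if any(o.left <= pos <= o.right for o in same_strand):
        return "internal"
    # 3. The TIS lies outside every predicted ORF: candidate novel ORF start.
    return "novel"


# Invented example annotation, for illustration only.
example_orfs = [
    Orf("orfA", 100, 400, "+"),
    Orf("orfB", 500, 800, "-"),
]
```

For example, `classify_tis(250, "+", example_orfs)` falls in the second category because position 250 lies within the hypothetical `orfA` but is not its annotated start. A real analysis would also need to consider reading frame and alternative start codons, which this sketch ignores.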