Project description: Streptococcus pneumoniae (pneumococcus) is a leading human respiratory pathogen that causes a variety of serious mucosal and invasive diseases. D39 is a historically important serotype 2 strain that was used in experiments by Avery and coworkers to demonstrate that DNA is the genetic material. Although isolated nearly a century ago, D39 remains extremely virulent in murine infection models and is perhaps the strain used most frequently in current studies of pneumococcal pathogenicity. To date, complete genome sequences have been reported for only two S. pneumoniae strains: TIGR4, a recent serotype 4 clinical isolate, and laboratory strain R6, an avirulent, unencapsulated derivative of strain D39. We report herein the genome sequences of two different isolates of strain D39, together with the corrected sequence and updated annotation of strain R6. Comparison of these three related sequences allowed us to deduce the likely sequence of the D39 progenitor and the mutations that arose in each isolate. Despite its numerous repeated sequences and IS elements, the serotype 2 genome has remained remarkably stable during cultivation: one of the D39 isolates contains only 5 relatively minor mutations compared to the deduced D39 progenitor. In contrast, laboratory strain R6 contains 71 single-base-pair changes, 6 deletions, and 4 insertions, and it has lost the cryptic pDP1 plasmid relative to the D39 progenitor strain. Many of these mutations lie in, or affect the expression of, genes that play important roles in regulation, metabolism, and virulence. We present and discuss the nature of the mutations that arose spontaneously in these three strains, the relative global transcription patterns determined by microarray analyses, and the implications of the D39 genome sequences for studies of pneumococcal physiology and pathogenesis.
Keywords: bacterial strain comparison, bacterial isolate comparison