Project description: Although most previous work on campylobacteriosis has centered on Campylobacter jejuni, its sister group Campylobacter coli is also a significant problem yet remains much less studied. The purpose of this study was to develop and apply an expanded 16-locus MLST genotyping scheme to a large collection of C. coli isolates sampled from a wide range of host species, and to complete microarray comparative genomic hybridizations for the same strains, in order to (1) determine whether host-specific clones, genotypes, or clonal complexes are evident and (2) evaluate whether particular genes in the dispensable portion of the C. coli genome are more commonly associated with certain host species. Genotyping and ClonalFrame analyses of the expanded MLST data suggest that (1) host-preferred groups have tended to evolve during the diversification of C. coli, (2) this has happened repeatedly, at different times, throughout the evolutionary history of the species, and (3) recombination has played varying roles in the diversification of the different groups. Together with the information on evolutionary history derived from the MLST data, the microarray data suggest that common ancestry in some cases and lateral gene transfer in others underlie a tendency for sets of genes to be common to isolates derived from particular hosts.

Keywords: comparative genomic hybridization

A CombiMatrix CustomArray™ 4X2K was used in this study. This array is divided into 4 sectors, each of which contains 2,240 in situ synthesized oligonucleotide probes (spots) with the same probe design and layout. Based on the sequence of Campylobacter coli strain RM2228, oligonucleotide probes were designed to have a similar annealing temperature of 56 °C and a length of 35-40 bp.
Two separate designs were used in this study; both included 100 control probes (20 negative controls with sequences from plant and phage, each with 5 replicate spots) as well as loci from the RM2228 genome. Because of the strict criteria for probe design, not all ORFs could be covered in this analysis. The first design included 1,942 of the 1,967 protein-coding genes described in the unfinished sequence of C. coli strain RM2228. The second-generation design was based on genes that were not clearly present in the hybridization results from the first design, that is, loci with low intensity or no hybridization for at least one strain. For each of these 615 ambiguous loci, the second design included two or five additional probes, spaced apart so as to span the entire gene and synthesized in situ to occupy the 2,240 independent microarray spots. Replicate microarrays were hybridized for each of the 65 strains tested in this study.
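The selection rule for the second design (loci with low intensity or no hybridization for at least one strain) can be illustrated with a minimal sketch. The function name, the data layout, the toy values, and the intensity cutoff are all assumptions made for illustration; the description does not report the threshold or the software actually used.

```python
def ambiguous_loci(intensities, threshold):
    """Return loci whose signal falls below `threshold` in at least one strain.

    intensities: dict mapping locus name -> dict of strain name -> signal.
    threshold:   hypothetical cutoff; the study does not report one.
    """
    return sorted(
        locus
        for locus, by_strain in intensities.items()
        if any(signal < threshold for signal in by_strain.values())
    )


# Toy data, invented for illustration only.
toy = {
    "locus_A": {"strain1": 950.0, "strain2": 1020.0},
    "locus_B": {"strain1": 880.0, "strain2": 40.0},
    "locus_C": {"strain1": 15.0, "strain2": 22.0},
}

selected = ambiguous_loci(toy, threshold=100.0)
print(selected)
```

In this toy example, locus_A is clearly present in both strains and is left out, while locus_B (weak in one strain) and locus_C (weak in both) would be carried forward for additional probes.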

Project description: Whole genome sequencing (WGS) is increasingly used for epidemiological investigations of pathogens. While SNP variant calling is currently considered the most suitable method, the choice of a representative reference genome and the isolate dependency of results limit standardization and affect resolution in an unknown manner. Whole or core genome Multi Locus Sequence Typing (wg-, cg-MLST) represents an attractive alternative. Here, we assess the accuracy of wg- and cg-MLST by comparing results of four Pseudomonas aeruginosa datasets for which epidemiological and genomic data were previously described. Three datasets included 155 isolates from three different sequence types (ST) of P. aeruginosa collected in our ICUs over a 5-year period. The fourth dataset consisted of 10 isolates from an investigation of P. aeruginosa contaminated hand soap. All isolates were previously analyzed by a core SNP approach. In this study, wg- and cg-MLST were performed in BioNumerics™ using a scheme developed by Applied-Maths. Correlation between SNP calling and wg- or cg-MLST results was evaluated by calculating linear regressions and their correlation coefficients (R²) between the number of SNPs and the number of allele differences in pairwise comparisons of isolates. The numbers of SNPs and allele differences between isolates with close epidemiological linkage ranged from 0 to 26 and from 0 to 13, respectively. When compared to core-SNP calling, a higher correlation coefficient was obtained with cgMLST (R² of 0.92-0.99) than with wgMLST (0.78-0.99). In one dataset, a putative homologous recombination of a large DNA fragment (202 loci) was identified among these isolates, affecting its phylogeny, but with no impact on the epidemiological analysis of outbreak isolates. In conclusion, we showed that the P. aeruginosa wgMLST scheme in BioNumerics™ is as discriminatory as the core-SNP calling approach and apparently useful for outbreak investigations.
We also showed that epidemiologically linked isolates differed by fewer than 26 SNPs or 13 alleles. These are important figures for the distinction between outbreak and non-outbreak isolates when interpreting WGS results. However, as P. aeruginosa is highly recombinant, a cgMLST approach is preferable, and attention should be paid to possible recombination of large DNA fragments.
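The pairwise comparison described above (allele differences regressed against SNP counts, summarized by R²) can be sketched as follows. This is a minimal illustration under stated assumptions, not the BioNumerics workflow: the function names, the rule of skipping loci with a missing call, and all toy profiles and SNP counts are invented here.

```python
from itertools import combinations


def allele_differences(profile_a, profile_b):
    """Count loci where both isolates have a called allele and the calls differ.

    Loci with a missing call (None) in either isolate are skipped; this
    handling of missing data is an assumption, not taken from the study.
    """
    return sum(
        1
        for a, b in zip(profile_a, profile_b)
        if a is not None and b is not None and a != b
    )


def r_squared(xs, ys):
    """R² of the least-squares line y = a + b*x (squared Pearson correlation).

    Raises ZeroDivisionError if either variable is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


# Toy allelic profiles (one allele number per locus), invented for illustration.
profiles = {
    "iso1": [1, 1, 1, 1, 1, 1],
    "iso2": [1, 1, 2, 1, 1, 1],
    "iso3": [1, 2, 2, 1, 3, 1],
    "iso4": [2, 2, 2, 1, 3, None],
}

# Toy pairwise SNP counts for the same isolate pairs, also invented.
snp_counts = {
    ("iso1", "iso2"): 2,
    ("iso1", "iso3"): 7,
    ("iso1", "iso4"): 9,
    ("iso2", "iso3"): 4,
    ("iso2", "iso4"): 6,
    ("iso3", "iso4"): 3,
}

pairs = list(combinations(sorted(profiles), 2))
snps = [snp_counts[pair] for pair in pairs]
alleles = [allele_differences(profiles[a], profiles[b]) for a, b in pairs]

r2 = r_squared(snps, alleles)
print(f"R^2 between SNP counts and allele differences: {r2:.3f}")
```

With real data, each dataset would contribute one such set of pairwise points, and the resulting R² values could be compared between the cgMLST and wgMLST schemes.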

Dataset Information

Core genome MLST versus SNP based WGS analysis of MTBC isolates

OmicsDI is part of the ELIXIR infrastructure