Dataset Information

Large scale automated phylogenomic analysis of bacterial isolates and the Evergreen Online platform.

ABSTRACT: Public health authorities whole-genome sequence thousands of isolates each month for microbial diagnostics and surveillance of pathogenic bacteria. The computational methods have not kept up with the deluge of data and the need for real-time results. We have therefore created a bioinformatics pipeline for rapid subtyping and continuous phylogenomic analysis of bacterial samples, suited for large-scale surveillance. The data are divided into sets by mapping to reference genomes, then consensus sequences are generated. Nucleotide-based genetic distance is calculated between the sequences in each set, and isolates are clustered together at a threshold of 10 single-nucleotide polymorphisms. Phylogenetic trees are inferred from the non-redundant sequences, and the clustered isolates are then added back to the trees. The method accurately groups outbreak strains together while discriminating them from non-outbreak strains. The pipeline is applied in Evergreen Online, which processes publicly available sequencing data from foodborne bacterial pathogens on a daily basis, updating phylogenetic trees as needed.
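
The distance and clustering step described in the abstract can be illustrated with a minimal sketch. It is not the Evergreen pipeline's implementation: the function names, the greedy first-match clustering strategy, the skipping of ambiguous 'N' positions, and the choice to treat a distance of at most 10 as "clustered" are all assumptions made for illustration. Only the 10-SNP threshold comes from the abstract.

```python
from typing import Dict, List

SNP_THRESHOLD = 10  # threshold stated in the abstract


def snp_distance(a: str, b: str) -> int:
    """Count differing positions between two equal-length consensus sequences.

    Positions where either sequence carries an ambiguous base ('N') are
    skipped; this handling is an assumption of the sketch.
    """
    if len(a) != len(b):
        raise ValueError("consensus sequences must have equal length")
    return sum(1 for x, y in zip(a, b) if x != y and x != "N" and y != "N")


def cluster_isolates(
    seqs: Dict[str, str], threshold: int = SNP_THRESHOLD
) -> Dict[str, List[str]]:
    """Greedily assign each isolate to the first representative within
    `threshold` SNPs (hypothetical strategy, inclusive threshold assumed).

    Returns a mapping from representative name to the isolates clustered
    with it, including the representative itself.
    """
    clusters: Dict[str, List[str]] = {}
    for name, seq in seqs.items():
        for rep in clusters:
            if snp_distance(seq, seqs[rep]) <= threshold:
                clusters[rep].append(name)
                break
        else:
            # No existing representative is close enough: start a new cluster.
            clusters[name] = [name]
    return clusters
```

Under these assumptions, the keys of the returned mapping play the role of the non-redundant sequences from which a tree would be inferred, and the remaining members of each list are the clustered isolates that would be added back afterwards.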

SUBMITTER: Szarvas J

PROVIDER: S-EPMC7083913 | biostudies-literature | 2020 Mar

REPOSITORIES: biostudies-literature

Publications

Large scale automated phylogenomic analysis of bacterial isolates and the Evergreen Online platform.

Judit Szarvas, Johanne Ahrenfeldt, Jose Luis Bellod Cisneros, Martin Christen Frølund Thomsen, Frank M Aarestrup, Ole Lund

Communications Biology, 20 March 2020, issue 1

PMID: 32198478

Similar Datasets

Project description: Participation in research can be beneficial for patients and healthcare providers, but may prove demanding at patient, clinician and organizational levels. Patient representatives are supportive of online research to overcome these challenges. The aim of this pilot study was to develop an online recruitment platform and test its feasibility and acceptability while evaluating the accuracy of participant-reported data. The online research platform was developed in a 1-day 'hackathon' with a digital design company. Women who underwent implant-based breast reconstruction in 2011-2016 were invited by letter containing the web address (URL) of the study site and their unique study number. Once online, participants learned about the study, consented, entered data on demographics, treatment received and patient-reported outcome measures (BREAST-Q™), and booked an appointment for a single hospital visit for three-dimensional surface imaging (3D-SI). Real-time process evaluation was performed. The primary endpoint was recruitment rate. The recruitment rate was 40 per cent. Of the 100 women, 50 logged on to the platform and 40 completed the process through to 3D-SI. The majority of discontinuations after logging on occurred between consenting and entering demographics (3 women, 6 per cent), and between completing the BREAST-Q and booking an appointment for 3D-SI using the online calendar (3 women, 6 per cent). All women completed the online BREAST-Q™ once started. Participants took a median of 23 minutes to complete the online process. Patient-reported clinical data were accurate in 12 of 13 domains compared with electronic records (95 per cent concordance). Process evaluation demonstrated acceptability. The results of this pilot demonstrate the online platform to be acceptable, feasible, and accurate for this population from a single institution.
The low-burden design may enable participation from centres with less research support and participants from hard-to-reach groups or dispersed geographical locations, but with online access.

Project description: Background: There are presently hundreds of online databases hosting millions of chemical compounds and associated data. As a result of the number of cheminformatics software tools that can be used to produce the data, subtle differences between the various cheminformatics platforms, as well as the naivety of the software users, there are a myriad of issues that can exist with chemical structure representations online. In order to help facilitate validation and standardization of chemical structure datasets from various sources we have delivered a freely available internet-based platform to the community for the processing of chemical compound datasets. Results: The chemical validation and standardization platform (CVSP) both validates and standardizes chemical structure representations according to sets of systematic rules. The chemical validation algorithms detect issues with submitted molecular representations using pre-defined or user-defined dictionary-based molecular patterns that are chemically suspicious or potentially requiring manual review. Each identified issue is assigned one of three levels of severity - Information, Warning, and Error - in order to conveniently inform the user of the need to browse and review subsets of their data. The validation process includes validation of atoms and bonds (e.g., making aware of query atoms and bonds), valences, and stereo. The standard form of submission of collections of data, the SDF file, allows the user to map the data fields to predefined CVSP fields for the purpose of cross-validating associated SMILES and InChIs with the connection tables contained within the SDF file. This platform has been applied to the analysis of a large number of data sets prepared for deposition to our ChemSpider database and in preparation of data for the Open PHACTS project.
In this work we review the results of the automated validation of the DrugBank dataset, a popular drug and drug target database utilized by the community, and the ChEMBL 17 data set. The CVSP web site is located at http://cvsp.chemspider.com/. Conclusion: A platform for the validation and standardization of chemical structure representations of various formats has been developed and made available to the community to assist and encourage the processing of chemical structure files to produce more homogeneous compound representations for exchange and interchange between online databases. While the CVSP platform is designed with flexibility inherent to the rules that can be used for processing the data, we have produced a recommended rule set based on our own experiences with large data sets such as DrugBank, ChEMBL, and data sets from ChemSpider.
