Dataset Information

Impacts and shortcomings of genetic clustering methods for infectious disease outbreaks.

ABSTRACT: For infectious diseases, a genetic cluster is a group of closely related infections that is usually interpreted as representing a recent outbreak of transmission. Genetic clustering methods are becoming increasingly popular for molecular epidemiology, especially in the context of HIV where there is now considerable interest in applying these methods to prioritize groups for public health resources such as pre-exposure prophylaxis. To date, genetic clustering has generally been performed with ad hoc algorithms, only some of which have since been encoded and distributed as free software. These algorithms have seldom been validated on simulated data where clusters are known, and their interpretation and similarities are not transparent to users outside of the field. Here, I provide a brief overview on the development and inter-relationships of genetic clustering methods, and an evaluation of six methods on data simulated under an epidemic model in a risk-structured population. The simulation analysis demonstrates that the majority of clustering methods are systematically biased to detect variation in sampling rates among subpopulations, not variation in transmission rates. I discuss these results in the context of previous work and the implications for public health applications of genetic clustering.

SUBMITTER: Poon AF

PROVIDER: S-EPMC5210024 | biostudies-literature

REPOSITORIES: biostudies-literature


Similar Datasets

Project description: Clustering infections by genetic similarity is a popular technique for identifying potential outbreaks of infectious disease, in part because sequences are now routinely collected for clinical management of many infections. A diverse range of nonparametric clustering methods has been developed for this purpose. These methods are generally intuitive, rapid to compute, and scale readily to large data sets. However, we have found that nonparametric clustering methods can be biased towards identifying clusters of diagnosis (where individuals are sampled sooner post-infection) rather than the clusters of rapid transmission that are meant to be potential foci for public health efforts. We develop a fundamentally new approach to genetic clustering based on fitting a Markov-modulated Poisson process (MMPP), which represents the evolution of transmission rates along the tree relating different infections. We evaluated this model-based method alongside five nonparametric clustering methods using both simulated and actual HIV sequence data sets. For simulated clusters of rapid transmission, the MMPP clustering method obtained higher mean sensitivity (85%) and specificity (91%) than the nonparametric methods. When we applied these clustering methods to published sequences from a study of HIV-1 genetic clusters in Seattle, USA, we found that the MMPP method assigned about half (46%) as many individuals to clusters as the other methods. Furthermore, the mean internal branch lengths that approximate transmission rates were significantly shorter in clusters extracted using MMPP, but not by other methods. We determined that the computing time for the MMPP method scaled linearly with the size of trees, requiring about 30 seconds for a tree of 1,000 tips and about 20 minutes for 50,000 tips on a single computer. This new approach to genetic clustering has significant implications for the application of pathogen sequence analysis to public health, where it is critical to robustly and accurately identify clusters for the most cost-effective deployment of outbreak management and prevention resources.
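The sensitivity and specificity figures quoted above compare each method's cluster assignments against the known clusters in the simulated data. The following is a minimal sketch of that calculation only, not the authors' evaluation code; the function name `sensitivity_specificity` and the toy labels are hypothetical and chosen purely for illustration.

```python
def sensitivity_specificity(true_labels, predicted_labels):
    """Return (sensitivity, specificity) for binary cluster membership.

    true_labels[i] is True when individual i truly belongs to a cluster
    (known from simulation); predicted_labels[i] is True when the
    clustering method assigned individual i to a cluster.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")

    tp = fp = tn = fn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth and predicted:
            tp += 1  # clustered individual correctly detected
        elif truth and not predicted:
            fn += 1  # clustered individual missed
        elif not truth and predicted:
            fp += 1  # non-clustered individual wrongly flagged
        else:
            tn += 1  # non-clustered individual correctly left out

    # Guard against empty classes rather than dividing by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical toy data: 4 truly clustered and 6 non-clustered individuals.
truth = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, False, False, False, False, True, False]
sens, spec = sensitivity_specificity(truth, predicted)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A method that over-detects clusters of early diagnosis would show up here as a lower specificity, since individuals outside the rapid-transmission clusters would be flagged.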

Project description: Background: As demonstrated during the global Ebola crisis of 2014-2016, healthcare institutions in high resource settings need support concerning preparedness during threats of infectious disease outbreaks. This study aimed to exploratively develop a standardized preparedness system to use during unfolding threats of severe infectious diseases. Methods: A qualitative three-step study among infectious disease prevention and control experts was performed. First, interviews (n = 5) were conducted to identify which factors trigger preparedness activities during an unfolding threat. Second, these triggers informed the design of a phased preparedness system, which was tested in a focus group discussion. Results: Four preparedness phases were identified. Phase green is a situation without the presence of the infectious disease threat that requires centralized care, anywhere in the world. Phase yellow is an outbreak in the world with some likelihood of imported cases. Phase orange is a realistic chance of an unexpected case within the country, or unrest developing among population or staff. Phase red is cases admitted to hospitals in the country, potentially causing a shortage of resources. Specific preparedness activities included infection prevention, diagnostics, patient care, staff, and communication. Consensus was reached on the need for the development of a preparedness system and national coordination during threats. Conclusions: In this study, we developed a standardized system to support institutional preparedness during an increasing threat. Use of this system by both curative healthcare institutions and the (municipal) public health service could help to effectively communicate and align preparedness activities during future threats of severe infectious diseases.


