Dataset Information

In simulated data and health records, latent class analysis was the optimum multimorbidity clustering algorithm.

ABSTRACT:

Background and objectives

To investigate the reproducibility and validity of four algorithms for multimorbidity clustering: latent class analysis (LCA), hierarchical cluster analysis (HCA), multiple correspondence analysis followed by k-means (MCA-kmeans), and k-means (kmeans).
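For binary disease indicators, LCA is commonly formulated as a finite mixture of independent Bernoulli distributions fitted by expectation-maximisation (EM). The sketch below illustrates that formulation in pure Python. It is an assumption-labelled illustration, not the authors' implementation. The function names (`fit_lca`, `assign_classes`), the random initialisation and the fixed iteration count are hypothetical choices. The sketch also omits the repeated random restarts and model-selection criteria (such as BIC) that are normally used to choose the number of classes.

```python
import math
import random


def fit_lca(data, n_classes, n_iter=200, seed=0):
    """Fit a latent class model (mixture of independent Bernoullis) by EM.

    data: list of patients, each a list of 0/1 disease indicators.
    Returns (class_weights, item_probs, responsibilities).
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    n_items = len(data[0])
    weights = [1.0 / n_classes] * n_classes
    # Random starting prevalences, kept away from 0 and 1 so logs are finite.
    probs = [[rng.uniform(0.25, 0.75) for _ in range(n_items)] for _ in range(n_classes)]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each class for each patient.
        resp = []
        for row in data:
            log_lik = []
            for k in range(n_classes):
                ll = math.log(weights[k])
                for x, p in zip(row, probs[k]):
                    ll += math.log(p if x else 1.0 - p)
                log_lik.append(ll)
            m = max(log_lik)  # subtract the max for numerical stability
            unnorm = [math.exp(v - m) for v in log_lik]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: update class sizes and within-class disease prevalences.
        for k in range(n_classes):
            nk = sum(r[k] for r in resp)
            weights[k] = max(nk / len(data), 1e-12)
            for j in range(n_items):
                p = sum(r[k] * row[j] for r, row in zip(resp, data)) / max(nk, 1e-12)
                probs[k][j] = min(max(p, 1e-6), 1 - 1e-6)
    return weights, probs, resp


def assign_classes(resp):
    """Modal assignment: each patient goes to their most probable class."""
    return [max(range(len(r)), key=r.__getitem__) for r in resp]
```

The fitted `probs` hold the prevalence of each condition within each class. These are the quantities a clinical description of the classes would report.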

Methods

We first investigated the clustering algorithms in simulated datasets containing 26 diseases of varying prevalence in predetermined clusters, comparing the derived clusters to the known clusters using the adjusted Rand Index (aRI). We then investigated them in the medical records of male patients aged 65 to 84 years from 50 UK general practices, covering 49 long-term health conditions. We compared within-cluster morbidity profiles using the Pearson correlation coefficient and assessed cluster stability in 400 bootstrap samples.
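The adjusted Rand Index compares two partitions of the same patients. It counts the pairs of patients that are grouped together, or kept apart, in both partitions, and then corrects for the agreement expected by chance. A value of 1 means identical partitions, and values near 0 mean chance-level agreement. Below is a minimal sketch computed from the contingency table; the function name `adjusted_rand_index` is hypothetical. `sklearn.metrics.adjusted_rand_score` provides an established implementation of the same index.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two partitions of the same patients."""
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must cover the same patients")
    n = len(labels_a)
    # Contingency table: patients in each (cluster in A, cluster in B) pair.
    cells = Counter(zip(labels_a, labels_b))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    # Pair agreement expected by chance, given the cluster sizes.
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case, e.g. both partitions place everyone in one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

Cluster labels are arbitrary, so relabelling a partition does not change the score. This is what allows derived clusters to be compared with the known clusters of a simulated dataset, or one algorithm's partition to be compared with another's.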

Results

In the simulated datasets, the closest agreement with the known clusters (largest aRI) was achieved by LCA, followed by MCA-kmeans. In the medical records dataset, all four algorithms identified one cluster comprising 20-25% of the dataset, with about 82% of its patients common to all four algorithms. LCA and MCA-kmeans both found a second cluster comprising 7% of the dataset. Other clusters were found by only one algorithm. LCA and MCA-kmeans gave the most similar partitioning (aRI 0.54).

Conclusion

LCA achieved a higher aRI than the other clustering algorithms.

SUBMITTER: Nichols L

PROVIDER: S-EPMC7613854 | biostudies-literature | 2022 Oct

REPOSITORIES: biostudies-literature


Publications

In simulated data and health records, latent class analysis was the optimum multimorbidity clustering algorithm.

Linda Nichols, Tom Taverner, Francesca Crowe, Sylvia Richardson, Christopher Yau, Steven Kiddle, Paul Kirk, Jessica Barrett, Krishnarajah Nirantharakumar, Simon Griffin, Duncan Edwards, Tom Marshall

Journal of Clinical Epidemiology, 2022-10-11


PMID: 36228971

Similar Datasets

Project description: Background: Health-risk behaviours such as smoking, unhealthy nutrition, alcohol consumption, and physical inactivity (termed SNAP behaviours) are leading risk factors for multimorbidity and tend to cluster (i.e. occur in specific combinations within distinct subpopulations). However, little is known about how these clusters change with age in older adults, and whether and how cluster membership is associated with multimorbidity. Methods: Repeated measures latent class analysis using data from Waves 4-8 of the English Longitudinal Study of Ageing (ELSA; n = 4759) identified clusters of respondents with common patterns of SNAP behaviours over time. Disease status (from Wave 9) was used to assess disorders of eight body systems, multimorbidity, and complex multimorbidity. Multinomial and binomial logistic regressions were used to examine how clusters were associated with socio-demographic characteristics and disease status. Findings: Seven clusters were identified: Low-risk (13.4%), Low-risk yet inactive (16.8%), Low-risk yet heavy drinkers (11.4%), Abstainer yet inactive (20.0%), Poor diet and inactive (12.9%), Inactive, heavy drinkers (14.5%), and High-risk smokers (10.9%). There was little evidence that these clusters changed with age. People in the clusters characterised by physical inactivity (in combination with other risky behaviours) had lower levels of education and wealth. People in the heavy-drinking clusters were predominantly male. Compared to other clusters, people in the Low-risk and Low-risk yet heavy drinkers clusters had a lower prevalence of all health conditions studied. In contrast, the Abstainer yet inactive cluster comprised mostly women and had the highest prevalence of multimorbidity, complex multimorbidity, and endocrine disorders. High-risk smokers were most likely to have respiratory disorders. Conclusions: Health-risk behaviours tend to be stable as people age and so ought to be addressed early.
We identified seven clusters of older adults with distinct patterns of behaviour, socio-demographic characteristics and multimorbidity prevalence. Intervention developers could use this information to identify high-risk subpopulations and tailor interventions to their behaviour patterns and socio-demographic profiles.

Project description: Introduction: South Africa has the largest burden of HIV worldwide and a growing burden of non-communicable diseases, a combination that may lead to diseases clustering in ways not seen in other regions. This study sought to identify common disease classes and the sociodemographic and lifestyle factors associated with each disease class. Methods: Data were analyzed from the South African Demographic and Health Survey 2016. A latent class analysis (LCA) was conducted using nine disease conditions. Sociodemographic and behavioral factors associated with each disease cluster were explored. All analyses were conducted in Stata 15, and the LCA Stata plugin was used to conduct the latent class and regression analysis. Results: Multimorbid participants (n = 2 368) were included. Four disease classes were identified: (1) HIV, Hypertension and Anemia (comprising 39.4% of the multimorbid population), (2) Anemia and Hypertension (23.7%), (3) Cardiovascular-related (19.9%) and (4) Diabetes and Hypertension (17.0%). Age, sex, and lifestyle risk factors were associated with class membership. Older adults were less likely to belong to Class 1 (HIV, Hypertension and Anemia). Males were more likely to belong to Class 2 (Anemia and Hypertension) and Class 4 (Diabetes and Hypertension). Those who consumed alcohol were less likely to belong to Class 4 (Diabetes and Hypertension). Current smokers were more likely to belong to Class 3 (Cardiovascular-related). People with a higher body mass index tended to belong to Class 3 (Cardiovascular-related) or Class 4 (Diabetes and Hypertension). Conclusion: This study affirmed that integrated care is urgently needed, as evidenced by the largest disease class being an overlap of chronic infectious diseases and non-communicable diseases. It also highlighted the need to address hypertension.
Tackling the risk factors associated with hypertension could avert an epidemic of multimorbidity.

Project description: Background: Identifying implausible clinical observations (e.g., laboratory test and vital sign values) in Electronic Health Record (EHR) data using rule-based procedures is challenging. Anomaly/outlier detection methods can be applied as an alternative algorithmic approach to flagging such implausible values in EHRs. Methods: The primary objectives of this research were to develop and test an unsupervised clustering-based anomaly/outlier detection approach for detecting implausible observations in EHR data, as an alternative algorithmic solution to existing procedures. Our approach is built upon two underlying hypotheses: (i) when there is a large number of observations, implausible records should be sparse, and therefore (ii) if these data are clustered properly, clusters with sparse populations should represent implausible observations. To test these hypotheses, we applied an unsupervised clustering algorithm to EHR observation data on 50 laboratory tests from Partners HealthCare. We tested different specifications of the clustering approach and computed confusion matrix indices against a set of silver-standard plausibility thresholds. We compared the results from the proposed approach with conventional anomaly detection (CAD) approaches, including standard deviation and Mahalanobis distance. Results: We found that the clustering approach produced results with exceptional specificity and high sensitivity.
Compared with the conventional anomaly detection approaches, our proposed clustering approach produced significantly fewer false positive cases. Conclusion: Our contributions include (i) a clustering approach for identifying implausible EHR observations, (ii) evidence that implausible observations are sparse in EHR laboratory test results, (iii) a parallel implementation of the clustering approach on i2b2 star schema, and (iv) a set of silver-standard plausibility thresholds for 50 laboratory tests that can be used in other studies for validation. The proposed algorithmic solution can augment human decisions to improve data quality. A workflow is therefore needed to complement the algorithm and initiate the actions required to improve data quality.
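The two hypotheses can be illustrated on a single laboratory test. Implausible values are rare, so they should end up in sparsely populated clusters. The abstract does not name the clustering algorithm, so this sketch swaps in one-dimensional k-means (Lloyd's algorithm) as a stand-in. The function names `kmeans_1d` and `flag_sparse_clusters`, the number of clusters `k` and the sparsity cut-off `max_share` are all hypothetical. Nothing here is checked against plausibility thresholds.

```python
from collections import Counter


def kmeans_1d(values, k, n_iter=100):
    """Lloyd's k-means on one-dimensional values, seeded at evenly spaced order statistics."""
    ordered = sorted(values)
    n = len(ordered)
    centres = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assignment step: each value joins its nearest centre.
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        # Update step: move each centre to the mean of its members (keep it if empty).
        new_centres = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centres.append(sum(members) / len(members) if members else centres[c])
        if new_centres == centres:
            break
        centres = new_centres
    return labels, centres


def flag_sparse_clusters(values, k=5, max_share=0.01):
    """Flag values whose cluster holds at most max_share of all observations."""
    labels, _ = kmeans_1d(values, k)
    sizes = Counter(labels)
    sparse = {c for c, size in sizes.items() if size / len(values) <= max_share}
    return [lab in sparse for lab in labels]
```

A sparse cluster can also contain genuine but extreme results. For that reason, flags from an approach like this still need to be evaluated against reference thresholds, as the abstract describes, before being treated as implausible.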

Project description: Background: The initial injury burden from incident TBI is significantly amplified by recurrent TBI (rTBI). Unfortunately, research assessing the accuracy of rTBI surveillance is not available. Accurate surveillance information on recurrent injuries is needed to justify the allocation of resources to rTBI prevention and to conduct high-quality epidemiological research on interventions that mitigate this injury burden. This study evaluates the accuracy of administrative health data (AHD) surveillance case definitions for rTBI and estimates the 1-year rTBI incidence adjusted for measurement error. Methods: A 25% random sample of AHD for Montreal residents from 2000 to 2014 was used in this study. Four widely used TBI surveillance case definitions, based on the International Classification of Diseases and on radiological exams of the head, were applied to ascertain suspected rTBI cases. Bayesian latent class models were used to estimate the accuracy of each case definition and the 1-year measurement-error-adjusted rTBI incidence, without relying on a gold-standard rTBI definition (none exists), across children (<18 years), adults (18-64 years), and elderly (≥65 years). Results: The adjusted 1-year rTBI incidence was 4.48 (95% CrI 3.42, 6.20) per 100 person-years across all age groups, as opposed to a crude estimate of 8.03 (95% CrI 7.86, 8.21) per 100 person-years. Patients with higher severity index TBI had a significantly higher incidence of rTBI compared to patients with lower severity index TBI. The case definition that identified patients undergoing a radiological examination of the head in the context of any traumatic injury was the most sensitive across children [0.46 (95% CrI 0.33, 0.61)], adults [0.79 (95% CrI 0.64, 0.94)], and elderly [0.87 (95% CrI 0.78, 0.95)].
The most specific case definition was the discharge abstract database in children [0.99 (95% CrI 0.99, 1.00)], and emergency room visits claims in adults/elderly [0.99 (95% CrI 0.99, 0.99)]. Median time to rTBI was the shortest in adults (75 days) and the longest in children (120 days). Conclusion: Conducting accurate surveillance and valid epidemiological research for rTBI using AHD is feasible when measurement error is accounted for.

Project description: Objective: In the absence of adequate nationally representative empirical evidence on multimorbidity, the existing healthcare delivery system is not adequately oriented to cater to the growing needs of the older adult population. The present study therefore identifies frequently occurring multimorbidity patterns among older adults in India, and examines the linkages between the identified patterns and socioeconomic, demographic, lifestyle and anthropometric correlates. Design: The present findings rest on a large nationally representative sample from a cross-sectional study. Setting and participants: The study used data on 58 975 older adults (45 years and older) from the Longitudinal Ageing Study in India, 2017-2018. Primary and secondary outcome measures: The study incorporated a list of 16 non-communicable diseases to identify commonly occurring patterns using latent class analysis. The study employed multinomial logistic regression models to assess the association between the identified disease patterns and unit-level socioeconomic, demographic, lifestyle and anthropometric characteristics. Results: The present study demonstrates that older adults in the country can be segmented into six patterns: 'relatively healthy', 'hypertension', 'gastrointestinal disorders-hypertension-musculoskeletal disorders', 'musculoskeletal disorders-hypertension-asthma', 'metabolic disorders' and 'complex cardiometabolic disorders'. Additionally, socioeconomic, demographic, lifestyle and anthropometric factors are significantly associated with one or more identified disease patterns. Conclusions: The identified classes 'hypertension', 'metabolic disorders' and 'complex cardiometabolic disorders' reflect three stages of cardiometabolic morbidity, with 'hypertension' as the first and 'complex cardiometabolic disorders' as the last stage of disease progression. This underscores the need for effective prevention strategies for the high-risk hypertension group.
Also, targeted interventions are essential to reduce the burden on the high-risk population and provide equitable health services at the community level.


OmicsDI is part of the ELIXIR infrastructure
