Dataset Information

An integrated pipeline for prediction of Clostridioides difficile infection.


ABSTRACT: With the expansion of electronic health record (EHR)-linked genomic data comes the development of machine learning-enabled models. There is a pressing need to develop robust pipelines to evaluate the performance of integrated models and minimize systemic bias. We developed a prediction model of symptomatic Clostridioides difficile infection (CDI) by integrating common EHR-based and genetic risk factors (rs2227306/IL8). Our pipeline includes (1) leveraging a phenotyping algorithm to minimize temporal bias, (2) performing simulation studies to determine the predictive power in samples without genetic information, (3) propensity score matching to control for confounders, (4) selecting machine learning algorithms to capture complex feature interactions, (5) performing oversampling to address data imbalance, and (6) optimizing models and ensuring a proper bias-variance trade-off. We evaluate the performance of prediction models of CDI when including common clinical risk factors and the benefit of incorporating genetic feature(s) into the models. We emphasize the importance of building a robust integrated pipeline to avoid systemic bias and of thoroughly evaluating genetic features when integrated into prediction models in the general population and subgroups.
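
Steps (3) and (5) of the pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not say which matching or oversampling method was used, so the code assumes greedy 1:1 nearest-neighbour matching on precomputed propensity scores and plain random oversampling of the minority class. The function names `greedy_match` and `random_oversample`, the `caliper` value, and the data layout are all hypothetical.

```python
import random


def greedy_match(treated, control, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    `treated` and `control` map a subject id to a precomputed propensity
    score (assumed to come from some upstream model, not shown here).
    A pair is kept only if the score difference is within `caliper`.
    """
    available = dict(control)  # controls not yet used in a pair
    pairs = []
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # closest remaining control by absolute score difference
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy usage with made-up scores and labels (illustrative values only).
pairs = greedy_match({"t1": 0.30, "t2": 0.80},
                     {"c1": 0.31, "c2": 0.50, "c3": 0.79})
balanced_rows, balanced_labels = random_oversample(
    [[1], [2], [3], [4], [5]], [0, 0, 0, 0, 1])
```

In a sketch like this, oversampling would normally be applied to the training split only, so that duplicated cases do not leak into the data used for evaluation.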

SUBMITTER: Li J 

PROVIDER: S-EPMC10545794 | biostudies-literature | 2023 Oct

REPOSITORIES: biostudies-literature

Publications

An integrated pipeline for prediction of Clostridioides difficile infection.

Li J, Chaudhary D, Sharma V, Sharma V, Avula V, Ssentongo P, Wolk DM, Zand R, Abedi V

Scientific Reports, 2023-10-02, issue 1


Similar Datasets

| S-EPMC7475194 | biostudies-literature
| S-EPMC8447795 | biostudies-literature
| S-EPMC6917260 | biostudies-literature
| S-EPMC6714892 | biostudies-literature
2022-04-09 | GSE200346 | GEO
| S-EPMC9650352 | biostudies-literature
| S-EPMC10155633 | biostudies-literature
| S-EPMC8555850 | biostudies-literature
| S-EPMC7551610 | biostudies-literature
| S-EPMC7877971 | biostudies-literature