
Dataset Information


Replication of Real-World Evidence in Oncology Using Electronic Health Record Data Extracted by Machine Learning.


ABSTRACT: Meaningful real-world evidence (RWE) generation requires unstructured data from electronic health records (EHRs) that are often missing from administrative claims; however, obtaining relevant data from unstructured EHR sources is resource-intensive. In response, researchers are using natural language processing (NLP) with machine learning (ML) techniques (i.e., ML extraction) to extract real-world data (RWD) at scale. This study assessed the quality and fitness-for-use of EHR-derived oncology data curated using NLP with ML, compared with the reference standard of expert abstraction. Using a sample of 186,313 patients with lung cancer from a nationwide EHR-derived de-identified database, we performed a series of replication analyses demonstrating common analyses conducted in retrospective observational research with complex EHR-derived data to generate evidence. Eligible patients were selected into biomarker- and treatment-defined cohorts, first with expert-abstracted data and then with ML-extracted data. We used the biomarker- and treatment-defined cohorts to perform analyses of biomarker-associated survival and treatment comparative effectiveness, respectively. Across all analyses, the results differed by less than 8% between the data curation methods, and similar conclusions were reached. These results highlight that high-performance ML-extracted variables trained on expert-abstracted data can achieve results similar to those obtained with abstracted data, unlocking the ability to perform oncology research at scale.
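The abstract reports that results differed by less than 8% between data curation methods but does not define the comparison metric on this page. The sketch below assumes it is the relative difference of an estimate computed on ML-extracted data against the same estimate computed on expert-abstracted data (the reference standard). The function name and the example hazard ratios are hypothetical and are not taken from the study.

```python
def relative_difference(reference: float, comparison: float) -> float:
    """Relative difference of `comparison` vs `reference`, as a fraction of |reference|.

    Assumption: `reference` is the estimate from expert-abstracted data and
    `comparison` is the same estimate re-computed on ML-extracted data.
    """
    if reference == 0:
        raise ValueError("reference estimate must be non-zero")
    return abs(comparison - reference) / abs(reference)


# Hypothetical illustrative values, NOT results from the study:
# a hazard ratio from abstracted data and the same analysis on ML-extracted data.
abstracted_hr = 0.80
ml_hr = 0.84

diff = relative_difference(abstracted_hr, ml_hr)
below_threshold = diff < 0.08  # the 8% figure quoted in the abstract

print(f"relative difference: {diff:.1%}, below 8%: {below_threshold}")
```

Using the abstracted estimate as the denominator treats expert abstraction as the reference standard, matching how the abstract frames the comparison; other choices (e.g., a symmetric difference) would give slightly different percentages.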

SUBMITTER: Benedum CM 

PROVIDER: S-EPMC10046618 | biostudies-literature | 2023 Mar

REPOSITORIES: biostudies-literature


Publications

Replication of Real-World Evidence in Oncology Using Electronic Health Record Data Extracted by Machine Learning.

Benedum CM, Sondhi A, Fidyk E, Cohen AB, Nemeth S, Adamson B, Estévez M, Bozkurt S

Cancers, 2023 Mar 20 (issue 6)



Similar Datasets

S-EPMC9298266 | biostudies-literature
S-EPMC9032917 | biostudies-literature
S-EPMC10979204 | biostudies-literature
S-EPMC4261015 | biostudies-literature
S-EPMC11835724 | biostudies-literature
S-EPMC10514685 | biostudies-literature
S-EPMC10436147 | biostudies-literature
S-EPMC10582952 | biostudies-literature
S-EPMC10524844 | biostudies-literature
S-EPMC7338229 | biostudies-literature