
Dataset Information


Chia, a large annotated corpus of clinical trial eligibility criteria.


ABSTRACT: We present Chia, a novel, large annotated corpus of patient eligibility criteria extracted from 1,000 interventional, Phase IV clinical trials registered in ClinicalTrials.gov. This dataset includes 12,409 annotated eligibility criteria, represented by 41,487 distinctive entities of 15 entity types and 25,017 relationships of 12 relationship types. Each criterion is represented as a directed acyclic graph, which can be easily transformed into Boolean logic to form a database query. Chia can serve as a shared benchmark to develop and test future machine learning, rule-based, or hybrid methods for information extraction from free-text clinical trial eligibility criteria.
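The abstract states that each criterion is a directed acyclic graph that can be turned into Boolean logic. The sketch below illustrates that idea under loudly stated assumptions: the `Entity` and `BoolOp` classes, the explicit AND/OR/NOT nodes, and the `to_boolean` helper are hypothetical and simplified for illustration. They are not Chia's actual annotation format or the authors' conversion method. Because a DAG, unlike a tree, may share sub-nodes, the renderer memoizes each node so a shared sub-expression is computed once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    # Hypothetical simplified entity node: an entity type and its text span.
    etype: str
    text: str


@dataclass(frozen=True)
class BoolOp:
    # Hypothetical logical node: "AND", "OR", or "NOT" over child node ids.
    op: str
    children: tuple


def to_boolean(graph, node_id, memo=None):
    """Render the criterion graph rooted at node_id as a Boolean expression.

    Assumes the graph is acyclic (no cycle check is performed). The memo dict
    caches each node's rendering so shared sub-nodes are rendered only once.
    """
    if memo is None:
        memo = {}
    if node_id in memo:
        return memo[node_id]
    node = graph[node_id]
    if isinstance(node, Entity):
        expr = f'{node.etype}("{node.text}")'
    elif node.op == "NOT":
        (child,) = node.children  # NOT takes exactly one child
        expr = f"NOT {to_boolean(graph, child, memo)}"
    else:
        parts = [to_boolean(graph, c, memo) for c in node.children]
        expr = "(" + f" {node.op} ".join(parts) + ")"
    memo[node_id] = expr
    return expr


# Hypothetical criterion: "Type 2 diabetes or prediabetes, and no insulin use".
graph = {
    "root": BoolOp("AND", ("dx", "no_insulin")),
    "dx": BoolOp("OR", ("t2d", "pre")),
    "t2d": Entity("Condition", "type 2 diabetes"),
    "pre": Entity("Condition", "prediabetes"),
    "no_insulin": BoolOp("NOT", ("insulin",)),
    "insulin": Entity("Drug", "insulin"),
}

print(to_boolean(graph, "root"))
# ((Condition("type 2 diabetes") OR Condition("prediabetes")) AND NOT Drug("insulin"))
```

A query generator could then map each `Entity` leaf to a concept lookup in a patient database and each AND/OR/NOT to the corresponding query operator; that mapping is likewise an assumption, not something this record specifies.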

SUBMITTER: Kury F 

PROVIDER: S-EPMC7452886 | biostudies-literature | 2020 Aug

REPOSITORIES: biostudies-literature


Publications

Chia, a large annotated corpus of clinical trial eligibility criteria.

Fabrício Kury, Alex Butler, Chi Yuan, Li-Heng Fu, Yingcheng Sun, Hao Liu, Ida Sim, Simona Carini, Chunhua Weng

Scientific Data, 27 August 2020, issue 1



Similar Datasets

| S-EPMC5001741 | biostudies-literature
| S-EPMC8373041 | biostudies-literature
| S-EPMC6259668 | biostudies-literature
| S-EPMC3852288 | biostudies-literature
| S-EPMC8054032 | biostudies-literature
| S-EPMC4119097 | biostudies-literature
| S-EPMC8761732 | biostudies-literature
| S-EPMC6716811 | biostudies-literature