
Dataset Information


New improved Aggregator: predicting which clinical trial articles derive from the same registered clinical trial.


ABSTRACT:

Objectives: To identify separate publications that report outcomes from the same underlying clinical trial, in order to avoid over-counting them as independent pieces of evidence.

Materials and Methods: We updated our previous model by creating larger, more recent, and more diverse positive and negative training sets consisting of article pairs that were (or were not) linked to the same ClinicalTrials.gov trial registry number. Features were extracted from PubMed metadata; pairwise similarity scores were modeled using logistic regression and used to form clusters of articles that are likely to arise from the same registered clinical trial.

Results: Articles from the same trial were identified with high accuracy (F1 = 0.859), nominally better than the previous model (F1 = 0.843). Predicted clusters showed a low splitting error rate of 8-11% (i.e., cases in which 2 articles belonged to the same trial but were assigned to different clusters). Performance was similar whether only randomized controlled trial articles or a more diverse set of clinical trial articles were processed.

Discussion: Metadata are surprisingly accurate in predicting when 2 articles derive from the same underlying clinical trial.

Conclusion: We have continued confidence in the Aggregator tool, which can be accessed publicly at http://arrowsmith.psych.uic.edu/cgi-bin/arrowsmith_uic/RCT_Tagger.cgi.
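The abstract describes a two-stage pipeline: score each article pair with logistic regression over PubMed-metadata features, then group the articles into clusters, and evaluate with F1 and a splitting error rate. The Python sketch below illustrates that idea under loudly stated assumptions. The four features (`shared_authors`, `title_jaccard`, `same_journal`, `year_gap`), their hand-picked weights, the 0.5 threshold, and the use of connected components (single-linkage via union-find) for clustering are all hypothetical. They are not the published Aggregator's features, coefficients, or clustering procedure. The toy articles and the trial labels `T1`/`T2` are invented stand-ins for records and registry numbers. F1 is computed here over article pairs from cluster co-membership, which may differ from how the paper computed it. The split rate follows the abstract's definition: the share of truly same-trial pairs placed in different clusters.

```python
import math
from itertools import combinations

# HYPOTHETICAL logistic-regression coefficients, chosen by hand for
# illustration; they are NOT the published Aggregator model's weights.
WEIGHTS = {
    "shared_authors": 1.2,   # number of author names in common
    "title_jaccard": 4.0,    # word-overlap (Jaccard) similarity of titles, 0..1
    "same_journal": 0.5,     # 1.0 if both articles appeared in the same journal
    "year_gap": -0.6,        # absolute difference in publication year
}
BIAS = -3.0


def pair_features(a, b):
    """Compute hypothetical metadata features for one article pair."""
    words_a = set(a["title"].lower().split())
    words_b = set(b["title"].lower().split())
    union = words_a | words_b
    return {
        "shared_authors": len(set(a["authors"]) & set(b["authors"])),
        "title_jaccard": len(words_a & words_b) / len(union) if union else 0.0,
        "same_journal": 1.0 if a["journal"] == b["journal"] else 0.0,
        "year_gap": abs(a["year"] - b["year"]),
    }


def pair_probability(a, b):
    """Logistic model: P(same trial) = 1 / (1 + exp(-(bias + w . x)))."""
    feats = pair_features(a, b)
    z = BIAS + sum(WEIGHTS[name] * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


def cluster(articles, threshold=0.5):
    """Link every pair scoring >= threshold, then return connected-component
    labels (single-linkage, via union-find). This clustering rule is an
    assumption, not necessarily the one Aggregator uses."""
    parent = list(range(len(articles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(articles)), 2):
        if pair_probability(articles[i], articles[j]) >= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(articles))]


def evaluate(true_trials, predicted_clusters):
    """Pairwise F1 from cluster co-membership, plus the split rate: the share
    of truly same-trial pairs that were placed in different clusters."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(true_trials)), 2):
        same_true = true_trials[i] == true_trials[j]
        same_pred = predicted_clusters[i] == predicted_clusters[j]
        if same_true and same_pred:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    split_rate = fn / (tp + fn) if tp + fn else 0.0
    return f1, split_rate


# Invented toy records; "T1"/"T2" stand in for trial registry numbers.
ARTICLES = [
    {"authors": ["Smith J", "Lee K", "Park H"], "journal": "J A", "year": 2018,
     "title": "Metformin versus placebo in type 2 diabetes primary outcomes"},
    {"authors": ["Smith J", "Lee K"], "journal": "J B", "year": 2019,
     "title": "Metformin versus placebo in type 2 diabetes secondary outcomes"},
    {"authors": ["Garcia M"], "journal": "J A", "year": 2018,
     "title": "Exercise training for knee osteoarthritis pain"},
    {"authors": ["Garcia M", "Chen L"], "journal": "J C", "year": 2021,
     "title": "Exercise training for knee osteoarthritis long term follow up"},
]
TRUE_TRIALS = ["T1", "T1", "T2", "T2"]

labels = cluster(ARTICLES)
f1, split_rate = evaluate(TRUE_TRIALS, labels)
print(labels, round(f1, 3), split_rate)
```

With these made-up weights, the two diabetes reports score about 0.88 and are linked. The osteoarthritis pair scores only about 0.17 and ends up in separate clusters, so the run prints `[1, 1, 2, 3] 0.667 0.5`: perfect pairwise precision, recall of 0.5, and a split rate of 0.5. Single-linkage is a simple choice because one confident link is enough to merge two clusters. This tends to reduce splitting at the cost of occasional over-merging.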

SUBMITTER: Smalheiser NR 

PROVIDER: S-EPMC7660960 | biostudies-literature | 2020 Oct

REPOSITORIES: biostudies-literature


Publications

New improved Aggregator: predicting which clinical trial articles derive from the same registered clinical trial.

Neil R. Smalheiser, Arthur W. Holt

JAMIA Open, 2020 Oct 28, vol. 3



Similar Datasets

| S-EPMC4339517 | biostudies-literature
| S-EPMC8182934 | biostudies-literature
| PRJEB20349 | ENA
2005-06-01 | GSE1561 | GEO
| S-EPMC8653301 | biostudies-literature
| S-EPMC5462434 | biostudies-literature
| S-EPMC53822 | biostudies-other
2021-09-30 | GSE180295 | GEO
2019-02-01 | GSE125966 | GEO
| S-EPMC3375310 | biostudies-literature