Dataset Information

Estimating the deep replicability of scientific findings using human and artificial intelligence.

ABSTRACT: Replicability tests of scientific papers show that the majority of papers fail replication. Moreover, failed papers circulate through the literature as quickly as replicating papers. This dynamic weakens the literature, raises research costs, and demonstrates the need for new approaches for estimating a study's replicability. Here, we trained an artificial intelligence model to estimate a paper's replicability using ground truth data on studies that had passed or failed manual replication tests, and then tested the model's generalizability on an extensive set of out-of-sample studies. The model predicts replicability better than the base rate of reviewers and comparably to prediction markets, the best present-day method for predicting replicability. In out-of-sample tests on manually replicated papers from diverse disciplines and methods, the model had strong accuracy levels of 0.65 to 0.78. Exploring the reasons behind the model's predictions, we found no evidence of bias based on topics, journals, disciplines, base rates of failure, persuasion words, or novelty words like "remarkable" or "unexpected." We did find that the model's accuracy is higher when trained on a paper's text rather than its reported statistics, and that n-grams (higher-order word combinations that humans have difficulty processing) correlate with replication. We discuss how combining human and machine intelligence can raise confidence in research, provide research self-assessment techniques, and create methods that are scalable and efficient enough to review the ever-growing number of publications, a task that would require extensive human resources to accomplish with prediction markets and manual replication alone.

SUBMITTER: Yang Y

PROVIDER: S-EPMC7245108 | biostudies-literature | 2020 May

REPOSITORIES: biostudies-literature

Publications

Estimating the deep replicability of scientific findings using human and artificial intelligence.

Yang Yang, Youyou Wu, Brian Uzzi

Proceedings of the National Academy of Sciences of the United States of America, 2020 May 4, issue 20

PMID: 32366645

Similar Datasets

Project description:
Objectives: Map the current landscape of commercially available artificial intelligence (AI) software for radiology and review the availability of their scientific evidence.
Methods: We created an online overview of CE-marked AI software products for clinical radiology based on vendor-supplied product specifications (www.aiforradiology.com). Characteristics such as modality, subspeciality, main task, regulatory information, deployment, and pricing model were retrieved. We conducted an extensive literature search on the available scientific evidence for these products. Articles were classified according to a hierarchical model of efficacy.
Results: The overview included 100 CE-marked AI products from 54 different vendors. For 64/100 products, there was no peer-reviewed evidence of their efficacy. We observed a large heterogeneity in deployment methods, pricing models, and regulatory classes. The evidence for the remaining 36/100 products comprised 237 papers that predominantly (65%) focused on diagnostic accuracy (efficacy level 2). Of the 100 products, 18 had evidence at level 3 or higher, validating the (potential) impact on diagnostic thinking, patient outcome, or costs. Half of the available evidence (116/237) was independent, that is, not (co-)funded or (co-)authored by the vendor.
Conclusions: Even though the commercial supply of AI software in radiology already holds 100 CE-marked products, we conclude that the sector is still in its infancy. For 64/100 products, peer-reviewed evidence of their efficacy is lacking. Only 18/100 AI products have demonstrated (potential) clinical impact.
Key points:
• Artificial intelligence in radiology is still in its infancy, even though 100 CE-marked AI products are already commercially available.
• Only 36 out of 100 products have peer-reviewed evidence, and most of those studies demonstrate lower levels of efficacy.
• There is a wide variety in deployment strategies, pricing models, and CE marking class of AI products for radiology.

OmicsDI is part of the ELIXIR infrastructure