Dataset Information

A machine-compiled database of genome-wide association studies.


ABSTRACT: Tens of thousands of genotype-phenotype associations have been discovered to date, yet not all of them are easily accessible to scientists. Here, we describe GWASkb, a machine-compiled knowledge base of genetic associations collected from the scientific literature using automated information extraction algorithms. Our information extraction system helps curators by automatically collecting over 6,000 associations from open-access publications with an estimated recall of 60-80% and with an estimated precision of 78-94% (measured relative to existing manually curated knowledge bases). This system represents a fully automated GWAS curation effort and is made possible by a paradigm for constructing machine learning systems called data programming. Our work represents a step towards making the curation of scientific literature more efficient using automated systems.
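As a rough illustration of what "measured relative to existing manually curated knowledge bases" can mean, the sketch below computes precision and recall of a set of extracted associations against a curated reference set. This is a simplified assumption, not the authors' evaluation procedure: the `Association` tuple layout, the `precision_recall` helper, and the toy data are all hypothetical, and treating the curated set as ground truth would count any correct association missing from the curated set as an error.

```python
from typing import Set, Tuple

# Hypothetical representation: (variant ID, phenotype) pairs.
Association = Tuple[str, str]


def precision_recall(
    extracted: Set[Association], curated: Set[Association]
) -> Tuple[float, float]:
    """Precision and recall of extracted pairs relative to a curated reference set."""
    true_positives = len(extracted & curated)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(curated) if curated else 0.0
    return precision, recall


# Toy data invented purely for illustration (not taken from GWASkb).
extracted = {("rs1", "height"), ("rs2", "bmi"), ("rs3", "asthma"), ("rs4", "bmi")}
curated = {
    ("rs1", "height"),
    ("rs2", "bmi"),
    ("rs3", "asthma"),
    ("rs5", "height"),
    ("rs6", "bmi"),
}

precision, recall = precision_recall(extracted, curated)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

With this toy data, three of the four extracted pairs appear in the curated set, and three of the five curated pairs are recovered.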

SUBMITTER: Kuleshov V 

PROVIDER: S-EPMC6659642 | biostudies-literature | 2019 Jul

REPOSITORIES: biostudies-literature

Publications

A machine-compiled database of genome-wide association studies.

Volodymyr Kuleshov, Jialin Ding, Christopher Vo, Braden Hancock, Alexander Ratner, Yang Li, Christopher Ré, Serafim Batzoglou, Michael Snyder

Nature Communications, 26 July 2019, issue 1


Similar Datasets

S-EPMC3198468 | biostudies-literature
S-EPMC3245026 | biostudies-literature
S-EPMC5007749 | biostudies-other
S-EPMC3373190 | biostudies-literature
S-EPMC3172934 | biostudies-literature
S-EPMC3794570 | biostudies-literature
S-EPMC7900884 | biostudies-literature
S-EPMC3579732 | biostudies-literature
S-EPMC3625632 | biostudies-literature
S-EPMC3446262 | biostudies-literature