Dataset Information

Pathway information extracted from 25 years of pathway figures.


ABSTRACT: Thousands of pathway diagrams are published each year as static figures inaccessible to computational queries and analyses. Using a combination of machine learning, optical character recognition, and manual curation, we identified 64,643 pathway figures published between 1995 and 2019 and extracted 1,112,551 instances of human genes, comprising 13,464 unique NCBI genes, participating in a wide variety of biological processes. This collection represents an order of magnitude more genes than found in the text of the same papers, and thousands of genes missing from other pathway databases, thus presenting new opportunities for discovery and research.

SUBMITTER: Hanspers K 

PROVIDER: S-EPMC7649569 | biostudies-literature | 2020 Nov

REPOSITORIES: biostudies-literature

Publications

Pathway information extracted from 25 years of pathway figures.

Kristina Hanspers, Anders Riutta, Martina Summer-Kutmon, Alexander R. Pico

Genome Biology, 2020-11-09, issue 1



Similar Datasets

| S-EPMC4383898 | biostudies-literature
| S-EPMC6672050 | biostudies-literature
| S-EPMC10676589 | biostudies-literature
| S-EPMC9691658 | biostudies-literature
| S-EPMC5381451 | biostudies-literature
2018-08-31 | GSE114006 | GEO
2002-06-27 | GSE50 | GEO
2002-06-27 | E-GEOD-50 | biostudies-arrayexpress
| S-EPMC4279172 | biostudies-literature
| S-EPMC3073220 | biostudies-literature