Dataset Information

Developing an automated mechanism to identify medical articles from Wikipedia for knowledge extraction.


ABSTRACT: Wikipedia contains rich biomedical information that can support medical informatics studies and applications. Identifying the subset of medical articles in Wikipedia has many benefits, such as facilitating medical knowledge extraction, serving as a corpus for language modeling, or simply reducing the data to a manageable size. However, because medical articles make up an extremely small fraction of Wikipedia, the set of articles identified by generic text classifiers would be bloated with irrelevant pages. To control the false discovery rate while maintaining high recall, we developed a mechanism that leverages the rich page elements and the connected nature of Wikipedia and uses a crawling classification strategy to achieve accurate classification. Structured assertional knowledge in the Infoboxes and Wikidata items associated with the identified medical articles was also extracted. This automatic mechanism is intended to run periodically so that the results can be updated and shared with the informatics community.
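
The abstract does not spell out how the crawling classification strategy works. One plausible reading, offered here only as a minimal sketch and not as the authors' method, is a breadth-first crawl that starts from known medical pages and follows links only out of pages that a classifier accepts, so that irrelevant regions of Wikipedia are never visited. The function `crawl_classify`, the callbacks `get_links` and `is_medical`, and the toy page titles below are all hypothetical names introduced for illustration.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Set


def crawl_classify(
    seeds: Iterable[str],
    get_links: Callable[[str], Iterable[str]],
    is_medical: Callable[[str], bool],
) -> Set[str]:
    """Breadth-first crawl that only expands pages accepted by the classifier."""
    accepted: Set[str] = set()
    visited: Set[str] = set()
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        if page in visited:
            continue
        visited.add(page)
        if not is_medical(page):
            continue  # rejected pages are not expanded, which limits false discoveries
        accepted.add(page)
        for linked in get_links(page):
            if linked not in visited:
                queue.append(linked)
    return accepted


# Toy link graph with hypothetical page titles (an assumption, not real data).
TOY_LINKS: Dict[str, List[str]] = {
    "Diabetes": ["Insulin", "Sugar"],
    "Insulin": ["Pancreas", "Diabetes"],
    "Sugar": ["Candy"],
    "Pancreas": [],
    "Candy": ["Chocolate"],
    "Chocolate": [],
}
# Stand-in for a trained classifier: a fixed set of "medical" titles.
TOY_MEDICAL = {"Diabetes", "Insulin", "Pancreas"}

result = crawl_classify(
    seeds=["Diabetes"],
    get_links=lambda p: TOY_LINKS.get(p, []),
    is_medical=lambda p: p in TOY_MEDICAL,
)
print(sorted(result))
```

In this toy graph the crawl never reaches "Candy" or "Chocolate", because "Sugar" is rejected and therefore not expanded. In a real setting `is_medical` would presumably be a classifier over page elements and `get_links` would query Wikipedia's link structure; both are left abstract here.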

SUBMITTER: Yu L 

PROVIDER: S-EPMC7357526 | biostudies-literature | 2020 Sep

REPOSITORIES: biostudies-literature

Publications

Developing an automated mechanism to identify medical articles from Wikipedia for knowledge extraction.

Lishan Yu, Sheng Yu

International Journal of Medical Informatics, 2020-07-13


Similar Datasets

| S-EPMC5749832 | biostudies-literature
| S-EPMC5468769 | biostudies-literature
| S-EPMC3789750 | biostudies-literature
| S-EPMC5968213 | biostudies-literature
| S-EPMC8921307 | biostudies-literature
| S-EPMC7797509 | biostudies-literature
| S-EPMC7089765 | biostudies-literature
| S-EPMC6245851 | biostudies-other
| S-EPMC7334757 | biostudies-literature
| S-EPMC3861927 | biostudies-literature