
Dataset Information


Aggregating large-scale databases for PubMed author name disambiguation.


ABSTRACT:

Objective

PubMed has suffered from the author ambiguity problem for many years. Existing studies on author name disambiguation (AND) for PubMed have relied only on internal metadata. However, some of this metadata is incomplete (eg, many names are only abbreviated, with no full name available) or weakly discriminative. To address this, we present a new disambiguation method, AggAND, which aggregates information from external databases.

Materials and methods

We address this issue by using Microsoft Academic Graph, Semantic Scholar, and PubMed Knowledge Graph to enhance the built-in name metadata, and by extending the internal metadata with external, more discriminative metadata.
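As a rough illustration of what enhancing name metadata could look like, the sketch below fills in an abbreviated forename from external records. It is not the AggAND implementation: the `(PMID, author position)` key, the dictionary-per-source layout, and the names `is_abbreviated` and `enhance_name` are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical record key: (PMID, author position in the byline).
AuthorKey = Tuple[str, int]
# Hypothetical external record: (surname, forename) as given by one database.
ExternalRecord = Tuple[str, str]


def is_abbreviated(forename: str) -> bool:
    """Return True when the forename holds only initials (e.g. 'L', 'J.Q.').

    An empty forename also counts as abbreviated, since nothing is known.
    """
    tokens = forename.replace(".", " ").split()
    return all(len(tok) == 1 for tok in tokens)


def enhance_name(
    key: AuthorKey,
    surname: str,
    forename: str,
    external_sources: List[Dict[AuthorKey, ExternalRecord]],
) -> str:
    """Return a fuller forename from the first agreeing external source.

    Sources are tried in list order. A candidate is accepted only if its
    surname matches and its first initial agrees with the known initial.
    """
    forename = forename.strip()
    if not is_abbreviated(forename):
        return forename  # already a full name, nothing to enhance
    for source in external_sources:
        candidate = source.get(key)
        if candidate is None:
            continue
        cand_surname, cand_forename = candidate
        cand_forename = cand_forename.strip()
        if cand_surname.strip().lower() != surname.strip().lower():
            continue  # different surname, likely a misaligned record
        if is_abbreviated(cand_forename):
            continue  # this source has no full name either
        if forename and cand_forename[0].lower() != forename[0].lower():
            continue  # first initial disagrees with the known initial
        return cand_forename
    return forename  # no source could help, keep the original


# Toy data, invented for illustration only.
sources = [
    {("123", 1): ("Zhang", "L")},   # first source only has the initial
    {("123", 1): ("Zhang", "Li")},  # second source has the full forename
]
enhanced = enhance_name(("123", 1), "Zhang", "L", sources)
```

Requiring the surname and first initial to agree is a conservative choice: a name is only replaced when the external record is consistent with what the internal metadata already states.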

Results

Experimental results show that the enhanced name metadata performs comparably to 3 author identifier systems and outperforms the original name metadata. More importantly, our method, AggAND, which incorporates both enhanced name metadata and extended metadata, yields F1 scores of 95.80% and 93.71% on 2 datasets and outperforms the state-of-the-art method by a large margin (3.61% and 6.55%, respectively).
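The abstract does not say which F1 variant is reported. As one common option in disambiguation work, the sketch below computes a pairwise F1 over same-author paper pairs; the function names and the toy labels are assumptions for illustration, not the evaluation code behind the figures above.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def same_author_pairs(labels: Dict[str, str]) -> Set[Tuple[str, str]]:
    """Return every unordered pair of papers assigned to the same author."""
    return {
        (a, b)
        for a, b in combinations(sorted(labels), 2)
        if labels[a] == labels[b]
    }


def pairwise_f1(truth: Dict[str, str], predicted: Dict[str, str]) -> float:
    """Pairwise F1 between a gold and a predicted author assignment.

    Both mappings go from a paper id to an author label and are assumed
    to cover the same set of papers.
    """
    true_pairs = same_author_pairs(truth)
    pred_pairs = same_author_pairs(predicted)
    if not true_pairs or not pred_pairs:
        return 0.0
    hits = len(true_pairs & pred_pairs)
    if hits == 0:
        return 0.0
    precision = hits / len(pred_pairs)
    recall = hits / len(true_pairs)
    return 2 * precision * recall / (precision + recall)


# Toy example: two true authors, one over-merged predicted cluster.
truth = {"p1": "A", "p2": "A", "p3": "B", "p4": "B"}
predicted = {"p1": "x", "p2": "x", "p3": "x", "p4": "y"}
score = pairwise_f1(truth, predicted)  # precision 1/3, recall 1/2
```

In the toy example only one of the three predicted pairs is correct and one of the two true pairs is recovered, so over-merging is penalized through precision and the missed pair through recall.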

Conclusions

The feasibility and good performance of our methods not only help to better understand the importance of external databases for disambiguation, but also point to a promising direction for future AND studies: aggregating information from multiple bibliographic databases can effectively improve disambiguation performance. The methodology shown here can be generalized to broader bibliographic databases beyond PubMed. Our code and data are available online (https://github.com/carmanzhang/PubMed-AND-method).

SUBMITTER: Zhang L 

PROVIDER: S-EPMC8363810 | biostudies-literature | 2021 Aug

REPOSITORIES: biostudies-literature


Publications

Aggregating large-scale databases for PubMed author name disambiguation.

Zhang Li, Huang Yong, Yang Jinqing, Lu Wei

Journal of the American Medical Informatics Association : JAMIA, 2021 Aug 1, issue 9



Similar Datasets

| S-EPMC8359369 | biostudies-literature
| S-EPMC10557506 | biostudies-literature
| S-EPMC4930168 | biostudies-literature
| S-EPMC5438420 | biostudies-literature
| S-EPMC9652778 | biostudies-literature
| S-EPMC3499436 | biostudies-literature
| S-EPMC6110253 | biostudies-literature
| S-EPMC3463205 | biostudies-literature
| S-EPMC10687183 | biostudies-literature
| S-EPMC4227259 | biostudies-literature