Dataset Information

A cross-lingual similarity measure for detecting biomedical term translations.


ABSTRACT: Bilingual dictionaries for technical terms such as biomedical terms are an important resource for machine translation systems as well as for humans who would like to understand a concept described in a foreign language. Often a biomedical term is first proposed in English and later manually translated into other languages. Although large monolingual lexicons of biomedical terms exist, only a fraction of those lexicons has been translated into other languages. Manually compiling large-scale bilingual dictionaries for technical domains is a challenging task because it is difficult to find a sufficiently large number of bilingual experts. We propose a cross-lingual similarity measure for detecting the most similar translation candidates in one language (target) for a biomedical term specified in another language (source). Specifically, a biomedical term in a language is represented using two types of features: (a) intrinsic features that consist of character n-grams extracted from the term under consideration, and (b) extrinsic features that consist of unigrams and bigrams extracted from the contextual windows surrounding the term under consideration. We propose a cross-lingual similarity measure using each of those feature types. First, to reduce the dimensionality of the feature space in each language, we propose prototype vector projection (PVP), a non-negative lower-dimensional vector projection method. Second, we propose a method to learn a mapping between the feature spaces in the source and target languages using partial least squares regression (PLSR). The proposed method requires only a small number of training instances to learn a cross-lingual similarity measure. The proposed PVP method outperforms popular dimensionality reduction methods such as singular value decomposition (SVD) and non-negative matrix factorization (NMF) in a nearest neighbor prediction task. Moreover, our experimental results covering several language pairs such as English-French, English-Spanish, English-Greek, and English-Japanese show that the proposed method outperforms several other feature projection methods in biomedical term translation prediction tasks.
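
The two feature types named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the n-gram range (2 to 3), the `^`/`$` boundary markers, the window size of 2, the restriction to single-token terms, and the function names `char_ngrams`, `context_features`, and `cosine` are all assumptions made for illustration. The sketch covers only raw feature extraction and a plain cosine comparison; the PVP projection and the PLSR mapping between languages are not shown, because the abstract does not give enough detail to reproduce them.

```python
from collections import Counter
import math


def char_ngrams(term, n_min=2, n_max=3):
    """Intrinsic features: counts of character n-grams taken from the term itself."""
    padded = f"^{term.lower()}$"  # boundary markers are an assumption of this sketch
    feats = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            feats[padded[i:i + n]] += 1
    return feats


def context_features(tokens, term, window=2):
    """Extrinsic features: unigrams and bigrams from windows around a single-token term."""
    feats = Counter()
    for i, tok in enumerate(tokens):
        if tok != term:
            continue
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        for side in (left, right):
            feats.update(side)  # unigrams in the window
            feats.update(" ".join(pair) for pair in zip(side, side[1:]))  # bigrams
    return feats


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    # Surface overlap between an English term and two French terms.
    en = char_ngrams("hepatitis")
    print(cosine(en, char_ngrams("hépatite")))
    print(cosine(en, char_ngrams("cardiologie")))

    sentence = "the patient developed acute hepatitis after the transfusion".split()
    print(context_features(sentence, "hepatitis"))
```

Comparing character n-grams directly only works when the two languages share a script and the terms are cognates. For pairs such as English-Japanese the raw feature spaces do not overlap, which is the situation a learned mapping between source and target feature spaces is meant to address.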

SUBMITTER: Bollegala D 

PROVIDER: S-EPMC4452086 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

S-EPMC4595445 | biostudies-literature
S-EPMC4626085 | biostudies-literature
S-EPMC2518162 | biostudies-literature
S-EPMC4983430 | biostudies-literature
PXD027791 | Pride | 2023-01-05
S-EPMC4450834 | biostudies-other
S-EPMC8056768 | biostudies-literature
S-EPMC3461574 | biostudies-literature
S-EPMC9913042 | biostudies-literature
S-EPMC4907436 | biostudies-literature