
Dataset Information


The Potential of Automatic Word Comparison for Historical Linguistics.


ABSTRACT: The amount of data from languages spoken all over the world is rapidly increasing. Traditional manual methods in historical linguistics need to face the challenges brought by this influx of data. Automatic approaches to word comparison could provide invaluable help by pre-analyzing data that experts can later refine. In this way, computational approaches can take care of the repetitive and schematic tasks, leaving experts to concentrate on answering interesting questions. Here we test the potential of automatic methods to detect etymologically related words (cognates) in cross-linguistic data. Using a newly compiled database of expert cognate judgments across five different language families, we compare how well different automatic approaches distinguish related from unrelated words. Our results show that automatic methods can identify cognates with a very high degree of accuracy, reaching 89% for the best-performing method, Infomap. We identify the specific strengths and weaknesses of these different methods and point to major challenges for future approaches. Current automatic approaches for cognate detection, although not perfect, could become an important component of future research in historical linguistics.
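
As a rough illustration of the task the abstract describes, the sketch below groups words for one concept into candidate cognate sets and scores the result against expert judgments. It is not one of the methods tested in the study (Infomap is not implemented here): it swaps in a plain normalized edit distance with single-linkage grouping, and it scores with pairwise precision and recall, which may differ from the evaluation measure the authors used. The function names, the 0.4 threshold, and the six-word toy list for "hand" are assumptions made for this example, and real approaches typically work on phonetic transcriptions rather than spellings.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance scaled to [0, 1] by the longer word."""
    return edit_distance(a, b) / max(len(a), len(b))


def cluster(words, threshold):
    """Single-linkage grouping: link any two words at or below the threshold."""
    labels = {}
    next_label = 0
    for lang in words:
        if lang in labels:
            continue
        labels[lang] = next_label
        stack = [lang]
        while stack:
            cur = stack.pop()
            for other in words:
                if other in labels:
                    continue
                if normalized_distance(words[cur], words[other]) <= threshold:
                    labels[other] = next_label
                    stack.append(other)
        next_label += 1
    return labels


def pairwise_scores(predicted, gold):
    """Precision, recall and F-score over all word pairs."""
    tp = fp = fn = 0
    for a, b in combinations(sorted(gold), 2):
        same_pred = predicted[a] == predicted[b]
        same_gold = gold[a] == gold[b]
        if same_pred and same_gold:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_gold:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Toy data (assumption): orthographic words for "hand" in six languages.
words = {"German": "hand", "English": "hand", "Dutch": "hand",
         "French": "main", "Spanish": "mano", "Italian": "mano"}
# Toy expert judgments (assumption): 1 = Germanic set, 2 = Romance set.
gold = {"German": 1, "English": 1, "Dutch": 1,
        "French": 2, "Spanish": 2, "Italian": 2}

predicted = cluster(words, threshold=0.4)  # threshold chosen arbitrarily
precision, recall, f_score = pairwise_scores(predicted, gold)
print(predicted)
print(precision, recall, f_score)
```

On this toy list the sketch groups the three Germanic forms and the Spanish and Italian forms correctly, but leaves French "main" on its own, so precision is perfect while recall is not. That is the kind of weakness a purely surface-based comparison can show when related words have diverged in form.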

SUBMITTER: List JM 

PROVIDER: S-EPMC5271327 | biostudies-literature | 2017

REPOSITORIES: biostudies-literature


Publications

The Potential of Automatic Word Comparison for Historical Linguistics.

Johann-Mattis List, Simon J. Greenhill, Russell D. Gray

PLoS One, 2017-01-27, issue 1



Similar Datasets

| PRJEB52191 | ENA
| S-EPMC5877971 | biostudies-literature
| S-EPMC10868791 | biostudies-literature
| S-EPMC5225401 | biostudies-literature
| S-EPMC10091714 | biostudies-literature
| S-EPMC9282454 | biostudies-literature
| S-EPMC8048438 | biostudies-literature
| S-EPMC4080745 | biostudies-literature
2021-05-05 | GSE160737 | GEO
| S-EPMC7312150 | biostudies-literature