Dataset Information

Why are some languages confused for others? Investigating data from the Great Language Game.

ABSTRACT: In this paper we explore the results of a large-scale online game called 'the Great Language Game', in which people listen to an audio speech sample and make a forced-choice guess about the identity of the language from 2 or more alternatives. The data include 15 million guesses from 400 audio recordings of 78 languages. We investigate which languages are confused for which in the game, and if this correlates with the similarities that linguists identify between languages. This includes shared lexical items, similar sound inventories and established historical relationships. Our findings are, as expected, that players are more likely to confuse two languages that are objectively more similar. We also investigate factors that may affect players' ability to accurately select the target language, such as how many people speak the language, how often the language is mentioned in written materials and the economic power of the target language community. We see that non-linguistic factors affect players' ability to accurately identify the target. For example, languages with wider 'global reach' are more often identified correctly. This suggests that both linguistic and cultural knowledge influence the perception and recognition of languages and their similarity.
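As an illustration of the confusion analysis the abstract describes, the sketch below tallies how often each target language is guessed as each alternative. It is a minimal sketch under stated assumptions, not the authors' method: the function name `confusion_rates`, the `(target, guessed)` record layout, and the toy records are all hypothetical and are not taken from the dataset.

```python
from collections import Counter, defaultdict


def confusion_rates(guesses):
    """Map each target language to the share of guesses per chosen language.

    `guesses` is an iterable of (target, guessed) pairs. This layout is an
    assumption made for illustration, not the dataset's actual schema.
    """
    counts = defaultdict(Counter)
    for target, guessed in guesses:
        counts[target][guessed] += 1

    rates = {}
    for target, row in counts.items():
        total = sum(row.values())
        rates[target] = {lang: n / total for lang, n in row.items()}
    return rates


# Invented toy records (not real game data).
sample = [
    ("Swedish", "Swedish"),
    ("Swedish", "Norwegian"),
    ("Swedish", "Norwegian"),
    ("Swedish", "Swedish"),
    ("Norwegian", "Norwegian"),
    ("Norwegian", "Norwegian"),
    ("Norwegian", "Norwegian"),
    ("Norwegian", "Swedish"),
]

rates = confusion_rates(sample)
for target, row in sorted(rates.items()):
    print(target, row)
```

Each row is normalised by the number of guesses for that target, so its diagonal entry is the target's identification accuracy and its off-diagonal entries are the confusion rates that could then be compared against a measure of linguistic similarity.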

SUBMITTER: Skirgard H

PROVIDER: S-EPMC5381764 | biostudies-literature | 2017

REPOSITORIES: biostudies-literature


Publications

Why are some languages confused for others? Investigating data from the Great Language Game.

Skirgård H, Roberts SG, Yencken L

PLoS One, 2017-04-05, issue 4


PMID: 28379970


Similar Datasets


Enhancing African low-resource languages: Swahili data for language modelling.
| S-EPMC7339006 | biostudies-literature

Do some languages sound more beautiful than others?
| S-EPMC10151606 | biostudies-literature

The emergence of simple languages in an experimental coordination game.
| S-EPMC1863456 | biostudies-literature

Geography and language divergence: The case of Andic languages.
| S-EPMC9135239 | biostudies-literature

An investigation across 45 languages and 12 language families reveals a universal language network.
| S-EPMC10414179 | biostudies-literature

Language distance in orthographic transparency affects cross-language pattern similarity between native and non-native languages.
| S-EPMC7856648 | biostudies-literature

Mixing Languages during Learning? Testing the One Subject-One Language Rule.
| S-EPMC4479465 | biostudies-literature

Relating Natural Language Aptitude to Individual Differences in Learning Programming Languages.
| S-EPMC7051953 | biostudies-literature

Cross-language validation of COVID-19 Compliance Scale in 28 languages.
| S-EPMC10468815 | biostudies-literature

Learning Words and Definitions in Two Languages: What Promotes Cross-Language Transfer?
| S-EPMC6178972 | biostudies-literature