
Dataset Information

Diffusion of lexical change in social media.


ABSTRACT: Computer-mediated communication is driving fundamental changes in the nature of written language. We investigate these changes by statistical analysis of a dataset comprising 107 million Twitter messages (authored by 2.7 million unique user accounts). Using a latent vector autoregressive model to aggregate across thousands of words, we identify high-level patterns in diffusion of linguistic change over the United States. Our model is robust to unpredictable changes in Twitter's sampling rate, and provides a probabilistic characterization of the relationship of macro-scale linguistic influence to a set of demographic and geographic predictors. The results of this analysis offer support for prior arguments that focus on geographical proximity and population size. However, demographic similarity - especially with regard to race - plays an even more central role, as cities with similar racial demographics are far more likely to share linguistic influence. Rather than moving towards a single unified "netspeak" dialect, language evolution in computer-mediated communication reproduces existing fault lines in spoken American English.
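The autoregressive idea in the abstract can be illustrated with a much simpler stand-in. The sketch below is not the authors' latent model: it assumes the per-region word activations are observed directly, ignores Twitter's sampling rate, and fits a first-order vector autoregression x[t+1] = A x[t] + noise by ordinary least squares. The function names (simulate_var1, fit_var1, solve) and the two-region coefficient matrix are hypothetical, chosen only for illustration. An off-diagonal entry A[i][j] is read here as the influence of region j on region i.

```python
import random


def simulate_var1(A, steps, noise_sd=0.1, seed=0):
    """Simulate x[t+1] = A x[t] + Gaussian noise; returns steps + 1 vectors."""
    rng = random.Random(seed)
    n = len(A)
    x = [1.0] * n
    series = [x]
    for _ in range(steps):
        x = [sum(A[i][j] * x[j] for j in range(n)) + rng.gauss(0.0, noise_sd)
             for i in range(n)]
        series.append(x)
    return series


def solve(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def fit_var1(series):
    """Least-squares estimate of A from consecutive pairs (x[t], x[t+1])."""
    n = len(series[0])
    sxx = [[0.0] * n for _ in range(n)]  # sum over t of x[t] x[t]^T
    syx = [[0.0] * n for _ in range(n)]  # sum over t of x[t+1] x[t]^T
    for x, y in zip(series[:-1], series[1:]):
        for i in range(n):
            for j in range(n):
                sxx[i][j] += x[i] * x[j]
                syx[i][j] += y[i] * x[j]
    # Normal equations: A sxx = syx. Since sxx is symmetric, each row a_i
    # of A solves sxx a_i = (row i of syx).
    return [solve(sxx, row) for row in syx]


if __name__ == "__main__":
    # Hypothetical two-region system: region 0 influences region 1.
    A_true = [[0.6, 0.0], [0.3, 0.5]]
    A_hat = fit_var1(simulate_var1(A_true, steps=5000, seed=1))
    for row in A_hat:
        print([round(v, 3) for v in row])
```

With enough time steps the estimated matrix should land close to the one used for simulation. The model described in the abstract differs in that the activations are latent rather than observed and in that it aggregates evidence across thousands of words.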

SUBMITTER: Eisenstein J 

PROVIDER: S-EPMC4237389 | biostudies-literature | 2014

REPOSITORIES: biostudies-literature

Publications

Diffusion of lexical change in social media.

Jacob Eisenstein, Brendan O'Connor, Noah A. Smith, Eric P. Xing

PLoS ONE, 2014-11-19, issue 11



Similar Datasets

| S-EPMC5843796 | biostudies-literature
| 2332186 | ecrin-mdr-crc
| S-EPMC8608927 | biostudies-literature
| S-EPMC9024235 | biostudies-literature
| S-EPMC6454597 | biostudies-literature
| S-EPMC8595103 | biostudies-literature
| S-EPMC5945602 | biostudies-literature
| S-EPMC8848992 | biostudies-literature
| 2283132 | ecrin-mdr-crc
| S-EPMC5383801 | biostudies-literature