
Dataset Information


CMU-MOSEAS: A Multimodal Language Dataset for Spanish, Portuguese, German and French.


ABSTRACT: Modeling multimodal language is a core research area in natural language processing. While languages such as English have relatively large multimodal language resources, other widely spoken languages across the globe have few or no large-scale datasets in this area. This disproportionately affects native speakers of languages other than English. As a step towards building more equitable and inclusive multimodal systems, we introduce the first large-scale multimodal language dataset for Spanish, Portuguese, German and French. The proposed dataset, called CMU-MOSEAS (CMU Multimodal Opinion Sentiment, Emotions and Attributes), is the largest of its kind with 40,000 total labelled sentences. It covers a diverse set of topics and speakers, and carries supervision of 20 labels including sentiment (and subjectivity), emotions, and attributes. Our evaluations on a state-of-the-art multimodal model demonstrate that CMU-MOSEAS enables further research for multilingual studies in multimodal language.

SUBMITTER: Zadeh A 

PROVIDER: S-EPMC8106386 | biostudies-literature | 2020 Nov

REPOSITORIES: biostudies-literature


Publications

CMU-MOSEAS: A Multimodal Language Dataset for Spanish, Portuguese, German and French.

Amir Zadeh, Yan Sheng Cao, Simon Hessner, Paul Pu Liang, Soujanya Poria, Louis-Philippe Morency

Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020-11-01



Similar Datasets

| S-EPMC2757423 | biostudies-other
| S-EPMC9525723 | biostudies-literature
| S-EPMC6472396 | biostudies-literature
| S-EPMC10501622 | biostudies-literature
| S-EPMC8486854 | biostudies-literature
| S-EPMC4779959 | biostudies-literature
| S-EPMC10796967 | biostudies-literature
| S-EPMC9536312 | biostudies-literature
| S-EPMC7195022 | biostudies-literature
| S-EPMC10575554 | biostudies-literature