
Dataset Information


Multimodal Transformer for Unaligned Multimodal Language Sequences.


ABSTRACT: Human language is often multimodal, comprising a mixture of natural language, facial gestures, and acoustic behaviors. However, two major challenges exist in modeling such multimodal human language time-series data: 1) inherent data non-alignment due to variable sampling rates for the sequences from each modality; and 2) long-range dependencies between elements across modalities. In this paper, we introduce the Multimodal Transformer (MulT) to generically address the above issues in an end-to-end manner without explicitly aligning the data. At the heart of our model is the directional pairwise crossmodal attention, which attends to interactions between multimodal sequences across distinct time steps and latently adapts streams from one modality to another. Comprehensive experiments on both aligned and non-aligned multimodal time-series show that our model outperforms state-of-the-art methods by a large margin. In addition, empirical analysis suggests that the proposed crossmodal attention mechanism in MulT can capture correlated crossmodal signals.
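
The directional crossmodal attention described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single attention head, random projection weights, and no positional embeddings, residual connections, or stacking. All names (`crossmodal_attention`, `w_q`, `w_k`, `w_v`, the toy `text` and `audio` sequences) are hypothetical. Queries come from the target modality, while keys and values come from the source modality, so the two sequences may have different lengths and need no prior alignment.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(s - peak) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def crossmodal_attention(target, source, w_q, w_k, w_v):
    """Adapt the `source` stream onto the time steps of the `target` stream.

    target: T_target rows of d_target features.
    source: T_source rows of d_source features.
    Returns (adapted, weights), where adapted has T_target rows and
    weights has shape T_target x T_source.
    """
    queries = matmul(target, w_q)  # T_target x d_k
    keys = matmul(source, w_k)     # T_source x d_k
    values = matmul(source, w_v)   # T_source x d_v
    scale = math.sqrt(len(keys[0]))
    keys_t = [list(col) for col in zip(*keys)]  # d_k x T_source
    scores = matmul(queries, keys_t)            # T_target x T_source
    # Each target time step attends over every source time step.
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, values), weights


def random_matrix(rows, cols, rng):
    """Toy data and weights for illustration only."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
text = random_matrix(5, 8, rng)    # 5 time steps, 8 features (assumed sizes)
audio = random_matrix(12, 6, rng)  # 12 time steps, 6 features (assumed sizes)
w_q = random_matrix(8, 4, rng)
w_k = random_matrix(6, 4, rng)
w_v = random_matrix(6, 4, rng)

adapted, weights = crossmodal_attention(text, audio, w_q, w_k, w_v)
print(len(adapted), len(adapted[0]))
print(len(weights), len(weights[0]))
```

In this sketch the audio stream is re-expressed on the text stream's time steps; swapping the `target` and `source` arguments gives the opposite direction, which is what makes the attention directional and pairwise.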

SUBMITTER: Tsai YH 

PROVIDER: S-EPMC7195022 | biostudies-literature | 2019 Jul

REPOSITORIES: biostudies-literature


Publications

Multimodal Transformer for Unaligned Multimodal Language Sequences.

Tsai YH, Bai S, Liang PP, Kolter JZ, Morency LP, Salakhutdinov R

Proceedings of the conference. Association for Computational Linguistics. Meeting, 2019-07-01



Similar Datasets

| S-EPMC55461 | biostudies-literature
| S-EPMC8963311 | biostudies-literature
| S-EPMC3516148 | biostudies-literature
| S-EPMC10231473 | biostudies-literature
| S-EPMC8590293 | biostudies-literature
| S-EPMC2638147 | biostudies-literature
| S-EPMC434454 | biostudies-literature
| S-EPMC3167047 | biostudies-literature
| S-EPMC10726525 | biostudies-literature
| S-EPMC8106385 | biostudies-literature