Dataset Information

Multimodal Transformer for Unaligned Multimodal Language Sequences.

ABSTRACT: Human language is often multimodal, comprising a mixture of natural language, facial gestures, and acoustic behaviors. However, two major challenges exist in modeling such multimodal human language time-series data: 1) inherent data non-alignment due to variable sampling rates for the sequences from each modality; and 2) long-range dependencies between elements across modalities. In this paper, we introduce the Multimodal Transformer (MulT) to generically address the above issues in an end-to-end manner without explicitly aligning the data. At the heart of our model is the directional pairwise crossmodal attention, which attends to interactions between multimodal sequences across distinct time steps and latently adapts streams from one modality to another. Comprehensive experiments on both aligned and non-aligned multimodal time-series show that our model outperforms state-of-the-art methods by a large margin. In addition, empirical analysis suggests that correlated crossmodal signals can be captured by the proposed crossmodal attention mechanism in MulT.
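
The directional crossmodal attention described in the abstract can be illustrated with a minimal sketch. It assumes single-head scaled dot-product attention in which queries come from the target modality and keys and values come from the source modality; the abstract does not spell out this form, so treat it as an assumption. All function names, projection matrices, and toy sizes below are hypothetical and are not taken from the authors' implementation.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def crossmodal_attention(target, source, w_q, w_k, w_v):
    """Let each target time step attend over all source time steps.

    target is T_t x d_t and source is T_s x d_s. The two sequence
    lengths may differ, so no explicit alignment is needed.
    Assumed form: single-head scaled dot-product attention.
    """
    queries = matmul(target, w_q)  # T_t x d_k
    keys = matmul(source, w_k)     # T_s x d_k
    values = matmul(source, w_v)   # T_s x d_v
    scale = math.sqrt(len(w_q[0]))
    keys_t = [list(col) for col in zip(*keys)]  # d_k x T_s
    scores = matmul(queries, keys_t)            # T_t x T_s
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, values), weights


def random_matrix(rows, cols, rng):
    """Hypothetical helper: random matrix used as toy data and weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
text = random_matrix(5, 4, rng)    # 5 word-level steps, 4 features (toy sizes)
audio = random_matrix(12, 3, rng)  # 12 acoustic frames, 3 features (toy sizes)
w_q = random_matrix(4, 2, rng)     # projects text features to queries
w_k = random_matrix(3, 2, rng)     # projects audio features to keys
w_v = random_matrix(3, 2, rng)     # projects audio features to values

adapted, weights = crossmodal_attention(text, audio, w_q, w_k, w_v)
print(len(adapted), len(adapted[0]))  # one output row per text step
```

Because the attention weights form a T_t x T_s matrix, a 5-step text sequence can draw on a 12-frame audio sequence directly, which is one way to read "without explicitly aligning the data".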

SUBMITTER: Tsai YH

PROVIDER: S-EPMC7195022 | biostudies-literature | 2019 Jul

REPOSITORIES: biostudies-literature

Publications

Multimodal Transformer for Unaligned Multimodal Language Sequences.

Tsai, Yao-Hung Hubert; Bai, Shaojie; Liang, Paul Pu; Kolter, J. Zico; Morency, Louis-Philippe; Salakhutdinov, Ruslan

Proceedings of the conference. Association for Computational Linguistics. Meeting, 2019-07-01

PMID: 32362720

Similar Datasets

Discovering common stem-loop motifs in unaligned RNA sequences.
| S-EPMC55461 | biostudies-literature

An accurate method for identifying recent recombinants from unaligned sequences.
| S-EPMC8963311 | biostudies-literature

Unsupervised Deep Learning Can Identify Protein Functional Groups from Unaligned Sequences.
| S-EPMC10231473 | biostudies-literature

Adding unaligned sequences into an existing alignment using MAFFT and LAST.
| S-EPMC3516148 | biostudies-literature

Extracting transcription factor binding sites from unaligned gene sequences with statistical models.
| S-EPMC2638147 | biostudies-literature

RNAProfile: an algorithm for finding conserved secondary structure motifs in unaligned RNA sequences.
| S-EPMC434454 | biostudies-literature

RNAG: a new Gibbs sampler for predicting RNA secondary structure for unaligned sequences.
| S-EPMC3167047 | biostudies-literature

Deep scaffold hopping with multimodal transformer neural networks.
| S-EPMC8590293 | biostudies-literature

A Computational Framework for Pattern Detection on Unaligned Sequences: An Application on SARS-CoV-2 Data.
| S-EPMC8194296 | biostudies-literature

Advanced multiple document summarization via iterative recursive transformer networks and multimodal transformer.
| S-EPMC11784779 | biostudies-literature