Dataset Information

SpeakingFaces: A Large-Scale Multimodal Dataset of Voice Commands with Visual and Thermal Video Streams.

ABSTRACT: We present SpeakingFaces, a publicly available large-scale multimodal dataset developed to support machine learning research in contexts that combine thermal, visual, and audio data streams; examples include human-computer interaction, biometric authentication, recognition systems, domain transfer, and speech recognition. SpeakingFaces comprises aligned high-resolution thermal and visual spectra image streams of fully framed faces, synchronized with audio recordings of each subject speaking approximately 100 imperative phrases. Data were collected from 142 subjects, yielding over 13,000 instances of synchronized data (∼3.8 TB). For technical validation, we demonstrate two baseline examples. The first classifies subjects by gender, using different combinations of the three data streams in both clean and noisy environments. The second performs thermal-to-visual facial image translation, as an instance of domain transfer.

SUBMITTER: Abdrakhmanova M

PROVIDER: S-EPMC8156799 | biostudies-literature | 2021 May

REPOSITORIES: biostudies-literature

Publications

SpeakingFaces: A Large-Scale Multimodal Dataset of Voice Commands with Visual and Thermal Video Streams.

Abdrakhmanova Madina, Kuzdeuov Askat, Jarju Sheikh, Khassanov Yerbolat, Lewis Michael, Varol Huseyin Atakan

Sensors (Basel, Switzerland), 2021-05-16, issue 10

PMID: 34065700

Similar Datasets

Development of a large-scale medical visual question-answering dataset.
| S-EPMC11663219 | biostudies-literature

A large-scale fMRI dataset for the visual processing of naturalistic scenes.
| S-EPMC10447576 | biostudies-literature

A visual analog scale for patient-reported voice outcomes: The VAS voice.
| S-EPMC7042645 | biostudies-literature

Visual influence on path integration in darkness indicates a multimodal representation of large-scale space.
| S-EPMC3024704 | biostudies-other

Multimodal video and IMU kinematic dataset on daily life activities using affordable devices.
| S-EPMC10516922 | biostudies-literature

MoVi: A large multi-purpose human motion and video dataset.
| S-EPMC8211257 | biostudies-literature

EdNet: A Large-Scale Hierarchical Dataset in Education
| S-EPMC7334672 | biostudies-literature

Large-scale analysis of the human transcriptome - Novartis dataset
2002-12-06 | E-GEOD-96 | biostudies-arrayexpress

Tectal microcircuit generating visual selection commands on gaze-controlling neurons.
| S-EPMC4403191 | biostudies-literature

Thermal dependence of large-scale freckle defect formation.
| S-EPMC6460059 | biostudies-literature