Dataset description


Motivation

The UDIVA dataset aims to move beyond automatic detection of individual behavior and instead support the development of automatic approaches to study and understand the mechanisms of influence, perception, and adaptation to verbal and nonverbal social signals in dyadic interactions, taking into account individual and dyad characteristics as well as other contextual factors. To the best of our knowledge, no similar non-acted, face-to-face dyadic dataset is publicly available in the research field in terms of number of views, participants, tasks, recorded sessions, and context labels.

UDIVA Statistics

The UDIVA dataset comprises 90.5 hours of recordings of dyadic interactions between 147 voluntary participants (55.1% male), aged 4 to 84 years (mean = 31.29), from 22 countries (68% from Spain). The majority of participants were students (38.8%) and identified themselves as white (84.4%). Participants were distributed into 188 dyadic sessions, with an average of 2.5 sessions per participant (max. 5 sessions). The most common interaction group is Male-Male/Young-Young/Unknown (15%), and 43% of the interactions took place between people who already knew each other. Spanish is the majority language of interaction (71.8%), followed by Catalan (19.7%) and English (8.5%). Half of the sessions involve two interlocutors whose country of origin is Spain.

Technically, the data was acquired using 6 HD tripod-mounted cameras (1280×720 px, 25 fps), 1 lapel microphone per participant, and an omnidirectional microphone on the table. Each participant also wore an egocentric camera (1920×1080 px, 30 fps) around their neck and a heart rate monitor on their wrist. All capturing devices are time-synchronized, and the tripod-mounted cameras are calibrated. Figure 1 illustrates the recording setup and the different views of the UDIVA dataset; Figure 2 illustrates the different contexts (i.e., tasks).

Figure 1: Recording environment. We used six tripod-mounted cameras, namely GB: General Back camera, GF: General Frontal camera, HA: individual High Angle cameras, and FC: individual Frontal cameras, plus two ego cameras E (one per participant, worn around the neck). a) Position of cameras, general microphone, and participants. b) Example of the eight time-synchronized views.

Figure 2: Examples of the 5 tasks included in the UDIVA dataset from 5 sessions. From left to right: Talk, Lego, Animals, Ghost, Gaze.
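Because the tripod-mounted cameras (25 fps) and the egocentric cameras (30 fps) record at different frame rates, users combining views will typically need to map frame indices onto a shared timeline. The following is a minimal Python sketch of that alignment, assuming perfect synchronization from the session start; the function names are illustrative only and are not part of any official UDIVA tooling.

    # Illustrative sketch (not official UDIVA code): aligning frame indices
    # across time-synchronized streams recorded at different frame rates.
    # A shared timeline in seconds is the natural common reference.

    TRIPOD_FPS = 25.0   # GB, GF, HA, FC cameras (1280x720 px)
    EGO_FPS = 30.0      # egocentric cameras (1920x1080 px)

    def frame_to_seconds(frame_idx: int, fps: float) -> float:
        """Timestamp (seconds from session start) of a given frame."""
        return frame_idx / fps

    def seconds_to_frame(t: float, fps: float) -> int:
        """Nearest frame index in a stream for a given timestamp."""
        return round(t * fps)

    def align_frame(frame_idx: int, src_fps: float, dst_fps: float) -> int:
        """Map a frame index from one synchronized stream to another."""
        return seconds_to_frame(frame_to_seconds(frame_idx, src_fps), dst_fps)

    # Example: tripod frame 1500 (60 s into the session) corresponds to
    # egocentric frame 1800 under the perfect-synchronization assumption.
    assert align_frame(1500, TRIPOD_FPS, EGO_FPS) == 1800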

Data protection and ethics

The UDIVA dataset is currently stored on a secure server at the Computer Vision Center, Barcelona, Spain. Data collection and storage were performed in compliance with the General Data Protection Regulation (GDPR) and under ethical approval issued by the Universitat de Barcelona Bioethics Department. As authors of the dataset, we confirm that we have permission to release it.

A Dataset License will be attached to the large-scale dataset contents. Participants signed a consent form prior to the start of the recordings in which they granted their consent to share the data with the community for non-commercial purposes; therefore, only users belonging to academic or research organisations will be able to request access to the dataset. Users may only use the dataset after the Dataset License has been signed and returned to the dataset administrators. Users may not transfer, distribute, or broadcast the dataset or portions thereof in any way. Users may use portions or the totality of the dataset provided they acknowledge such usage in their publications by citing the dataset release paper.

Dataset access

The complete dataset is currently being postprocessed for the final release. A subset of the data, referred to as UDIVA v0.5 and originally used for the experimental evaluation in the dataset release paper, is already publicly available for research purposes only. For more details, please visit the UDIVA v0.5 webpage.

For more information about the dataset, please contact us at: udiva at ub dot edu
