The UDIVA dataset aims to move beyond automatic detection of individual behavior and to support the development of automatic approaches for studying and understanding the mechanisms of influence, perception, and adaptation to verbal and nonverbal social signals in dyadic interactions, taking into account individual and dyad characteristics as well as other contextual factors. To the best of our knowledge, no other publicly available, non-acted, face-to-face dyadic dataset is comparable in terms of number of views, participants, tasks, recorded sessions, and context labels.
The UDIVA dataset is composed of 90.5h of recordings of dyadic interactions between 147 voluntary participants (55.1% male) aged 4 to 84 years (mean=31.29), coming from 22 countries (68% from Spain). Students formed the largest group of participants (38.8%), and most participants identified themselves as white (84.4%). Participants were distributed into 188 dyadic sessions, with an average participation of 2.5 sessions/participant (max. 5 sessions). The most common interaction group is Male-Male/Young-Young/Unknown (15%), and 43% of the interactions took place between people who knew each other. Spanish is the majority language of interaction (71.8%), followed by Catalan (19.7%) and English (8.5%). In half of the sessions, both interlocutors have Spain as their country of origin. The data were acquired using 6 HD tripod-mounted cameras (1280×720px, 25fps), 1 lapel microphone per participant, and an omnidirectional microphone on the table. Each participant also wore an egocentric camera (1920×1080px, 30fps) around their neck and a heart rate monitor on their wrist. All capturing devices are time-synchronized, and the tripod-mounted cameras are calibrated. Figure 1 illustrates the recording setup and the different views of the UDIVA dataset. Figure 2 illustrates the different contexts (i.e., tasks) of the UDIVA dataset.
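
Because the tripod-mounted cameras record at 25fps and the egocentric cameras at 30fps, frame indices from the two kinds of view do not correspond one-to-one, even though the devices are time-synchronized. The following is a minimal sketch of one way to map a tripod-camera frame to the nearest egocentric frame. It assumes constant frame rates and a common time origin for both streams; the function name and the `offset_s` parameter are hypothetical and are not part of any UDIVA tooling.

```python
import math
from fractions import Fraction

TRIPOD_FPS = 25  # tripod-mounted cameras (from the dataset description)
EGO_FPS = 30     # egocentric cameras (from the dataset description)


def nearest_ego_frame(tripod_frame: int, offset_s: Fraction = Fraction(0)) -> int:
    """Return the egocentric frame index closest in time to a tripod frame.

    Assumption: both streams run at a constant frame rate and frame 0 of
    each stream shares the same timestamp, up to `offset_s` seconds
    (hypothetical start delay of the egocentric stream's reference).
    """
    # Timestamp of the tripod frame, in exact rational seconds.
    t = Fraction(tripod_frame, TRIPOD_FPS) + offset_s
    # Round to the nearest egocentric frame (halves round up).
    return math.floor(t * EGO_FPS + Fraction(1, 2))


if __name__ == "__main__":
    for frame in (0, 1, 3, 25):
        print(frame, "->", nearest_ego_frame(frame))
```

Exact rational arithmetic is used here so that rounding does not drift over long recordings; a real pipeline would more likely rely on the per-frame timestamps recorded by the devices, if available.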