Dataset description

We split the recorded data (which can be downloaded from here) into:

  • Development data: fully labeled data that can be used for training and validation as desired. 
  • Validation data: a dataset formatted like the final evaluation data that can be used to practice making submissions on the Kaggle platform. Results on the validation data appear immediately as the "public score" on the leaderboard. The validation data is slightly easier than the development data.
  • Final evaluation data: the dataset that will be used to compute the final score (will be released shortly before the end of the challenge). 


Provided files
We provide the individual X_audio.ogg, X_audio.wav, X_color.mp4, X_depth.mp4, and X_user.mp4 files containing the audio, RGB, depth, and user mask videos for a given sequence X. All the sequences are recorded at 20 FPS.
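As a minimal sketch of the naming scheme and frame rate described above, the following Python helpers list the five files expected for a sequence and convert a frame index to a timestamp. The function names and the 1-based frame indexing are assumptions made for illustration; they are not part of the provided tools.

```python
FPS = 20  # frame rate stated for all sequences

# File suffixes listed in the dataset description, keyed by modality.
SUFFIXES = {
    "audio_ogg": "_audio.ogg",
    "audio_wav": "_audio.wav",
    "rgb": "_color.mp4",
    "depth": "_depth.mp4",
    "user_mask": "_user.mp4",
}


def sequence_files(sequence_id):
    """Return the expected file name for each modality of one sequence."""
    return {modality: sequence_id + suffix for modality, suffix in SUFFIXES.items()}


def frame_to_seconds(frame, fps=FPS, first_frame=1):
    """Start time of a frame in seconds; first_frame=1 is an assumption."""
    if frame < first_frame:
        raise ValueError("frame index precedes the first frame")
    return (frame - first_frame) / fps


if __name__ == "__main__":
    print(sequence_files("Sample00001")["rgb"])  # Sample00001_color.mp4
    print(frame_to_seconds(41))                  # 2.0
```

If the frames turn out to be numbered from 0, pass first_frame=0 instead.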
We also provide a script interface, dataViewer, to visualize the samples and export the data to Matlab. If you use dataViewer, do not unzip the sample data; just type at the Matlab prompt:

> dataViewer;


DataViewer GUI


The implemented options are:

1.- Load the sample data: use the Load Data button to open one of the zip files (be patient, loading takes some time).

2.- Visualize all the multimodal data stored in the sample.

3.- Play the audio file stored in the sample.

4.- Export the multimodal data in the following four Matlab structures:

  • NumFrames: Total number of frames.
  • FrameRate: Frame rate of a video.
  • Audio: structure that contains audio data.
    • y: audio data
    • fs: sample rate for the data
  • Labels: structure that contains the data about labels.
    • Name: name of the gesture.
    • Begin: start frame of the gesture.
    • End: end frame of the gesture.
    • Label names are as follows:
      1. 'vattene'
      2. 'vieniqui'
      3. 'perfetto'
      4. 'furbo'
      5. 'cheduepalle'
      6. 'chevuoi'
      7. 'daccordo'
      8. 'seipazzo'
      9. 'combinato'
      10. 'freganiente'
      11. 'ok'
      12. 'cosatifarei'
      13. 'basta'
      14. 'prendere'
      15. 'noncenepiu'
      16. 'fame'
      17. 'tantotempo'
      18. 'buonissimo'
      19. 'messidaccordo'
      20. 'sonostufo'
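
The exported label information can be turned into per-frame targets along the following lines. This is only a sketch in Python, working on plain tuples rather than on the Matlab structures themselves. It assumes that frames are numbered from 1, that End is inclusive, and that 0 can stand for "no gesture"; none of these conventions is stated in the description. The helper names and the sample rate used in the usage note are hypothetical.

```python
GESTURE_NAMES = [
    "vattene", "vieniqui", "perfetto", "furbo", "cheduepalle",
    "chevuoi", "daccordo", "seipazzo", "combinato", "freganiente",
    "ok", "cosatifarei", "basta", "prendere", "noncenepiu",
    "fame", "tantotempo", "buonissimo", "messidaccordo", "sonostufo",
]

# Map each name to its 1-based position in the list above.
GESTURE_ID = {name: i for i, name in enumerate(GESTURE_NAMES, start=1)}


def frame_labels(num_frames, labels):
    """Per-frame gesture IDs; 0 marks frames with no gesture.

    `labels` is a list of (name, begin, end) tuples. Frames are assumed
    to be 1-based and `end` is assumed to be inclusive.
    """
    per_frame = [0] * num_frames
    for name, begin, end in labels:
        if not 1 <= begin <= end <= num_frames:
            raise ValueError(f"bad frame range for {name}: {begin}-{end}")
        for frame in range(begin, end + 1):
            per_frame[frame - 1] = GESTURE_ID[name]
    return per_frame


def audio_slice(begin, end, frame_rate, fs):
    """Audio sample range [start, stop) covering frames begin..end."""
    start = round((begin - 1) * fs / frame_rate)
    stop = round(end * fs / frame_rate)
    return start, stop
```

For example, frame_labels(6, [("ok", 2, 3)]) gives [0, 11, 11, 0, 0, 0], and audio_slice(1, 20, 20, 16000) gives (0, 16000), where 16000 is a made-up sample rate; the real value is the fs field of the Audio structure.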

After export, one .mat file is generated per frame (Sample00001_X.mat, where X is the frame number), containing the following structures:

  • RGB
  • Depth
  • UserIndex
  • Skeleton
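
To illustrate the per-frame export layout, the Python sketch below builds the expected file names and checks that a loaded frame exposes the four structures listed above. It assumes that X runs from 1 to NumFrames without zero padding, which the description does not specify, and the helper names are hypothetical. Reading the .mat files themselves is left out.

```python
# Structures listed for each exported frame file.
FRAME_FIELDS = ("RGB", "Depth", "UserIndex", "Skeleton")


def frame_mat_names(sample_id, num_frames):
    """Names of the per-frame files, assuming X runs from 1 to num_frames
    with no zero padding (the description does not specify either)."""
    return [f"{sample_id}_{frame}.mat" for frame in range(1, num_frames + 1)]


def missing_fields(frame):
    """Return the expected structures absent from a loaded frame mapping."""
    return [field for field in FRAME_FIELDS if field not in frame]
```

Here frame_mat_names("Sample00001", 3) yields Sample00001_1.mat through Sample00001_3.mat, and missing_fields returns an empty list when a frame is complete.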

