ChaLearn

Dataset description

[download the technical report]

Challenge data and
Final evaluation data

Data annotations
(Updated Jan 2013)

Translated, scaled, and occluded data
(Jan-Oct 2013)

Preprocessed data
(June 2012)

View examples

Data collection software + demo kit
(Nov 2012)

Sample code
(June 2012: new examples)

Sample submission

Color rendering of Kinect depth images

DOWNLOAD THE DATASET CGD2011 of 50,000 gestures from one of our data mirrors.

To cite the data please use:
"ChaLearn Gesture Dataset (CGD2011), ChaLearn, California, 2011"

Setting

We portray a single user in front of a fixed camera, interacting with a computer by performing gestures to

- play a game,

- remotely control appliances or robots, or

- learn to perform gestures from educational software.

Kinect^TM data

Kinect^TM has revolutionized the field of gesture recognition because it is an affordable device (a high-end webcam) providing both RGB and depth images. Depth images facilitate image segmentation considerably. We have collected a large dataset of 50,000 gestures with Kinect^TM. We provide Matlab^TM code to browse through the data and process it to create a sample submission. The data can also be viewed with most video viewers; see the README file for details.

We provide both the RGB image and the depth image as in the example below. View more examples.

RGB images

Example M_31 from valid04

Gray scale rendering of depth images

Example K_31 from valid04

The data are organized in batches

devel01

...

devel480

[Initially only 20 development batches were released. All the data are now available]

valid01

...

valid20

final01 (final evaluation data for round 1)

...

final20

final21 (final evaluation data for round 2, not published yet)

...

final40
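
The batch naming scheme listed above can be enumerated programmatically. The following is a minimal Python sketch, not part of the distributed sample code: the function name `batch_names` is hypothetical, and the zero-padding to at least two digits is inferred from the names shown above (devel01, devel480).

```python
def batch_names():
    """Return the batch names listed above, in order:
    devel01..devel480, valid01..valid20, final01..final40."""
    counts = {"devel": 480, "valid": 20, "final": 40}
    names = []
    for prefix, count in counts.items():
        # Indices are zero-padded to at least two digits (devel01, devel480).
        names.extend(f"{prefix}{i:02d}" for i in range(1, count + 1))
    return names
```

Such a list is convenient for looping over the batches once they are downloaded; how each name maps to a directory or archive on disk depends on the mirror and is not assumed here.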

Each batch includes 100 recorded gestures grouped in sequences of 1 to 5 gestures performed by the same user. The gestures are drawn from a small vocabulary of 8 to 15 unique gestures, which we call a "lexicon" (see a few examples of the lexicons we used).

We selected lexicons from nine categories corresponding to various settings or application domains; they include (1) body language gestures (like scratching your head, crossing your arms), (2) gesticulations performed to accompany speech, (3) illustrators (like Italian gestures), (4) emblems (like Indian Mudras), (5) signs (from sign languages for the deaf), (6) signals (like referee signals, diving signals, or marshalling signals to guide machinery or vehicles), (7) actions (like drinking or writing), (8) pantomimes (gestures made to mimic actions), and (9) dance postures.

During the challenge, we do not disclose the identity of the lexicons and of the users. They will be revealed (after user anonymization) at the end of the challenge. Although the gesture classes are different from batch to batch, we represent the class label within each batch by a number between 1 and 15.
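
The constraints stated above (100 gestures per batch, sequences of 1 to 5 gestures, a lexicon of 8 to 15 classes, labels between 1 and 15) can be turned into a simple sanity check for a fully labeled batch. This is a hedged sketch: the function name `check_batch_labels` and the list-of-lists input format are hypothetical and do not reflect the file format of the dataset.

```python
def check_batch_labels(sequences):
    """Check the label sequences of one batch against the stated constraints.

    `sequences` is a list of lists of integer class labels, one inner list
    per recorded sequence (an assumed in-memory format, not the dataset's).
    Returns a list of problem descriptions; an empty list means none found.
    """
    problems = []
    for n, seq in enumerate(sequences, start=1):
        if not 1 <= len(seq) <= 5:
            problems.append(f"sequence {n}: {len(seq)} gestures, expected 1 to 5")
        for label in seq:
            if not 1 <= label <= 15:
                problems.append(f"sequence {n}: label {label} outside 1..15")
    total = sum(len(seq) for seq in sequences)
    if total != 100:
        problems.append(f"batch holds {total} gestures, expected 100")
    lexicon = {label for seq in sequences for label in seq}
    if not 8 <= len(lexicon) <= 15:
        problems.append(f"lexicon has {len(lexicon)} classes, expected 8 to 15")
    return problems
```

The check only makes sense where all labels are known, so it applies to labeled development data rather than to batches where most labels are withheld.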

Goal of the challenge: one-shot-learning

For the develXX batches, we provide all the labels. For the validXX and finalXX batches, we provide labels only for one example of each class. The goal is to predict the gesture class labels for the remaining gesture sequences.

During the development period, performance feedback will be provided online on the validXX batches. The final evaluation will be carried out on the finalXX batches, and the final results will be revealed only when the challenge is over.
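
To illustrate the one-shot setting described above, here is a minimal nearest-template classifier in Python. It is only a sketch under strong assumptions, not the organizers' baseline: it assumes each gesture has already been isolated and converted to a fixed-length feature vector (both steps are left out), and the name `nearest_template` and the toy features are made up for illustration.

```python
import math

def nearest_template(templates, sample):
    """Return the label of the template closest to `sample`.

    `templates` maps each class label to the feature vector of its single
    labeled example; `sample` is a feature vector of the same length.
    Closeness is plain Euclidean distance (an illustrative choice).
    """
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(templates, key=lambda label: distance(templates[label], sample))

# Toy usage with made-up 2-D features: one labeled example per class.
templates = {1: [0.0, 0.0], 2: [1.0, 1.0], 3: [0.0, 2.0]}
print(nearest_template(templates, [0.9, 1.2]))  # prints 2
```

With a single example per class there is nothing to train, so all the difficulty moves into the choice of features and distance; predicting a whole sequence would additionally require splitting it into individual gestures first.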

What is easy about the data:

- Fixed camera

- Availability of depth data

- Single user within a batch

- Homogeneous recording conditions within a batch

- Small vocabulary within a batch

- Gestures separated by returning to a resting position

- Gestures performed mostly by arms and hands

- Camera framing mostly the upper body (some exceptions)

What is hard about the data:

- Only one labeled example of each unique gesture

- Variations in recording conditions (various backgrounds, clothing, skin colors, lighting, temperature, resolution)

- Some parts of the body may be occluded

- Some users are less skilled than others

- Some users made errors or omissions in performing the gestures

Data annotations:

We provide some data annotations, including temporal segmentation into isolated gestures and body part annotations (head, shoulders, elbows, and hands).
