Human Pose (ECCV '14)
The focus of this track is on multi-limb, user-independent pose recovery, which means learning to recognize limbs from several instances of each limb class belonging to different actors. For this track, more than 8000 images were labelled at pixel precision with 14 limbs (more than 120000 human limbs were manually labelled). Users appear in different poses and interact with secondary actors in the same scene. In all cases, every actor taking part in the scene is manually labelled with the 14 different limbs, wherever they are visible.
The dataset is composed of 9 RGB sequences, containing more than 8000 frames and more than 120000 manually labelled limbs in total. For each frame we provide the RGB image and 14 binary masks, one for each of the limbs. In each binary mask, 1-valued pixels indicate the region that contains the limb.
The data is organized as a set of sequences, each one uniquely identified by a string SeqXX, where XX is a two-digit integer. Each sequence is provided as a single ZIP file named after its identifier (e.g. SeqXX.zip).
Each sample ZIP file contains the following files:
- /imagesjpg: Set of RGB images composing the sequence. Each file name encodes the sequence and the frame number of the image (XX_YYYY.jpg denotes frame YYYY of sequence XX).
- /maskspng: For each RGB image in the /imagesjpg folder we define 14 binary masks, each denoting the region in which a certain limb is positioned. Each binary mask file name follows the pattern XX_YYYY_W_Z.png, where XX denotes the sequence, YYYY denotes the frame, W denotes the actor in the sequence (1 if it is in the left part of the image, 2 if it is in the right part) and Z denotes the limb number (following the ordering defined in the figure above).
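The mask naming pattern above can be decoded mechanically. The following is a minimal sketch, not part of the official dataset tooling: the names `MaskId` and `parse_mask_name` are hypothetical, and the pattern accepts any run of digits per field because the text does not state whether W and Z are zero-padded.

```python
import re
from typing import NamedTuple


class MaskId(NamedTuple):
    sequence: int  # XX
    frame: int     # YYYY
    actor: int     # W: 1 = left part of the image, 2 = right part
    limb: int      # Z: limb number, in the ordering defined by the dataset figure


# Assumption: digit widths are not enforced; each field is any run of digits.
_MASK_NAME = re.compile(r"(\d+)_(\d+)_(\d+)_(\d+)\.png")


def parse_mask_name(file_name: str) -> MaskId:
    """Split a mask file name of the form XX_YYYY_W_Z.png into its four fields."""
    match = _MASK_NAME.fullmatch(file_name)
    if match is None:
        raise ValueError(f"not a mask file name: {file_name!r}")
    sequence, frame, actor, limb = (int(group) for group in match.groups())
    return MaskId(sequence, frame, actor, limb)


# Example with a made-up file name that follows the documented pattern.
mask_id = parse_mask_name("01_0005_2_14.png")
print(mask_id.sequence, mask_id.frame, mask_id.actor, mask_id.limb)
```

Grouping the parsed masks by (sequence, frame, actor) then gives, for each actor in each frame, the set of limbs that are actually labelled.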
The evaluation uses the Jaccard Index (overlap). For each of the n≤14 limbs labelled for each subject in each frame i, the Jaccard Index is defined as follows:
J_{i,n} = |A_{i,n} ∩ B_{i,n}| / |A_{i,n} ∪ B_{i,n}|
where A_{i,n} is the ground truth of limb n, and B_{i,n} is the prediction for the same limb (at image i). For the dataset in this challenge, both A_{i,n} and B_{i,n} are binary images where ‘1’ pixels denote the region in which the n-th limb is located (ground truth) or predicted. Since A_{i,n} is a binary image whose 1-valued pixels indicate the region of the n-th limb, this positive region does not need to be square. However, in all cases the positive region is a polygon defined by four points. Thus, the numerator is the number of ‘1’ pixels shared by both images A_{i,n} and B_{i,n}, and the denominator is the number of ‘1’ pixels in their union, obtained by applying the logical OR operator.
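The overlap computation described above can be sketched in a few lines. This is an illustrative, dependency-free version rather than the official evaluation script: the function name `jaccard_index` is hypothetical, masks are assumed to be given as rows of 0/1 values, and returning 0.0 when both masks are empty is an assumption, since the text does not define that case.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # rows of 0/1 pixel values


def jaccard_index(ground_truth: Mask, prediction: Mask) -> float:
    """Return |A ∩ B| / |A ∪ B| for two binary masks of the same size."""
    if len(ground_truth) != len(prediction):
        raise ValueError("masks must have the same number of rows")
    intersection = 0
    union = 0
    for gt_row, pred_row in zip(ground_truth, prediction):
        if len(gt_row) != len(pred_row):
            raise ValueError("masks must have the same number of columns")
        for gt_pixel, pred_pixel in zip(gt_row, pred_row):
            if gt_pixel and pred_pixel:   # logical AND: pixel is '1' in both masks
                intersection += 1
            if gt_pixel or pred_pixel:    # logical OR: pixel is '1' in either mask
                union += 1
    # Assumption: two empty masks score 0.0 (undefined in the description).
    return intersection / union if union else 0.0


# Toy 2x4 masks: 2 shared pixels, 6 pixels in the union.
ground_truth = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
prediction = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
score = jaccard_index(ground_truth, prediction)
print(f"{score:.3f}")
```

For full-resolution masks an array library would be the more practical choice; the loop form is used here only to keep the two operators visible.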
In the case of false positives (e.g., predicting a limb that is not in the ground truth because it is occluded), the prediction will not affect the mean hit rate calculation. In other words, the n limbs evaluated are the intersection of the limb categories in the ground truth and in the predictions.
Participant methods will be evaluated on the hit rate (HR) of limb detection. That is, for each limb n in each image i, a hit is counted if J_{i,n}≥0.5. The mean hit rate over all limbs in all images is then computed (all limb detections have the same weight), and the participant with the highest mean hit rate will be the winner.
In some images a limb may not be labelled in the ground truth because of occlusions. In that case, where n<14, participants must not provide any prediction for that particular limb. An example of the mean hit rate calculation for n=3 limbs and i=1 image is shown in Figure 1.
Figure 1: Mean hit rate and Jaccard Index calculation for a sample with n=3 limbs and i=1 image. The top part of the image shows the Jaccard Index for the head limb; since it is greater than 0.5, it counts as a hit for image i and the head limb. Similarly, the Jaccard Index obtained for the torso limb is 0.72 (center part of the image), which also counts as a hit for the torso limb. The bottom part of the image shows the Jaccard Index obtained for the left thigh limb, which does not count as a hit since 0.04<0.5. Finally, the mean hit rate is obtained over those three limbs.
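The hit-rate rule above reduces to thresholding and averaging. The sketch below assumes the Jaccard Index has already been computed for every limb that is evaluated; the function name `mean_hit_rate` and the (image, limb name) key layout are hypothetical, chosen only for illustration.

```python
from typing import Dict, Tuple

LimbKey = Tuple[int, str]  # hypothetical key: (image index i, limb name)


def mean_hit_rate(jaccard: Dict[LimbKey, float], threshold: float = 0.5) -> float:
    """Fraction of evaluated limbs whose Jaccard Index reaches the threshold."""
    if not jaccard:
        raise ValueError("no limbs to evaluate")
    # Every limb detection carries the same weight, so a plain average suffices.
    hits = sum(1 for score in jaccard.values() if score >= threshold)
    return hits / len(jaccard)


# One image with three limbs. The torso and left thigh values come from the
# Figure 1 caption; the head value is a placeholder above 0.5, since the
# caption only states that it exceeds the threshold.
scores = {
    (1, "head"): 0.60,
    (1, "torso"): 0.72,
    (1, "left thigh"): 0.04,
}
print(f"{mean_hit_rate(scores):.2f}")
```

Two of the three limbs reach the threshold here, so the mean hit rate for this single image is 2/3.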