Recognition performance will be evaluated using the Jaccard Index (overlap). For each of the 20 gesture categories labeled for each RGBD sequence, the Jaccard Index is defined as follows:

Js,n = (As,n ∩ Bs,n) / (As,n ∪ Bs,n),

where As,n is the ground truth of action/gesture n at sequence s, and Bs,n is the prediction for that action/gesture at sequence s. For both Tracks 2 and 3, As,n and Bs,n are binary vectors whose 1-valued entries denote frames in which the n-th action/gesture is being performed.
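The per-category Jaccard Index over binary frame vectors can be sketched as follows (the function name and vector encoding are illustrative, not part of the official evaluation kit):

```python
def jaccard_index(a, b):
    """Jaccard overlap of two equal-length binary frame vectors.

    a: ground truth vector (1 = action/gesture active in that frame)
    b: prediction vector for the same category and sequence
    """
    intersection = sum(x and y for x, y in zip(a, b))  # frames active in both
    union = sum(x or y for x, y in zip(a, b))          # frames active in either
    return intersection / union if union else 0.0
```

For example, a ground truth `[0, 1, 1, 1, 0]` against a prediction `[0, 0, 1, 1, 1]` overlaps in 2 frames out of a union of 4, giving a Jaccard Index of 0.5.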
In the case of a false positive (e.g., predicting an action that is not in the ground truth), the Jaccard Index is automatically 0 for that action prediction, and that action class still counts in the mean Jaccard Index computation. In other words, n ranges over the union of the action categories appearing in the ground truth and in the predictions.
Participants will be evaluated on the mean Jaccard Index over all action/gesture categories for all sequences, where action/gesture categories are independent but not mutually exclusive (in a given frame more than one action/gesture class can be active). When computing the mean Jaccard Index, all gesture categories have the same weight. The participant with the highest mean Jaccard Index wins. An example of the calculation for a single sequence and two action/gesture categories is shown in Figure 2.
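Putting the pieces together, the mean Jaccard Index for one sequence can be sketched as below. The dictionary-based interface and the `n_frames` parameter are assumptions for illustration; note how a predicted category absent from the ground truth contributes a 0 to the mean:

```python
def mean_jaccard(gt, pred, n_frames):
    """Mean Jaccard Index over all categories of one sequence.

    gt, pred: dicts mapping category id -> binary frame vector of
    length n_frames. Categories appearing in either dict count,
    so a false-positive category contributes 0 to the mean.
    """
    zeros = [0] * n_frames
    categories = set(gt) | set(pred)
    scores = []
    for c in sorted(categories):
        a = gt.get(c, zeros)
        b = pred.get(c, zeros)
        intersection = sum(x and y for x, y in zip(a, b))
        union = sum(x or y for x, y in zip(a, b))
        scores.append(intersection / union if union else 0.0)
    return sum(scores) / len(scores) if scores else 0.0
```

For instance, a perfect prediction for one category plus a spurious prediction for a second category yields (1.0 + 0.0) / 2 = 0.5.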
Figure 2: Example of mean Jaccard Index calculation for different instances of gesture/action categories in a sequence (single red lines denote ground truth annotations and double red lines denote predictions). The top part of the image shows the ground truth annotations for the actions/gestures walk and fight in a sequence s. In the center part of the image, a prediction for walk is evaluated, obtaining a Jaccard Index of 0.72. In the bottom part of the image, the same procedure is performed for the action fight, obtaining a Jaccard Index of 0.46. Finally, the mean Jaccard Index is computed, obtaining a value of 0.59.
Participants must submit their predictions in the following format. For each sequence SequenceXXXX.zip in the data folder, participants should create a SequenceXXXX_prediction.csv file with one line per predicted gesture, [GestureID, StartFrame, EndFrame] (the same format as SequenceXXXX_labels.csv). The predictions for all samples should be packaged in a single ZIP file and submitted to Codalab.
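Writing a prediction file in the required format could look like the following sketch (the helper name and tuple layout are assumptions; only the [GestureID, StartFrame, EndFrame] line format comes from the challenge description):

```python
import csv

def write_predictions(path, predictions):
    """Write one CSV line per predicted gesture.

    path: output file, e.g. "Sequence0001_prediction.csv"
    predictions: list of (gesture_id, start_frame, end_frame) tuples,
    matching the format of the SequenceXXXX_labels.csv files.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for gesture_id, start_frame, end_frame in predictions:
            writer.writerow([gesture_id, start_frame, end_frame])
```

The per-sequence CSV files would then be collected into a single ZIP archive for submission.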