Evaluation metrics


Levenshtein Distance 


For each video, participants should provide an ordered list of labels R corresponding to the recognized gestures, with exactly one label per recognized gesture. We will compare this list to the corresponding list of labels T in the prescribed list of gestures that the user had to play. These are the "true" gesture labels. 

For evaluation, we consider the so-called Levenshtein distance L(R, T), that is, the minimum number of edit operations (substitutions, insertions, or deletions) that one has to perform to go from R to T (or vice versa). The Levenshtein distance is also known as the "edit distance". 

For example: 

L([1 2 4], [3 2]) = 2 

L([1], [2]) = 1 

L([2 2 2], [2]) = 2 
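
As a sketch, the distance can be computed with the standard dynamic-programming algorithm. The function name levenshtein and the list-of-labels representation below are illustrative; this is not the official scoring code:

    def levenshtein(r, t):
        """Minimum number of substitutions, insertions, and deletions
        needed to turn the label list r into the label list t."""
        m, n = len(r), len(t)
        # d[i][j] = distance between the first i labels of r
        # and the first j labels of t
        d = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            d[i][0] = i  # delete all i labels of r
        for j in range(n + 1):
            d[0][j] = j  # insert all j labels of t
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                cost = 0 if r[i - 1] == t[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution or match
        return d[m][n]

    # Reproduces the examples above:
    assert levenshtein([1, 2, 4], [3, 2]) == 2
    assert levenshtein([1], [2]) == 1
    assert levenshtein([2, 2, 2], [2]) == 2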
 

Score

The overall score we compute is the sum of the Levenshtein distances between each line of the result file and the corresponding line of the truth value file, divided by the total number of gestures in the truth value file. This score is analogous to an error rate. However, it can exceed one, because the recognized list may contain more gestures than the true list. 
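
A minimal sketch of this computation, assuming the levenshtein function sketched above and an illustrative data layout of one label list per video (neither is the official format):

    def overall_score(recognized, truth):
        """Sum of per-video Levenshtein distances divided by the
        total number of true gestures (an error-rate-like score)."""
        total_distance = sum(levenshtein(r, t)
                             for r, t in zip(recognized, truth))
        total_gestures = sum(len(t) for t in truth)
        return total_distance / total_gestures

    # Hypothetical example: two videos with three true gestures in total.
    recognized = [[1, 2, 4], [1]]
    truth = [[3, 2], [2]]
    print(overall_score(recognized, truth))  # (2 + 1) / 3 = 1.0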

Public score means the score that appears on the leaderboard during the development period and is based on the validation data. 

Final score means the score that will be computed on the final evaluation data released at the end of the development period, which will not be revealed until the challenge is over. The final score will be used to rank the participants and determine the prizes. 

Verification procedure

To verify that the participants complied with the rule that there should be no manual labeling of the test data, the top-ranking participants eligible to win prizes will be asked to cooperate with the organizers to reproduce their results. 

During the development period, the participants can upload executable code reproducing their results together with their submissions. The organizers will evaluate requests to support particular platforms, but do not commit to supporting all platforms. The sooner a version of the code is uploaded, the higher the chances that the organizers will succeed in running it on their platform. The burden of proof rests on the participants. 

The code will be kept in confidence and used only for verification purposes after the challenge is over. The submitted code will need to be standalone; in particular, it will not be allowed to access the Internet. It will need to be capable of training models on the training examples of the final evaluation data, for each data batch, and of making label predictions on the test examples of that batch.
