For evaluation, your system should output a real-valued confidence for each image so that a precision/recall curve can be drawn. Higher values indicate greater confidence that the image belongs to the class of interest. The classification task will be judged by the precision/recall curve, and the principal quantitative measure is the average precision (AP). The AP is the area under this curve, computed by numerical integration as follows:
- First, we compute a version of the precision/recall curve in which precision is monotonically non-increasing: the precision at recall r is set to the maximum precision obtained for any recall r' ≥ r.
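- Then, we compute the AP as the area under this curve by numerical integration, using the well-known trapezoidal rule. Letting f(x) be the function that represents the precision/recall curve, the trapezoidal rule approximates the area under the curve between two points a and b as:
  $$\int_a^b f(x)\,dx \approx (b - a)\,\frac{f(a) + f(b)}{2}$$
  The area under the whole curve is obtained by summing this approximation over each pair of consecutive recall values.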
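The two steps above can be sketched in Python as follows. This is a minimal sketch, not the official evaluate.py: it assumes binary ground-truth labels (1 = the image belongs to the category, 0 = it does not), and the tie handling (input order among equal scores) and the starting point of the curve (recall 0 with the interpolated precision there) are assumptions that may differ from the official script.

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the monotone precision/recall curve via the trapezoidal rule.

    scores: real-valued confidences output by the classifier.
    labels: 1 if the image belongs to the category, 0 otherwise (assumed format).
    """
    n_pos = sum(labels)
    if n_pos == 0:
        return 0.0  # assumption: no positives means AP is undefined; return 0
    # Rank images by decreasing confidence (sorted is stable, so ties keep input order).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    tp = fp = 0
    recall: List[float] = []
    precision: List[float] = []
    for i in order:
        if labels[i]:
            tp += 1
        else:
            fp += 1
        recall.append(tp / n_pos)
        precision.append(tp / (tp + fp))

    # Step 1: precision at recall r becomes the max precision at any recall r' >= r.
    for k in range(len(precision) - 2, -1, -1):
        precision[k] = max(precision[k], precision[k + 1])

    # Assumption: the curve starts at recall 0 with the monotone precision value there.
    recall.insert(0, 0.0)
    precision.insert(0, precision[0])

    # Step 2: trapezoidal rule summed over consecutive recall points.
    area = 0.0
    for k in range(len(recall) - 1):
        area += (recall[k + 1] - recall[k]) * (precision[k] + precision[k + 1]) / 2
    return area


if __name__ == "__main__":
    print(round(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]), 4))  # 0.8333
```

Because step 1 turns the curve into a non-increasing step shape, the trapezoids between duplicate recall values have zero width, so the sum reduces to rectangle areas over each recall increment.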
In this track, participants should submit one text file per category. Each line of a file should contain a single image identifier and the confidence output by the classifier, separated by a space. The prediction TXT files for all categories (100 in total) should be placed in a single ZIP file and submitted to Codalab.
For example, for the category "La Tomatina", the output of your method should be a file (La_Tomatina.txt) containing the identifier and confidence for each image:
002234.jpg 0.056313
010256.jpg 0.127031
010987.jpg 0.287153
...
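A minimal Python sketch of producing one category file in this format and bundling the files into a ZIP archive. It assumes the predictions are held in a dict mapping image identifier to confidence; the helper names (write_category_file, zip_submission), the space-to-underscore file naming (inferred from La_Tomatina.txt), the six-decimal formatting, and the flat archive layout are hypothetical choices, not requirements stated by the organizers.

```python
import zipfile
from pathlib import Path
from typing import Dict, Iterable


def write_category_file(out_dir: Path, category: str, predictions: Dict[str, float]) -> Path:
    """Write '<identifier> <confidence>' lines for one category (hypothetical helper)."""
    # Assumption: spaces in the category name become underscores, as in La_Tomatina.txt.
    path = out_dir / f"{category.replace(' ', '_')}.txt"
    with path.open("w") as f:
        for image_id, confidence in predictions.items():
            f.write(f"{image_id} {confidence:.6f}\n")
    return path


def zip_submission(files: Iterable[Path], zip_path: Path) -> Path:
    """Bundle the per-category files into one ZIP (assumed flat, no subfolders)."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for p in files:
            zf.write(p, arcname=p.name)  # store only the file name, not the full path
    return zip_path
```

In practice, write_category_file would be called once for each of the 100 categories and the returned paths passed to zip_submission.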
For this track, we provide a file (evaluate.py) that implements the evaluation method. This script computes the final mean average precision value. For the script to work properly, the following directory structure must be created:
Track2/ % main directory
Track2/program/ % this folder should contain the evaluation script (evaluate.py)
Track2/input/ref/ % this folder contains the ground truth for validation
Track2/input/res/ % this folder contains the results of the participants for each category (see the format in the previous section)
Track2/output/ % contains the output of the script (a file with the final mean average precision)
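The layout above can be created with a short Python sketch. The function name make_track2_layout is hypothetical, and copying evaluate.py, the ground truth, and your result files into place is left to you.

```python
from pathlib import Path


def make_track2_layout(root: Path) -> Path:
    """Create the Track2/ folder structure expected by evaluate.py (hypothetical helper)."""
    for sub in ("program", "input/ref", "input/res", "output"):
        # parents=True creates intermediate folders; exist_ok=True makes reruns harmless.
        (root / sub).mkdir(parents=True, exist_ok=True)
    return root


if __name__ == "__main__":
    make_track2_layout(Path("Track2"))
```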
To obtain the final mean average precision (mAP) over all categories, computed in the same way as on the Codalab platform, change to the main directory (Track2/) and run the following command:
python program/evaluate.py input output
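For reference, assuming evaluate.py weights every category equally (the standard definition of mean average precision, not verified against the script), the final score is the mean of the per-category APs over the C = 100 categories:

```latex
\mathrm{mAP} = \frac{1}{C}\sum_{c=1}^{C}\mathrm{AP}_c, \qquad C = 100
```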