For this competition track, we ask participants to perform human detection in the depth modality. Depth cameras are cost-effective devices that capture the geometry of a scene at resolutions and frame rates comparable to those of RGB cameras. Their main drawback is that depth measurements become noisy at large distances from the camera.
Given the provided depth frames and their ground-truth bounding box annotations, participants will develop a depth-based human detection method. For each frame, the method must output a list of bounding boxes, one per detected person, each with an associated confidence score. The submitted methods will be evaluated in terms of average precision (AP).
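To illustrate how per-frame detections with confidence scores are turned into a single AP value, here is a minimal sketch. The track description does not specify the exact protocol, so the following are assumptions rather than the official evaluation: boxes are given as `(x1, y1, x2, y2)` in pixels, a detection counts as a true positive when its intersection-over-union (IoU) with an unmatched ground-truth box is at least 0.5, and AP uses PASCAL VOC-style all-point interpolation. The names `iou` and `average_precision` and the dictionary-per-frame format are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    detections: Dict[int, List[Tuple[Box, float]]],  # frame id -> [(box, score)]
    groundtruth: Dict[int, List[Box]],               # frame id -> [box]
    iou_thr: float = 0.5,                            # assumed matching threshold
) -> float:
    """All-point interpolated AP over all frames (a VOC-style sketch)."""
    num_gt = sum(len(boxes) for boxes in groundtruth.values())
    if num_gt == 0:
        return 0.0
    # Pool detections from every frame and rank them by descending confidence.
    flat = [(score, frame, box)
            for frame, dets in detections.items() for box, score in dets]
    flat.sort(key=lambda d: d[0], reverse=True)
    # Each ground-truth person may be matched by at most one detection.
    matched = {frame: [False] * len(boxes) for frame, boxes in groundtruth.items()}
    tp = fp = 0
    precisions: List[float] = []
    recalls: List[float] = []
    for _, frame, box in flat:
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(groundtruth.get(frame, [])):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= iou_thr and not matched[frame][best_j]:
            matched[frame][best_j] = True  # true positive
            tp += 1
        else:
            fp += 1  # no overlap, low IoU, or duplicate detection
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    # Make precision non-increasing from right to left (the interpolation step).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

For example, if frame 0 contains one person at `(0, 0, 10, 10)` and a method outputs a spurious box with score 0.95 followed by a correct box with score 0.9, the spurious box is ranked first and the sketch returns an AP of 0.5. Ranking all detections from all frames together is why calibrated confidence scores matter: a confident false positive lowers precision at every recall level reached after it.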