Track description

Input to the model

In this track, participants build their models from RGBA input data. To avoid scale ambiguity, we also provide the distance from the SMPL root joint to the camera in the validation/test sets. Participants can also make use of rich metadata during training.

RGB+Alpha data is compressed into a .mkv video for each sequence, and participants are free to exploit temporal data when building their models. Follow the instructions in the starting kit to uncompress the videos.
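If you prefer not to use the starting-kit scripts directly, frame extraction can be sketched with a plain ffmpeg call; the file and directory names below are placeholders, and PNG is chosen because it preserves the alpha channel:

```python
from pathlib import Path

def ffmpeg_rgba_extract_cmd(mkv_path, out_dir):
    """Build an ffmpeg command that dumps every frame of an RGBA .mkv
    as lossless PNGs (PNG keeps the alpha channel). Illustrative only;
    prefer the starting-kit tooling where provided."""
    out_pattern = str(Path(out_dir) / "frame_%04d.png")
    return [
        "ffmpeg", "-i", str(mkv_path),
        "-pix_fmt", "rgba",   # keep the alpha channel in the output frames
        out_pattern,
    ]

cmd = ffmpeg_rgba_extract_cmd("seq_001.mkv", "frames/seq_001")
# To actually run it (requires ffmpeg on PATH):
# import subprocess; subprocess.run(cmd, check=True)
```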



Output of the model

The output of the model is a 3D reconstructed garment per frame. Note that ground-truth garments are relative to the SMPL root joint. Participants are free to predict 3D garments as a mesh, point cloud, or volumetric data. However, evaluation is performed on meshes, so point-cloud or volumetric predictions must be post-processed into mesh format before submission.
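Since ground truth is root-relative, camera-space predictions need to be shifted before submission. A minimal sketch, assuming you have recovered the SMPL root joint position in camera coordinates (the root position used below is hypothetical; the provided root-to-camera distance can help fix its depth):

```python
import numpy as np

def to_root_relative(verts_cam, root_cam):
    """Shift camera-space vertices so the SMPL root joint sits at the origin.

    verts_cam: (N, 3) predicted garment vertices in camera space.
    root_cam:  (3,) SMPL root joint position in camera space.
    """
    return verts_cam - root_cam[None, :]

verts = np.array([[0.1, 0.2, 3.2],
                  [0.0, -0.1, 3.0]])
root = np.array([0.0, 0.0, 3.0])  # hypothetical root on the optical axis
rel = to_root_relative(verts, root)
```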


Providing a submission

A submission is a zip file with the following structure:

  • <sequence 1>
    • <garment 1>.bin       <======== a file to store face/topology data, i.e. vertex indices of each triangulated face
    • <garment 1>.pc16    <======== a file to store vertex locations for the whole sequence
    • <garment 2>.bin
    • <garment 2>.pc16
  • <sequence 2>
    • ....

where "sequence i" must have the same name as in the validation/test data, and "garment j" is the garment type, from the set {Top, Tshirt, Trousers, Skirt, Jumpsuit, Dress}, which must be estimated from the RGBA images. As can be seen, the garment topology and number of vertices are fixed for the whole sequence. Participants must use the functions provided in the starting kit to write the "bin" and "pc16" files to ensure bug-free submissions. Note that mesh data must be triangulated.
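The starting-kit writers are authoritative for the binary layout. Purely to illustrate the split between the two files, the sketch below assumes the natural encoding: faces as integer triples in the .bin file, per-frame vertex positions as 16-bit floats in the .pc16 file (an assumption, not the official format):

```python
import numpy as np

def write_bin_sketch(path, faces):
    """faces: (F, 3) int array of triangle vertex indices, fixed per sequence.
    Illustrative layout only; use the starting-kit writer for real submissions."""
    np.asarray(faces, dtype=np.int32).tofile(path)

def write_pc16_sketch(path, verts_seq):
    """verts_seq: (T, V, 3) float array, one vertex set per saved frame.
    Illustrative layout only; use the starting-kit writer for real submissions."""
    np.asarray(verts_seq, dtype=np.float16).tofile(path)

faces = np.array([[0, 1, 2], [0, 2, 3]])
verts = np.random.rand(30, 4, 3)  # 30 saved frames, 4 vertices each
write_bin_sketch("Tshirt.bin", faces)
write_pc16_sketch("Tshirt.pc16", verts)

# Round-trip check: topology and vertex data survive the write
faces_back = np.fromfile("Tshirt.bin", dtype=np.int32).reshape(-1, 3)
verts_back = np.fromfile("Tshirt.pc16", dtype=np.float16).reshape(30, 4, 3)
```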

Important: Participants can use the whole sequence in the val/test set to predict garments, but they must save only every 10th frame in their submissions. For instance, if <sequence 1> has 300 frames, the following frames must be written to the "pc16" file: [10, 20, 30, ..., 300].
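The frame selection above can be computed as follows; this assumes the example's 1-based frame counting (10 through 300 for a 300-frame sequence), so double-check against the starting kit if your pipeline indexes frames from 0:

```python
def frames_to_save(num_frames, step=10):
    """Frame numbers to write to the pc16 file: every 10th frame,
    assuming 1-based counting as in the example [10, 20, ..., 300]."""
    return list(range(step, num_frames + 1, step))

idx = frames_to_save(300)  # → [10, 20, 30, ..., 300]
```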




Evaluation

Evaluation is based on the surface-to-surface metric defined here. Note that evaluation is outfit-wise, i.e. garments are merged into an outfit before evaluation. To avoid heavy evaluation processing (which can block compute workers for hours), we limit the number of vertices per outfit to a maximum of 20K. The evaluation code penalizes outfits with more than 20K vertices: they are skipped and assigned a high error.
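It is worth validating the vertex budget locally before submitting. A minimal sketch of that check, counting vertices across all garment meshes of one outfit (array sizes below are invented for illustration):

```python
import numpy as np

MAX_OUTFIT_VERTS = 20_000  # outfits above this limit receive a high error

def outfit_vertex_count(garment_vert_arrays):
    """Total vertex count after merging all garment meshes into one outfit.
    Each entry is a (V_i, 3) vertex array for one garment."""
    return sum(len(v) for v in garment_vert_arrays)

top = np.zeros((8_000, 3))        # hypothetical garment sizes
trousers = np.zeros((9_500, 3))
n = outfit_vertex_count([top, trousers])
within_budget = n <= MAX_OUTFIT_VERTS
```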

