Challenge description




Multimedia information processing is a fruitful research topic that has focused in a number of tasks, among them, those focusing on the analysis of human behavior. Although great advances have been obtained in the, so-called, Looking At People field, it is only recently that attention from this area is targeting problems that have to do with more complex behavior. For instance, personality and social behaviors are only starting to be explored from a multimedia information processing perspective. We organize a contest for ICPR with two tracks on the analysis of non-obvious human behavior. On the one hand, we focus on result diversification of retrieval results using information obtained from social networks. This is a follow up of past challenges on this topic organized in the MediaEval forum. On the other hand, we organize the first competition on recognizing personality from digitized written documents. A new data set is released comprising in addition to images, the  transcripts of documents. With this track we aim to set the basis for research on inferring personality from users' handwriting (including taking into account errors and type of writing). Both tracks lie at the frontier of research on Looking At People and multimedia information processing, and we foresee both contests will advance the state of the art in this subject, putting in the spotlight ICPR2018 as leader venue on these topics.


Two tasks related to social and personality human behavior analysis from multimodal data are studied. Both types of human behavior are difficult to characterize even for human experts, our interest in this contest is to leverage on multimodal data to solve these problems. On the one hand, we aim to infer the personality traits of students, starting from scans of handwritten documents. The multimodality in this task is given by the availability of manually generated transcripts. Thus, we aim to assess the complementariness of both information sources. Our hypothesis is that the fusion of visual and textual information will result in better performance, grounded on the fact that the text captures language usage patterns, whereas the image captures handwriting style patterns. On the other hand, we aim to improve retrieval and diversification results from images posted in social networks. In this task we aim to study effective information fusion techniques that can exploit redundancy and complementariness of multimodal data in the context of image retrieval. Hence, both tasks focus on multimodal information fusion and on complex human behavior analysis problems.


In the following we briefly describe the two task running as part of this challenge. Please refer to the track sites for further details on the two tasks:



Task 1-DivFusion: Information fusion for social image retrieval and diversification. Diversification of image search results is now a hot research problem in multimedia. Search engines such as Google Image Search are now fostering techniques that allow for providing the user with a diverse representation of his search results, rather than providing redundant information, e.g. the same perspective of a monument, or location etc. The DivFusion task builds on the MediaEval Retrieving Diverse Social Images Tasks, which were addressing specifically the diversification of image search results in the context of social media. The task challenges the participants to develop highly effective information fusion techniques. The participants will be provided with several query results, content descriptors and output of various existing diversification systems. They are to employ fusion strategies to refine the retrieval results thus to improve even more the diversification performance of the existing systems.

The challenge reuses the publicly available datasets issued from the 2013-2016 MediaEval Retrieving Diverse Social Images Tasks, together with the participant runs. The data consist of hundreds of Flickr image query results (>600 queries, both single- and multi- topic) and include: images (up to 300 images per query), social metadata (e.g., description, number of comments, tags, etc.), descriptors for visual, text, social information (e.g. user tagging credibility), as well as deep learning features, expert annotations for image relevance and diversification (i.e. clustering of images according to the similarity of their content). An example is presented in Figure 1. The data will be accompanied by 180 participant runs, which correspond to the output of various image search retrieval diversification techniques (each run contains the diversification of each query from a dataset). These would allow to experiment with different fusion strategies.


Task 2- HWxPI: Handwritten texts for Personality Identification. The tasks consist in estimating the personality traits of users from their handwritten texts and the corresponding transcripts (see the dataset description above). The challenge comprises two phases, development and final phases. For the first phase, the participants should develop their systems using a set of development pairs of handwritten essays (including image and text) from 418 subjects. Each subject has an associated class 1 and 0, corresponding to the presence of a high pole or a low pole of a specific personality trait. The traits correspond to the Big Five personality model used in psychology: Extraversion, Agreeableness, Conscientiousness, Emotional stability, and Openness to experience. Thus, participants will have to develop a classifier to predict the pole of each trait, this classifier should be able to use the information from both modalities (i.e. textual and visual). For the final evaluation phase, an independent set of 293 unlabeled samples will be provided to the participants, who will have to provide predictions using the models trained on development data. The winners of the challenge will be determined with basis on the final phase performance.

The corpus used in this task consists of handwritten Spanish essays from undergraduates Mexican students. For each essay two files are available: a manual transcript of the text and a scan image of the original sheet where the subject hand-wrote the essay. The texts of manual transcriptions have tags to mark some handwritten phenomena namely: <FO:well-written word> (misspelling), <D:description> (drawing), <IN> (insertion of a letter into a word), <MD> (modification of a word, that is a correction of a word), <DL> (elimination of a word), <NS> (when two words were written together; e.g. Iam instead of I am) and, SB (syllabification). Table 2 shows a pair of essays with their corresponding image and text. Each essay is labelled with five classes corresponding to five personality trait in the Big Five Model of Personality. The traits are Extraversion, Agreeableness, Conscientiousness, Emotional stability, and Openness to experience. The classes for each trait are 1 and 0 corresponding to the high pole and low pole of each trait, respectively. To assign each label in the dataset we use an instrument named TIPI (Ten Item Personality Inventory), this instrument includes a specific set of norms for each trait.




Challenge started!

The challange has started, please visiti the track's sites and the corresponding CodaLab pages for further information!

DivFusion track


HWxPI track