Dataset description

HWxPI: Handwritten texts for Personality Identification Dataset

The HWxPI data set used in this task consists of handwritten Spanish essays written by Mexican undergraduate students. Two files are available for each essay: a manual transcript of the text and a scanned image of the original sheet on which the subject hand-wrote the essay. The manual transcripts contain tags that mark certain handwriting phenomena, namely: FO (the previous word is misspelled), D (there is a drawing here; e.g. an emoji, a signature, etc.), <IN> (insertion of words into the text), <MD> (modification of a word, that is, a correction; e.g. when the subject forgot to write a letter and modified the word), <DL> (elimination of a word), <NS> (two words written together; e.g. Iam instead of I am) and SB (syllabification). Each essay is labelled with five classes, one for each of the five personality traits in the Big Five Model of Personality. The traits are Extraversion, Agreeableness, Conscientiousness, Emotional stability, and Openness to experience. The classes for each trait are 1 and 0, corresponding to the high pole and the low pole of the trait, respectively. To assign each label in the dataset we use an instrument named TIPI (Ten Item Personality Inventory), which includes a specific set of norms for each trait.
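As an illustration of how the transcript tags might be handled, the following minimal Python sketch counts and removes the tags that the description writes in angle brackets (<IN>, <MD>, <DL>, <NS>). The function names count_tags and strip_tags are hypothetical, and the exact form of the FO, D and SB tags in the transcript files is not assumed here, so they are left untouched.

```python
import re
from collections import Counter

# Only the tags that the description writes in angle brackets.
# FO, D and SB are deliberately not handled: their exact form in the
# transcript files is not assumed in this sketch.
BRACKETED_TAGS = ("IN", "MD", "DL", "NS")
TAG_PATTERN = re.compile(r"<(" + "|".join(BRACKETED_TAGS) + r")>")


def count_tags(text: str) -> Counter:
    """Count how often each bracketed tag occurs in a transcript."""
    return Counter(TAG_PATTERN.findall(text))


def strip_tags(text: str) -> str:
    """Remove bracketed tags and collapse the whitespace left behind."""
    without_tags = TAG_PATTERN.sub(" ", text)
    return " ".join(without_tags.split())
```

Tag counts of this kind could serve as simple features alongside the text, while the stripped transcript could be passed to a standard text pipeline.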

Files ending in .data contain three columns: the first is the subject’s id, the second is the path to the transcript file (i.e., a text file), and the third is the path to the scanned image of the essay (i.e., an image file) for that instance.

Each row in the .data file corresponds to the same row in the .solution file, for both the training and validation sets. The columns of the .solution file follow the trait order EXT AGR CON STA OPE, where the codes stand for extraversion, agreeableness, conscientiousness, emotional stability and openness, respectively.
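The row-by-row correspondence between the .data and .solution files can be sketched in Python as follows. This is only an illustration under stated assumptions: columns are taken to be separated by whitespace, paths are assumed to contain no spaces, files are assumed to be UTF-8 encoded, and the function names are hypothetical.

```python
from pathlib import Path

# Trait order of the .solution columns, as given in the description.
TRAITS = ("EXT", "AGR", "CON", "STA", "OPE")


def parse_data_line(line: str) -> dict:
    """Split one .data row into id, transcript path and image path.

    Assumes whitespace-separated columns and paths without spaces.
    """
    subject_id, transcript_path, image_path = line.split()
    return {"id": subject_id, "transcript": transcript_path, "image": image_path}


def parse_solution_line(line: str) -> dict:
    """Map one .solution row (five 0/1 values) to trait codes."""
    values = [int(v) for v in line.split()]
    if len(values) != len(TRAITS) or any(v not in (0, 1) for v in values):
        raise ValueError(f"expected five 0/1 labels, got: {line!r}")
    return dict(zip(TRAITS, values))


def pair_rows(data_lines: list, solution_lines: list) -> list:
    """Pair row i of the .data file with row i of the .solution file."""
    if len(data_lines) != len(solution_lines):
        raise ValueError("the .data and .solution files differ in length")
    return [
        {**parse_data_line(d), "labels": parse_solution_line(s)}
        for d, s in zip(data_lines, solution_lines)
    ]


def load_split(data_file: str, solution_file: str) -> list:
    """Read both files, skip blank lines, and pair them row by row."""
    def read(path: str) -> list:
        text = Path(path).read_text(encoding="utf-8")
        return [line for line in text.splitlines() if line.strip()]
    return pair_rows(read(data_file), read(solution_file))
```

Checking that both files have the same number of rows guards against silently misaligned labels, since the pairing relies on row order alone.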

Below you can see two samples of the provided data:

To access the data, please register for the competition on the CodaLab site:


