ChaLearn

Workshop description

Summary

Sign Language Recognition (SLR) systems have improved considerably in recent years. Data collection has moved from lab environments, with a limited number of signers acting in front of simple backgrounds, to more realistic data obtained in unconstrained settings. However, several open challenges must still be solved before SLR is useful in practice, including signer-independent evaluation [1], continuous sign recognition, exploitation of contextual cues around the hands (face and body), sign production, and model generalization to different sign languages and demographics. Providing model-based solutions to the classification, detection and recognition problems is necessary, yet not sufficient. Experimental results in the field are biased, rarely reproducible and lack interpretation. In addition to the technical problems, further discussion is needed on the practical requirements that solutions must meet in the daily lives of end users. Therefore, the computational and design aspects of model-based solutions need reconsideration. There is also an increased demand for interdisciplinary collaboration, including with the deaf community, for the creation of public datasets [2]. We want to evaluate datasets with regard to their inclusiveness and fairness to large deaf communities from diverse demographics. To this end, we would like to bring together researchers from the related disciplines to discuss the challenges of SLR in unconstrained settings. Furthermore, alongside a workshop offering an up-to-date, multi-disciplinary perspective on SLR research, we will organize an associated ChaLearn Looking at People competition, including a signer-independent SLR challenge in non-controlled settings involving a large number of sign categories (>200).

[1] O. M. Sincan, H. Y. Keles, “AUTSL: A Large Scale Multi-Modal Turkish Sign Language Dataset and Baseline Methods”, IEEE Access, vol. 8, pp. 181340-181355, 2020.
[2] D. Bragg et al., “Sign Language Recognition, Generation, and Translation: An Interdisciplinary Perspective”, in The 21st International ACM SIGACCESS Conference on Computers and Accessibility, pp. 16-31, 2019.

Topics and Motivation

Sign language recognition is an active area in computer vision that has traditionally been categorized into isolated (classification) and continuous (detection) settings. The workshop will cover the problems in both branches. In recent years, the associated datasets have grown in both the number of signs and the number of signers. However, research in the field is still in its infancy with regard to the robustness of models to a large diversity of signs and signers, and to the fairness of models to performers from different demographics. Considering these important issues, the scale of existing datasets needs to be discussed thoroughly by the research community, with regard to both signer variation and inclusiveness of the large deaf community under realistic conditions. In the context of the workshop, topics include the evaluation of existing datasets in these terms and the definition of appropriate guidelines. We plan to run a challenge using a recently released large-scale, multi-modal dataset that poses many of the difficulties encountered in realistic settings. We also want to provide an extended discussion on the effectiveness of different modalities, sign language production (SLP) [3], and the emerging possibility of egocentric SLR [4]. To this end, we would like to bring together researchers in the field and from related disciplines (e.g. linguistics, human-computer interaction, etc.) to discuss the advances and new challenges to address in SLR in unconstrained settings. We want to put a spotlight on the strengths and limitations of existing approaches, focusing particularly on the continuous SLR domain, and to define the future directions of the field. In this context, we accept papers addressing issues related to, but not limited to, these topics:

Continuous SLR models
Isolated SLR models in unconstrained settings
Multi-modal SLR models
SLR datasets: design considerations, new proposals and analysis of existing datasets
Sign language production (SLP)
Interpretability/explainability of SLR models
Few-shot and unsupervised SLR
Fairness, accountability, and transparency in SLR
Lip reading
Egocentric SLR

[3] S. Stoll, N. C. Camgoz, S. Hadfield, R. Bowden, “Text2Sign: Towards Sign Language Production Using Neural Machine Translation and Generative Adversarial Networks”, International Journal of Computer Vision, 2020.
[4] Y. Zhang, C. Cao, J. Cheng, H. Lu, “EgoGesture: A New Dataset and Benchmark for Egocentric Hand Gesture Recognition”, IEEE Trans. on Multimedia, vol. 20, no. 5, pp. 1038-1050, 2018.