Workshop program

July 26, 2017, 9 a.m. Full-day workshop

8:45h Opening session: Welcome, presentation of the workshop, and challenge summary. Sergio Escalera


9:00h Session I: Job candidate screening coopetition (15 min. presentations + 5 min. Q&A)

Session chair: Sergio Escalera

  • Multi-modal Score Fusion and Decision Trees for Explainable Automatic Job Candidate Screening from Video CVs. Heysem Kaya; Furkan Gürpınar; Ali Salah (1st stage and 2nd stage winners) [slides]

  • Personality Traits and Job Candidate Screening via Analyzing Facial Videos. Salah Eddine Bekhouche; Fadi Dornaika; Abdelkrim Ouafi; Abdelmalik Taleb-Ahmed (1st stage second place) [slides]

  • Human-Explainable Features for Job Candidate Screening Prediction. Achmadnoer S Wicaksana; Cynthia Liem (2nd stage first place) [slides]


10:00h Coffee break


10:30h Invited speaker I: Trevor Darrell, Deep Learning for Perception, Action, and Explanation

"Learning of layered or 'deep' representations has provided significant advances in computer vision in recent years, but has traditionally been limited to fully supervised settings with very large amounts of training data, where the model lacked interpretability. New results in adversarial adaptive representation learning show how such methods can also excel when learning across modalities and domains, and further can be trained or constrained to provide natural language explanations or multimodal visualizations to their users. I'll present recent long-term recurrent network models that learn cross-modal description and explanation, using implicit and explicit approaches, which can be applied to domains including fine-grained recognition and visuomotor policies."

Session chair: Sergio Escalera


11:15h  Invited speaker II: Antonio Torralba, Dissecting learned visual representations

"The performance achieved by convNets is remarkable and constitutes the state of the art on most recognition tasks. But why do they work so well? What is the nature of the internal representation learned by the network? I will show that the internal representation can be interpretable. I will describe a procedure to automatically interpret the internal representation learned by a convNet, and I will show how this interpretation can be used to explain the output of the network for individual images."

Session chair: Xavier Baró


12:00h Lunch


14:00h Session II: Explainable Computer Vision I (15 min. presentations + 5 min. Q&A)

Session chair: Julio Jacques Junior

  • Explaining Distributed Neural Activations via Unsupervised Learning. Soheil Kolouri; Charles Martin; Heiko Hoffmann [slides]

  • Automated Screening Of Job Candidate Based On Multimodal Video Processing. Jelena Gorbova; Andre Litvin; Iiris Lusi; Gholamreza Anbarjafari [slides]

  • Explaining the Unexplained: A CLass-Enhanced Attentive Response (CLEAR) Approach to Understanding Deep Neural Networks. Devinder Kumar; Alexander Wong; Graham Taylor [slides]

  • Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. Ramprasaath Ramasamy Selvaraju; Michael Cogswell; Abhishek Das; Ramakrishna Vedantam; Devi Parikh; Dhruv Batra [slides]


15:20h Coffee break


16:00h Invited speaker III: Cordelia Schmid, Structured models for human action recognition

"In this talk, we present some recent results for human action recognition in videos. We first introduce a pose-based convolutional neural network descriptor for action recognition, which aggregates motion and appearance information along tracks of human body parts. We also present an approach for extracting such human pose in 2D and 3D. Next, we propose an approach for spatio-temporal action localization, which detects and scores CNN action proposals at a frame as well as at a tubelet level and then tracks high-scoring proposals in the video. Actions are localized in time with an LSTM at the track level. Finally, we show how to extend this type of method to weakly supervised learning of actions, which allows scaling to large amounts of data without manual annotation."

Session chair: Xavier Baró


16:45h Session III: Explainable Computer Vision II (15 min. presentations + 5 min. Q&A)

  • It Takes Two to Tango: Towards Theory of AI's Mind. Arjun Chandrasekaran; Deshraj Yadav; Prithvijit Chattopadhyay; Viraj Prabhu; Devi Parikh [slides]

  • Decoding the Deep: Exploring class hierarchies of deep representations using multiresolution matrix factorization. Vamsi Ithapu [slides]

  • Interpreting CNN Models for Apparent Personality Trait Regression. Carles Ventura; David Masip; Agata Lapedriza [slides]

Session chair: Julio Jacques Junior


17:45h Closing


Workshop date confirmed

The 2017 ChaLearn Looking at People Workshop at CVPR will be held on July 26, 2017. See you in Hawaii!