Workshop program

September 9, 2018, 9:30 – 17:00

The workshop will take place at Holiday Inn Munich – City Centre, Hochstrasse 3, 81669 München, Germany.


Full-day workshop

 9:30 –   9:45 Opening of the Workshop and Competitions overview, Sergio Escalera

 9:45 – 10:30 First invited talk, Nadine Peyrieras, Institut des Neurosciences Paris-Saclay.

Title: From 3D+time imaging data to digital and virtual embryos: acquisition, processing, analysis, modelling

Session chair: Sergio Escalera (9:30 – 10:30)

10:30 – 11:00 Coffee break and award ceremony

11:00 – 11:30 Second invited talk, Philipp Krähenbühl, University of Texas at Austin.

Title: Video Compression through Image Interpolation

Abstract: An ever-increasing amount of our digital communication, media consumption, and content creation revolves around videos. We share, watch, and archive many aspects of our lives through them, all of which are powered by strong video compression. Traditional video compression is laboriously hand-designed and hand-optimized. In this talk I'll present an alternative: an end-to-end deep learning codec. Our codec builds on one simple idea: video compression is repeated image interpolation. It thus benefits from recent advances in deep image interpolation and generation. Our deep video codec outperforms today's prevailing codecs, such as H.261 and MPEG-4 Part 2, and performs on par with H.264.

11:30 – 12:30 Workshop presentations, Fingerprint competition:

FPD-M-net: Fingerprint Image Denoising and Inpainting Using M-Net Based Convolutional Neural Networks, Sukesh Adiga V and Jayanthi Sivaswamy

Iterative application of autoencoder for video inpainting and fingerprint denoising, Le Manh Quan, Yong-Guk Kim

U-Finger: Multi-Scale Dilated Convolutional Network for Fingerprint Image Denoising and Inpainting, Ramakrishna Prabhu, Xiaojing Yu, Zhangyang Wang, Ding Liu, Anxiao (Andrew) Jiang

Deep End-to-end Fingerprint Denoising and Inpainting, Youness Mansar

Session chair: Meysam Madadi (11:00 – 12:30)

12:30 – 14:00 Lunch, sponsored by ChaLearn

14:00 – 15:00 Third invited talk, Eli Shechtman, Adobe.

Title: Image Manipulation on the Natural Image Manifold

15:00 – 16:00 Workshop presentations, Decaptioning competition:

DVDNet: Deep Blind Video Decaptioning with 3D-2D Gated Convolutions, Dahun Kim, Sanghyun Woo, Joon-Young Lee, In So Kweon

Joint Caption Detection and Inpainting using Generative Network, Anubha Pandey, Vismay Patel

Video DeCaptioning using U-Net with Stacked Dilated Convolutional Layers, Shivansh Mundra, Sayan Sinha, Mehul Kumar Nirala, and Arnav Jain

Session chair: Marc Oliu (14:00 – 15:30)

16:00 – 16:30 Coffee break

16:30 – 17:00 Workshop presentations, Image inpainting for human pose competition:

Improving Pose Estimation with Generative Adversarial Networks, Gizem Esra Ünlü

Generative Image Inpainting for Person Pose Generation, Anubha Pandey, Vismay Patel

17:00 – 17:15 Workshop presentations:

Road layout understanding by generative adversarial inpainting, Lorenzo Berlincioni, Federico Becattini, Leonardo Galteri, Lorenzo Seidenari, Alberto Del Bimbo

17:15 – 17:45 Fourth invited talk, Ming-Yu Liu, NVIDIA.

Abstract: We study the problem of video-to-video synthesis, whose goal is to learn a mapping function from an input source video (e.g., a sequence of semantic segmentation masks) to an output photorealistic video that precisely depicts the content of the source video. While its image counterpart, the image-to-image synthesis problem, is a popular topic, the video-to-video synthesis problem is less explored in the literature. Without understanding temporal dynamics, directly applying existing image synthesis approaches to an input video often results in temporally incoherent videos of low visual quality. In this paper, we propose a novel video-to-video synthesis approach under the generative adversarial learning framework. Through carefully-designed generator and discriminator architectures, coupled with a spatio-temporal adversarial objective, we achieve high-resolution, photorealistic, temporally coherent video results on a diverse set of input formats including segmentation masks, sketches, and poses. Experiments on multiple benchmarks show the advantage of our method compared to strong baselines. In particular, our model is capable of synthesizing 2K resolution videos of street scenes up to 30 seconds long, which significantly advances the state-of-the-art of video synthesis. Finally, we apply our approach to future video prediction, outperforming several state-of-the-art competing systems.


Session chair: Ciprian Corneanu (16:00 – 17:45)

17:45 – 17:50 Closing
