For this task we will use as baselines two approaches based on convolutional autoencoders:

B1: considers the whole frames as global context to infer missing parts, such as the complete spatial structure is used to generate complete frames. We use a deep convolutional autoencoder, with a VGG-like encoder consisting of 3 blocks of 2 convolutional layers, batch normalization and relu non-linearities, with respectively 64, 128 and 256 filters of size 3x3 pixels. The decoder is composed of 4 blocks of convolutional, upsampling, batch normalization and relu activations. This model is heavy in terms of parameters and memory usage, since it assume inputs and outputs of size 256x256x3.

B2: processes frames by patches of 32x32 pixels, allowing to reduce the number of parameters. A patch auto-encoder (similar but lighter than first baseline) is jointly learned with a patch classifier that detect the presence of text. At inference, a frame is first split into non-overlapped patches, the classifier informs about the presence of text and if so the patch is passed through the decoder. Patches classified as non-text are simply copied to the output generated frame. This approach is faster to train and efficient to remove text with no background overlays. However, since it ignore the context, it cannot performs well on patches containing only missing values.


qualitative and quantitative results on two baselines based on adaptation of [7,10,11].


[7]  D. Pathak, P. Kr¨ahenb¨uhl, J. Donahue, T. Darrell, and A. Efros, “Context encoders: Feature learning by inpainting,” in Computer Vision and Pattern Recognition (CVPR), 2016.
[10]  C. Yang, X. Lu, Z. Lin, E. Shechtman, O. Wang, and H. Li, “High-resolution image inpainting using multi-scale neural patch synthesis,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
[11]  R. A. Yeh, C. Chen, T. Y. Lim, S. A. G., M. Hasegawa-Johnson, and M. N. Do, “Semantic image inpainting with deep generative models,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.


ECCV Satellite Event and TPAMI

Challenge and associated ECCV Satellite Event and TPAMI special issue on Inpainting and Denoising in the Deep Learning Age!