2018 Looking at People ECCV Satellite Challenge - Track 1 - image inpainting
Baseline
To establish a clear baseline to guide participants, we use several methods representative of the state of the art in the field. These methods are:
Context encoders [1]. This method combines an autoencoder with a generative adversarial network (GAN). It is trained with a combination of a reconstruction loss (a normalized, masked L2 loss) and an adversarial loss provided by the discriminator.
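As a rough illustration, the PyTorch sketch below combines the two terms; `generator` (an encoder-decoder) and `discriminator` (assumed to output logits) are illustrative stand-ins rather than the authors' exact networks, and the 0.999/0.001 weighting mirrors the setting reported in [1].

```python
import torch
import torch.nn.functional as F

def context_encoder_loss(generator, discriminator, images, mask,
                         lambda_rec=0.999, lambda_adv=0.001):
    """images: (B, 3, H, W) ground truth; mask: (B, 1, H, W), 1 where missing."""
    prediction = generator(images * (1 - mask))  # generator sees the masked input

    # Normalized masked L2: reconstruction error restricted to the hole,
    # divided by the number of masked pixels.
    rec = (mask * (prediction - images)).pow(2).sum() / mask.sum().clamp(min=1)

    # Adversarial term: push the discriminator to rate the prediction as real.
    logits = discriminator(prediction)
    adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))

    return lambda_rec * rec + lambda_adv * adv
```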
Multi-scale neural patches [2]. This baseline focuses on obtaining high-resolution results. It uses a model based on two networks: a content network tasked with producing an initial approximation, and a texture network tasked with adding high-frequency details by constraining the texture of the generated image to match the texture of the unmasked area. Context encoders are used as the global content prediction network, which serves as the initialization for the multi-scale algorithm.
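The following is a heavily simplified, single-scale sketch of that texture refinement idea: the image is optimized so that its VGG feature patches inside the hole match nearby patches from the known surround. The layer choice (relu4_1), the loss weights, the brute-force patch matching, and `init` (the content network's prediction blended with the known pixels) are all our assumptions, not the authors' exact setup.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

def patch_match_loss(feat, mask, patch=3):
    """Penalize each feature patch inside the hole by its squared distance
    to the nearest patch from the unmasked surround (brute force)."""
    patches = F.unfold(feat, patch)                # (1, C*p*p, N)
    frac = F.unfold(mask, patch).mean(dim=1)[0]    # per-patch hole fraction
    hole = patches[0, :, frac > 0.5].t()           # patches inside the hole
    ctx = patches[0, :, frac == 0].t()             # fully-known patches
    return torch.cdist(hole, ctx).min(dim=1).values.pow(2).mean()

def refine(init, mask, steps=200, lr=0.1, w_content=1.0, w_texture=0.1):
    """init: content prediction blended with known pixels, (1, 3, H, W);
    mask: (1, 1, H, W), 1 where pixels are missing. VGG input
    normalization is omitted for brevity."""
    vgg = vgg19(weights="IMAGENET1K_V1").features[:21].eval()  # up to relu4_1
    for p in vgg.parameters():
        p.requires_grad_(False)
    x = init.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        feat = vgg(x)
        fmask = F.interpolate(mask, size=feat.shape[-2:])
        loss = (w_content * (x - init).pow(2).mean()
                + w_texture * patch_match_loss(feat, fmask))
        loss.backward()
        opt.step()
    return x.detach()
```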
Semantic inpainting [3]. This model learns to generate new samples from the dataset, i.e., it learns the manifold on which the dataset lies. A reconstruction is then inferred by finding the point on this manifold closest to the masked input image.
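A minimal sketch of this manifold search follows, assuming a pretrained GAN generator `G` (latent vector to image) and a discriminator `D` returning logits; the loss weights, step count, and latent shape are illustrative assumptions.

```python
import torch

def semantic_inpaint(G, D, corrupted, mask, z_dim=100, steps=1000,
                     lr=0.01, lam=0.003):
    """corrupted: (1, 3, H, W); mask: (1, 1, H, W), 1 where pixels are missing."""
    # The latent shape depends on G; a flat vector is assumed here.
    z = torch.randn(1, z_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        g = G(z)
        # Context loss: match the generation to the known (unmasked) pixels.
        context = ((1 - mask) * (g - corrupted)).abs().mean()
        # Prior loss: stay on the learned manifold by looking real to D.
        prior = torch.log(1 - torch.sigmoid(D(g)) + 1e-8).mean()
        (context + lam * prior).backward()
        opt.step()
    # Keep the known pixels and fill the hole with the generated content.
    with torch.no_grad():
        return (1 - mask) * corrupted + mask * G(z)
```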
Figure: Qualitative and quantitative samples of the chosen baselines for Track 1 (resized). The DSSIM with respect to the original image is shown below each reconstruction.
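For reference, DSSIM (structural dissimilarity) is commonly computed as (1 - SSIM) / 2; a small sketch using scikit-image's SSIM implementation (the wrapper function is ours):

```python
import numpy as np
from skimage.metrics import structural_similarity

def dssim(original: np.ndarray, reconstruction: np.ndarray) -> float:
    """Structural dissimilarity between two uint8 color images."""
    ssim = structural_similarity(original, reconstruction,
                                 channel_axis=-1, data_range=255)
    return (1.0 - ssim) / 2.0
```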
[1] D. Pathak, P. Krähenbühl, J. Donahue, T. Darrell, and A. A. Efros, "Context encoders: Feature learning by inpainting," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[2] C. Yang, X. Lu, Z. Lin, E. Shechtman, O. Wang, and H. Li, "High-resolution image inpainting using multi-scale neural patch synthesis," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[3] R. A. Yeh, C. Chen, T. Y. Lim, A. G. Schwing, M. Hasegawa-Johnson, and M. N. Do, "Semantic image inpainting with deep generative models," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.