Exploiting appearance transfer and multi-scale context for efficient person image generation

April 1st, 2022

Categories: Applications, Software, Visualization, Visual Analytics, Visual Informatics, Deep Learning, Human Computer Interaction (HCI), Machine Learning, Data Science, Artificial Intelligence

Authors

Shen, C., Wang, P., Tang, W.

About

Pose-guided person image generation aims to generate a photo-realistic person image conditioned on an input person image and a desired pose. This task requires spatial manipulation of the source image according to the target pose. However, convolutional neural networks (CNNs) are inherently limited in modeling geometric transformations because of the fixed geometric structures in their building modules, i.e., convolution, pooling and unpooling, which cannot handle the large motion and occlusions caused by large pose transforms. This paper introduces a novel two-stream context-aware appearance transfer network to address these challenges. It is a three-stage architecture consisting of a source stream and a target stream. Each stage features an appearance transfer module, a multi-scale context module and two-stream feature fusion modules. The appearance transfer module handles large motion by finding the dense correspondence between the two-stream feature maps and then transferring the appearance information from the source stream to the target stream. The multi-scale context module handles occlusion via contextual modeling, which is achieved by atrous convolutions with different sampling rates. Both quantitative and qualitative results indicate that the proposed network can effectively handle challenging cases of large pose transforms while retaining appearance details. Compared with state-of-the-art approaches, it achieves comparable or superior performance with far fewer parameters while being significantly faster.
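
The two mechanisms named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes that dense correspondence is computed as a softmax over dot-product similarities between feature vectors, and that multi-scale context is obtained by averaging 1-D atrous convolutions at several rates. The function names (`transfer_appearance`, `atrous_conv1d`, `multi_scale_context`), the rates and the averaging step are all hypothetical choices made for illustration.

```python
import math


def transfer_appearance(source_feats, target_feats):
    """Assumed formulation: for every target position, build a soft
    correspondence over all source positions (softmax of dot products)
    and return the correspondence-weighted average of source features."""
    dim = len(source_feats[0])
    transferred = []
    for t in target_feats:
        scores = [sum(a * b for a, b in zip(t, s)) for s in source_feats]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(x - top) for x in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        transferred.append(
            [sum(w * s[d] for w, s in zip(weights, source_feats)) for d in range(dim)]
        )
    return transferred


def atrous_conv1d(signal, kernel, rate):
    """1-D atrous (dilated) convolution with zero padding.
    Kernel taps are spaced `rate` samples apart, so a larger rate
    widens the receptive field without adding parameters."""
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * rate
            if 0 <= j < n:  # positions outside the signal count as zero
                acc += w * signal[j]
        out.append(acc)
    return out


def multi_scale_context(signal, kernel, rates=(1, 2, 4)):
    """Hypothetical fusion: average the responses of several atrous
    branches that share one kernel but use different sampling rates."""
    branches = [atrous_conv1d(signal, kernel, r) for r in rates]
    return [sum(vals) / len(rates) for vals in zip(*branches)]
```

With a 3-tap kernel, rate 1 covers 3 neighboring samples and rate 4 spans 9, which is how stacking different rates gathers context at several scales for the same parameter count.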

Keywords: Person image generation, Appearance transfer, Multi-scale context, Efficient image generation

Relevant Funding: The COMPaaS DLV project (NSF award CNS-1828265)

https://doi.org/10.1016/j.patcog.2021.108451

Citation

Shen, C., Wang, P., Tang, W., Exploiting appearance transfer and multi-scale context for efficient person image generation, Pattern Recognition, vol. 124, Elsevier, April 1st, 2022. https://www.sciencedirect.com/science/article/abs/pii/S0031320321006270