Here we give a brief overview of image-to-image translation and generative adversarial learning.
Introduction
In this series of articles, we'll present a Mobile Image-to-Image Translation system based on a Cycle-Consistent Adversarial Network (CycleGAN). We'll build a CycleGAN that can perform unpaired image-to-image translation, as well as show you some entertaining yet academically deep examples.
In this project, we’ll use:
We assume that you are familiar with the concepts of Deep Learning as well as with Jupyter Notebooks and TensorFlow. You are welcome to download the project code.
Image-to-Image Translation
Style transfer is built on image-to-image translation, a technique that transfers images from a source domain A to a target domain B. What does that mean, exactly? To put it succinctly, image-to-image translation lets us take properties from one image and apply them to another image.
Image-to-image translation has a few interesting (and fun!) applications, such as style transfer that can take a photo shot in summer and make it look like it was taken in winter, or vice versa:
Or making horses look like zebras:
Image-to-image translation also powers deep fakes, which let you digitally transplant the face of one person onto another. So if you've ever wondered what it would look like if Luke Skywalker were played by Nicolas Cage, image-to-image translation can help you find out.
Image-to-image translation was first proposed in 2016 in an article titled Image-to-Image Translation with Conditional Adversarial Networks. The process involves pixel-to-pixel translation, hence the model was dubbed the pix2pix CGAN.
In a nutshell, a GAN can learn to map noise vector z to output image y:
G: z→y
In contrast, a pix2pix CGAN can map an input image x and a noise vector z to another image, y:
G: {x, z} → y
The figure below illustrates this concept.
The figure above makes it clear why the model in question is generative, supervised, and controlled (or guided). The model is given an example input image, which isn't necessarily a complete or recognizable image on its own, together with an output image that serves as the ground truth.
The learning process involves training the CGAN to translate images from domain A (input) to domain B (output), and vice versa, in a controlled way. Hence, if you change the example output image during training, the model will generate images different from the ones shown here. In other words, the generated image is guided by the ground truth.
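To make the G: {x, z} → y mapping a bit more concrete, here is a minimal TensorFlow/Keras sketch of a pix2pix-style training step. The tiny generator and discriminator, the layer sizes, and the LAMBDA_L1 weight are illustrative placeholders and not the architecture we'll build later in this series; the point is to show how the L1 term pulls the generated image toward the ground truth.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Toy stand-ins for the pix2pix generator and discriminator (illustrative only).
# The generator maps an input image x (plus dropout acting as the noise z) to y.
generator = tf.keras.Sequential([
    layers.Conv2D(32, 3, padding="same", activation="relu",
                  input_shape=(256, 256, 3)),
    layers.Dropout(0.5),  # plays the role of the noise vector z, as in pix2pix
    layers.Conv2D(3, 3, padding="same", activation="tanh"),
])

# The discriminator sees the input image and a real/generated output, concatenated.
discriminator = tf.keras.Sequential([
    layers.Conv2D(32, 3, strides=2, padding="same", activation="relu",
                  input_shape=(256, 256, 6)),
    layers.Conv2D(1, 3, padding="same"),  # per-patch real/fake logits
])

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
gen_opt = tf.keras.optimizers.Adam(2e-4)
disc_opt = tf.keras.optimizers.Adam(2e-4)
LAMBDA_L1 = 100.0  # assumed weight of the "guidance" (L1) term

@tf.function
def train_step(x, y):
    """One supervised pix2pix-style step: x is the input image, y the ground truth."""
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        y_fake = generator(x, training=True)

        real_logits = discriminator(tf.concat([x, y], axis=-1), training=True)
        fake_logits = discriminator(tf.concat([x, y_fake], axis=-1), training=True)

        # Adversarial terms: the discriminator learns to tell real pairs from fake ones.
        d_loss = bce(tf.ones_like(real_logits), real_logits) + \
                 bce(tf.zeros_like(fake_logits), fake_logits)
        g_adv = bce(tf.ones_like(fake_logits), fake_logits)

        # L1 term: this is what guides the generated image toward the ground truth y.
        g_loss = g_adv + LAMBDA_L1 * tf.reduce_mean(tf.abs(y - y_fake))

    gen_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                                generator.trainable_variables))
    disc_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                                 discriminator.trainable_variables))
    return g_loss, d_loss
```

Note that, as in the original pix2pix paper, the noise z here is provided only through dropout rather than as an explicit input vector.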
Generative Adversarial Learning
As shown in the diagram below, a generative model (a) is a model that can generate new data, which can then be given as input to a discriminative model (b), whose role is to distinguish (or discriminate) between data samples.
A generative adversarial network (GAN) is a neural network that combines the generative and discriminative models to perform a single task: generating new data samples. In recent years, this area has seen a lot of progress, which has improved the performance of generative learning networks and helped them generate better images.
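As a minimal illustration of how the two models are paired (and not the networks we'll build later in the series), the TensorFlow/Keras toy below generates 2-D points instead of images; every name and size in it is a placeholder chosen only to show the G: z → y mapping and the discriminator's role.

```python
import tensorflow as tf
from tensorflow.keras import layers

NOISE_DIM = 16  # illustrative size of the noise vector z

# (a) Generative model: maps a noise vector z to a new data sample
# (here a 2-D point rather than an image, to keep the example tiny).
generator = tf.keras.Sequential([
    layers.Dense(32, activation="relu", input_shape=(NOISE_DIM,)),
    layers.Dense(2),
])

# (b) Discriminative model: outputs a logit saying "real" or "generated".
discriminator = tf.keras.Sequential([
    layers.Dense(32, activation="relu", input_shape=(2,)),
    layers.Dense(1),
])

# The adversarial pairing: the generator tries to fool the discriminator,
# while the discriminator tries to tell real samples from generated ones.
z = tf.random.normal([8, NOISE_DIM])
fake_samples = generator(z)                    # G: z -> y
fake_scores = discriminator(fake_samples)      # D's verdict on the generated data
print(fake_samples.shape, fake_scores.shape)   # (8, 2) (8, 1)
```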
Networks such as the Conditional Generative Adversarial Network (CGAN), the Deep Convolutional Generative Adversarial Network (DCGAN), f-GAN, and the Wasserstein Generative Adversarial Network (WGAN) are updated versions of the original GAN, which came with certain limitations, such as vanishing gradients, poor sample diversity, and training difficulties.
All of the above models are good. However, the CGAN is the most interesting one in the context of image-to-image translation, because it offers a way to guide the generated images by introducing a conditional variable, y, into the modeling.
CycleGAN in Our Series
Image-to-image translation with a pix2pix CGAN maps images from one domain to another using pixel-to-pixel mapping, which means that a paired image must be available for every image sample. However, pairs of training images are not always available, which makes the task difficult. This motivated researchers to propose a new GAN-based network that offers unpaired image-to-image translation.
This network was presented in 2017 in a paper titled Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, and it became known as CycleGAN. It introduces a novel approach for mapping images from domain X to domain Y with no need for paired image examples. Hence, using this network, you can transform cats into dogs, male faces into female faces, and so on.
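To give a feel for what "cycle-consistent" means before we dig into the architecture in the next article, here is a rough TensorFlow sketch of the cycle-consistency loss; generator_g, generator_f, and the LAMBDA_CYCLE weight are placeholder names and values, not our final implementation.

```python
import tensorflow as tf

LAMBDA_CYCLE = 10.0  # illustrative weight for the cycle-consistency term

def cycle_consistency_loss(real_x, real_y, generator_g, generator_f):
    """CycleGAN uses two mappings: G: X -> Y and F: Y -> X.
    Since no paired examples exist, the networks are asked to be consistent
    with themselves: translating an image to the other domain and back
    should recover the original image."""
    cycled_x = generator_f(generator_g(real_x))   # x -> G(x) -> F(G(x)) ≈ x
    cycled_y = generator_g(generator_f(real_y))   # y -> F(y) -> G(F(y)) ≈ y

    forward_loss = tf.reduce_mean(tf.abs(real_x - cycled_x))
    backward_loss = tf.reduce_mean(tf.abs(real_y - cycled_y))
    return LAMBDA_CYCLE * (forward_loss + backward_loss)
```

This cycle-consistency constraint is what replaces the paired ground truth that pix2pix relies on.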
This approach inspired people to develop very interesting and entertaining applications, such as season translation, style transfer, gender-to-gender translation, face swapping by FaceApp, and, most recently, animating photos of deceased people with Deep Nostalgia. We'll use the unpaired image approach in our series.
Next Steps
In the next article, we’ll discuss the CycleGAN architecture and explain how each architectural component can be implemented. Stay tuned!