Super Resolution GAN

A comparison of SR techniques

When reviewing Image Segmentation using Deeplab v3, I noted that the images fed to the neural network were limited to 512x512, which restricts the system to low resolution images. In my case I needed to work with high resolution images, so to preserve the higher resolutions I would need something like a super resolution GAN to restore them.

It seems counterintuitive that one can gain a high resolution image from a low resolution one. After all, the low res image is missing details. How can one get those details back? In reality, you do not get the lost details back. The details are faked, in much the same way deep fakes are made today using GANs. What you get is a more pleasing and textured result that appears to have more realistic detail. You can see from the example images above that SRGAN is not a faithful reconstruction of the details, but a close approximation.

There are a number of implementations of SRGAN on GitHub. I chose Keras-GAN's version of SRGAN to try it out. I had some difficulty using Conda to install everything in my Anaconda environment. It worked best when I used pip to install from the requirements.txt file, but since TensorFlow is also required, I had to install that manually.

I first used conda to install TensorFlow, but that resulted in an error message about multiple versions of OpenMP. I removed the conda version of TensorFlow and used pip to install it instead, which fixed that issue.
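For reference, the workaround might look like the following shell session; the exact package names and channels are what worked for my environment and may differ in yours:

```shell
# Remove the conda build of TensorFlow that triggered the
# duplicate-OpenMP error...
conda remove tensorflow

# ...and install the pip build in its place.
pip install tensorflow

# Then install the project's remaining dependencies.
pip install -r requirements.txt
```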

Due to version differences, I had to remove the 1.3 version of SciPy that pip installed from the requirements.txt file and install version 1.1.0 instead, because the scipy.misc.imread() method the project relies on was deprecated and later removed from SciPy.

The last issue required me to modify line 16 of the srgan.py file to read…

from keras_contrib.layers.normalization.instancenormalization import InstanceNormalization

Again, this is due to version differences from when the Keras-GAN project was created. The project should have pinned version numbers in its requirements.txt file.

Before running the application you need to download the training data from https://www.dropbox.com/sh/8oqt9vytwxb3s4r/AADIKlz8PR9zr6Y20qbkunrba/Img/img_align_celeba.zip?dl=0. The instructions are at the top of the srgan.py file.

With that done, I was able to run the application. If you cannot use a GPU it will take a very long time for this to run. I stopped after about 16 hours but I could at least see that it was making progress.

This project does not really get you ready to build and train a model to use for inference. The model is not saved after training, and there is no method for making inferences. Other SRGAN projects on GitHub use pretrained models and include code for training as well as inference. What this project does do is demonstrate the training of the network. It produces a comparison image every 50 epochs so you can see how the training is going, and it demonstrates why it takes 30,000 epochs to train a model. It is a Keras implementation of an SRGAN based on the paper linked below.

The state of the art for solving many computer vision problems is CNNs. A GAN makes use of two CNNs, a generator and a discriminator. During training, the generator never sees the original high resolution image. It only receives the low resolution image and tries to generate an HR image that can fool the discriminator. The discriminator is trained to distinguish between super resolved images and real images. The generator's weights are then updated using gradients backpropagated through the discriminator, so the generator learns how to further improve its results.
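This alternating scheme can be illustrated with a toy example far simpler than SRGAN itself. Below, the "real" data is just numbers drawn from N(4, 1), the generator is a scalar affine map of noise, and the discriminator is a scalar logistic classifier; all names and values are invented for illustration. Note how the generator's update uses the gradient flowing through the discriminator, and the generator never touches a real sample:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

w, b = 1.0, 0.0   # generator: fake = w*z + b, with noise z ~ N(0, 1)
u, c = 0.1, 0.0   # discriminator: D(x) = sigmoid(u*x + c)
lr = 0.05

for step in range(2000):
    z = rng.standard_normal()
    x_real = 4.0 + rng.standard_normal()   # sample of "real" data
    x_fake = w * z + b                     # generator output

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    # Manual gradients of -log D(real) - log(1 - D(fake)) w.r.t. u, c.
    d_real = sigmoid(u * x_real + c)
    d_fake = sigmoid(u * x_fake + c)
    u -= lr * (-(1 - d_real) * x_real + d_fake * x_fake)
    c -= lr * (-(1 - d_real) + d_fake)

    # Generator step: gradient of -log D(fake) is backpropagated
    # THROUGH the discriminator (via u) into the generator's weights.
    d_fake = sigmoid(u * x_fake + c)
    gx = -(1 - d_fake) * u      # dLoss/dx_fake
    w -= lr * gx * z            # chain rule: dx_fake/dw = z
    b -= lr * gx                # chain rule: dx_fake/db = 1

print(b)  # the mean of the fakes should drift toward the real mean of 4
```

The same two-step dance happens in SRGAN, just with deep convolutional networks in place of the scalar parameters.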

The paper Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network by Christian Ledig et al. contains all the gory details and history of super resolution, as well as the implementation of SRGAN. SRGAN is the state-of-the-art solution for regenerating perceptual details and textures when super resolving images.

The authors built upon existing constructs demonstrated in many other papers and combined the best results in a unique way, in part by formulating a loss function based on the Euclidean distance between feature maps extracted from the VGG19 network, and by using the residual blocks and skip connections of CNNs.
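The VGG content loss itself is simple once you have the feature maps: it is the squared Euclidean distance between the VGG19 feature map of the reference HR image and that of the super resolved image, averaged over the map. A sketch with NumPy arrays standing in for the VGG19 activations (extracting real activations would require a trained VGG19, which is omitted here, and the paper normalizes by width and height rather than all elements, so this differs by a constant factor):

```python
import numpy as np

def vgg_content_loss(feat_hr, feat_sr):
    """Mean squared (Euclidean) distance between two feature maps,
    in the spirit of the SRGAN paper's VGG loss."""
    assert feat_hr.shape == feat_sr.shape
    return np.mean((feat_hr - feat_sr) ** 2)

# Stand-ins for VGG19 feature maps of shape (height, width, channels).
phi_hr = np.zeros((14, 14, 512))
phi_sr = np.ones((14, 14, 512))
print(vgg_content_loss(phi_hr, phi_sr))  # -> 1.0
```

Comparing feature maps instead of raw pixels is what rewards perceptually convincing texture rather than blurry pixel-wise averages.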

Figure from the paper: Architecture of Generator and Discriminator Network with corresponding kernel size (k), number of feature maps (n) and stride (s) indicated for each convolutional layer.

The Keras-GAN implementation will require some additional programming to build and save a model you could reuse for your own purposes. It does, however, demonstrate the uncanny power of this construct to obtain realistic looking high resolution images from low resolution ones.

--
Emerging technologies & challenges

Thoughts about emerging technologies and some of the challenges related to them. The technology itself usually is not the problem (or the solution).