The tale of GANs: “Can AI be creative?”

Prince Canuma
7 min read · Feb 17, 2020

In this article, we are going to take a look at Generative Adversarial Networks(GANs), specifically Wasserstein GAN.

We are very inquisitive beings. Through our curiosity and our ability to process at least part of the enormous amount of data that streams in every second from our various senses — visual, auditory, tactile and so on — we are able to put that data, or at least a good part of it, to use and generate new data in the form of thoughts, dreams and imagination.

With that in mind, “can we teach machines to create data from data?”

If we stop to think about it, we can extrapolate, or perhaps I should say borrow, the same algorithm we humans use and teach machines how to be creative. Wait, how would that work?

I know, so many questions and so little time to answer them.

The Master Algorithm

Humans are fascinating, wait no, their brains are fascinating 🤔.

The human brain is so powerful that it gave us dominion over every other species on Earth, yet some of its inner workings and powers are still unknown to us, so we try to explain its performance with abstract words such as intelligence and IQ. But no one can actually tell you what it really means to be intelligent; the criteria vary with time, environment and so on.

The human brain is an organ of active study, and some of the discoveries made about it have directly influenced other fields of study — AI, for example.

Stop for a minute! ⏱

What you are doing right now is the master algorithm. While reading the paragraph above, your brain might have generated images, some of which you have probably never seen before, all so it could create new connections between neurones that will help you remember and/or understand that paragraph. We call this “imagination”.

Fun fact: writing this article is a mixture of language understanding and text generation, both techniques from NLP (Natural Language Processing); the latter sits at the intersection of NLP and GANs.

There is an even more surreal, almost supernatural process that happens when we are sleeping: dreams. We mostly have no control over them, and sometimes we dream of things, people and places we have never seen before.

Unsurprisingly, I believe this process works best once our brains are fully developed, or at least improves as we grow up: over time our brains mature and we consume ever more data, which in turn fuels more realistic, hallucination-like dreams and imaginings.

Our imagination and dreams can be influenced by both external inspiration and aspiration. What if we could mimic, at a high level, the way the brain creates data (imagination and dreams)? That is, the steps the brain goes through when creating, without going into the details of the physical or chemical processes.

GANs

(Generative Adversarial Networks)

GANs are a class of Deep Learning algorithms invented by Ian Goodfellow et al. in 2014. The main idea is to have two neural networks, a generator and a critic (also called a discriminator), compete against each other in a zero-sum game.

A generator network produces synthetic data given some noise source and a critic network discriminates between the generator’s output and true data. GANs can produce very visually appealing samples, but are often hard to train.

So, let’s use a common real-life analogy: imagine we have a detective (the critic network) that specialises in telling forged art from real art, and an artist (the generator network) that specialises in creating forgeries. When the game begins, both the critic and the generator know nothing about their jobs.

Training strategy

We train the critic by feeding it either a generated sample (fake) or a true data sample and having it distinguish between the two (binary classification).

Then we train the generator by forcing it to generate images that are more likely to fool the critic.

So, basically, we train the critic to maximise the separation between real and fake images (this is essentially maximum-likelihood training of a binary classifier on our data), and then we train the generator to minimise the distance of its outputs from the real ones (the reverse of what the critic does) by using the critic’s prediction as its loss signal. For the original GAN formulation, this adversarial game amounts to minimising the Jensen–Shannon divergence between the real and generated distributions.
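To make this alternating game concrete, here is a minimal PyTorch sketch of one training step of a vanilla GAN (not yet the Wasserstein version). The generator, critic, optimisers, `real_batch` and `noise_dim` are placeholders you would define yourself:

```python
import torch
import torch.nn.functional as F

def train_step(generator, critic, g_opt, c_opt, real_batch, noise_dim):
    batch_size = real_batch.size(0)

    # 1. Train the critic: binary classification, real vs. fake
    c_opt.zero_grad()
    noise = torch.randn(batch_size, noise_dim)
    fake_batch = generator(noise).detach()          # don't backprop into G here
    real_logits = critic(real_batch)
    fake_logits = critic(fake_batch)
    c_loss = (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits)) +
              F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
    c_loss.backward()
    c_opt.step()

    # 2. Train the generator to fool the critic:
    #    it wants the critic to label its samples as "real"
    g_opt.zero_grad()
    noise = torch.randn(batch_size, noise_dim)
    fake_logits = critic(generator(noise))
    g_loss = F.binary_cross_entropy_with_logits(fake_logits, torch.ones_like(fake_logits))
    g_loss.backward()
    g_opt.step()
    return c_loss.item(), g_loss.item()
```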

There are many types of GANs right now, let me list some before we take a look at one of them:

  • DCGAN
  • WGAN *
  • Cycle GAN
  • Big GAN
  • StyleGAN, etc.

WGAN (Wasserstein GAN)

A sample of generated images

This type of GAN proposed two big changes. First, moving from a discriminator that outputs a probability (a number between 0 and 1) to a critic that outputs a continuous, unbounded score (any real number). The way we do this is by removing the sigmoid classification head, keeping a fully convolutional architecture that ends in a raw score.

Don’t worry if you don’t understand it right away; it is all covered in the notebook I’m preparing.
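Just to give you a feel for what “no classification head” means, here is a rough, illustrative PyTorch critic for 64×64 RGB images that ends in a single unbounded score instead of a sigmoid probability. This is not the exact architecture from the paper or my notebook, only a sketch:

```python
import torch.nn as nn

# A small convolutional critic: note there is no sigmoid at the end,
# so the output is an unbounded real-valued score per image.
critic = nn.Sequential(
    nn.Conv2d(3,   64, kernel_size=4, stride=2, padding=1),   # 64x64 -> 32x32
    nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),   # 32x32 -> 16x16
    nn.LeakyReLU(0.2),
    nn.Conv2d(128, 256, kernel_size=4, stride=2, padding=1),  # 16x16 -> 8x8
    nn.LeakyReLU(0.2),
    nn.Conv2d(256, 512, kernel_size=4, stride=2, padding=1),  # 8x8 -> 4x4
    nn.LeakyReLU(0.2),
    nn.Conv2d(512, 1, kernel_size=4, stride=1, padding=0),    # 4x4 -> 1x1 score
    nn.Flatten(),
)
```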

Second, using these continuous critic outputs on both a sampled batch of true images (drawn from the real distribution q) and a batch of images generated from noise vectors (drawn from the generator’s distribution p) to build a loss based on the Earth-Mover distance (also called Wasserstein-1), which measures the distance between p and q.

Loss function

Just to make it clear, in case you couldn’t infer it from the images above: D is the critic, G is the generator, x ~ Pr is a batch of real images, and x̃ ~ Pg is a batch of fake images generated from a batch of noise vectors z of size n×1 (where n is a number up to you to decide; it could be 5, 10, 100…).
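In code, the Wasserstein losses boil down to something like the following sketch, using the notation above (the batch means stand in for the expectations over Pr and Pg):

```python
import torch

def critic_loss(D, G, real, z):
    # The critic tries to maximise D(real) - D(fake),
    # so we minimise the negative of that difference.
    fake = G(z).detach()
    return -(D(real).mean() - D(fake).mean())

def generator_loss(D, G, z):
    # The generator tries to maximise D(G(z)), i.e. minimise -D(G(z)).
    return -D(G(z)).mean()
```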

One of the clever tricks the paper’s authors came up with in order to obtain a good critic was something not very intuitive but nevertheless interesting: they constrain the critic NN (neural network) by forcing its weights to lie in a compact space (the interval [-0.01, 0.01]) after each gradient update, and then use this critic to backpropagate and update the generator’s weights.

This is called weight clipping (restricting the range of the weights). Interestingly enough, a later paper entitled “Improved Training of Wasserstein GANs” showed that this technique leads to optimisation difficulties, and that even when optimisation succeeds, the resulting critic can have problems. Some of these problems can be lessened by adding batch-norm layers to the critic network. However, even with BatchNorm layers, very deep critic networks often fail to converge.
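Weight clipping itself is tiny in code. A sketch of what runs after every critic update, assuming a PyTorch critic like the one above:

```python
import torch

# After each critic gradient update, force every critic parameter
# back into a small compact range (the paper uses [-0.01, 0.01]).
with torch.no_grad():
    for p in critic.parameters():
        p.clamp_(-0.01, 0.01)
```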

This second paper has two main contributions:

  • The use of a gradient penalty instead of weight clipping, simply because it produces a better-behaved critic than weight clipping. More on this in the notebook, and there is a small sketch of it right after this list.
  • And finally, they use the Adam optimiser instead of RMSProp.
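For the curious, here is a hedged sketch of what the gradient penalty from that second paper can look like in PyTorch; lambda_gp = 10 is the value the authors suggest, and the function names and shapes (4D image tensors) are my own placeholders:

```python
import torch

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    batch_size = real.size(0)
    # Random interpolation between real and generated samples.
    eps = torch.rand(batch_size, 1, 1, 1, device=real.device)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(interp)
    # Gradient of the critic's scores with respect to the interpolated points.
    grads = torch.autograd.grad(outputs=scores, inputs=interp,
                                grad_outputs=torch.ones_like(scores),
                                create_graph=True)[0]
    grad_norm = grads.view(batch_size, -1).norm(2, dim=1)
    # Penalise the gradient norm for moving away from 1.
    return lambda_gp * ((grad_norm - 1) ** 2).mean()
```

This penalty term is simply added to the critic loss in place of weight clipping.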

If you have read some of my previous work, you might know by now that I consider the weights to be the knowledge of the network, because it is through adjusting the weights that the network learns and improves its output.

Algorithm

The image above illustrates the algorithm for WGAN.

I marked the parts of the algorithm that do similar or identical operations with the same colour; the sketch after the list below puts these coloured pieces together in code.

  • Blue: drawing a batch of size n (usually 64) of real images, or of noise vectors sampled from a Gaussian distribution (also known as prior samples).
  • Green: a forward pass of the batch of real images through the critic, generation of a batch of images from the noise vectors which is then also scored by the critic, and finally passing both through the loss function and computing the gradients of the loss with respect to the network parameters (weights and biases).
  • Yellow: weight update using the RMSProp optimiser.
  • Red: critic weight clipping.
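Putting the coloured pieces together, a compact sketch of the whole WGAN training loop might look like this. The generator, critic and data_loader are assumed to be defined elsewhere; n_critic = 5, a clip value of 0.01 and a learning rate of 5e-5 follow the paper’s defaults, and for brevity the same real batch is reused for the critic’s inner steps:

```python
import torch
import torch.optim as optim

n_critic, clip_value, lr, noise_dim = 5, 0.01, 5e-5, 100
g_opt = optim.RMSprop(generator.parameters(), lr=lr)        # yellow: RMSProp updates
c_opt = optim.RMSprop(critic.parameters(), lr=lr)

for real in data_loader:                                    # blue: batch of real images
    # Train the critic n_critic times per generator step.
    for _ in range(n_critic):
        z = torch.randn(real.size(0), noise_dim)            # blue: batch of noise vectors
        fake = generator(z).detach()
        c_loss = -(critic(real).mean() - critic(fake).mean())  # green: forward pass + loss
        c_opt.zero_grad()
        c_loss.backward()                                   # green: gradients of the loss
        c_opt.step()                                        # yellow: weight update
        with torch.no_grad():                               # red: weight clipping
            for p in critic.parameters():
                p.clamp_(-clip_value, clip_value)

    # Generator step: try to fool the (frozen) critic.
    z = torch.randn(real.size(0), noise_dim)
    g_loss = -critic(generator(z)).mean()                   # green: forward pass + loss
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()                                            # yellow: weight update
```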

This algorithm is no longer state of the art by any means, but it is still a great example of the creative power of neural networks.

We have just covered the beginning; there are so many great papers and newer techniques that produce unbelievably realistic results, making it hard for us humans to know whether something is real or not. A well-known example is deepfakes (generated images of people who don’t exist, and/or video of a person saying or doing something they never did).

Email me at prince.gdt@gmail.com to have access to the notebook once I’m done.

Conclusion

Neural networks can create data from data. There are so many great applications for this, such as Project Magenta and many others, but there are also cases where this same technology can be used for evil.

Nevertheless, it’s amazing to be alive at a time when we are creating a creator.

Thank you for reading; this has been a fun discovery for me too. These kinds of algorithms fascinate me for their raw potential to consume gigabytes or even petabytes of data and generate new data. Please leave your comments.

References

[1] fast.ai, https://www.fast.ai/

[2] M. Arjovsky, S. Chintala, L. Bottou, “Wasserstein GAN”, https://arxiv.org/pdf/1701.07875

[3] I. Gulrajani et al., “Improved Training of Wasserstein GANs”, https://arxiv.org/abs/1704.00028

Get in touch

Twitter: @CanumaGdt

LinkedIn: https://www.linkedin.com/in/prince-canuma-05814b121/

Email: prince.gdt@gmail.com

Prince Canuma

Helping research & production teams achieve MLOps success | Ex-@neptune_ai