When training CNNs, one of the problems is that we need a lot of labeled data.
However, we might want to use raw (unlabeled) data for training CNN feature extractors, which is called **self-supervised learning**. Instead of labels, we use the training images themselves as both network input and output. The main idea of an **autoencoder** is to have an **encoder network** that converts the input image into some **latent space** (normally just a vector of some smaller size), and a **decoder network** whose goal is to reconstruct the original image.
✅ An [autoencoder](https://wikipedia.org/wiki/Autoencoder) is "a type of artificial neural network used to learn efficient codings of unlabeled data."
Since we are training an autoencoder to capture as much of the information from the original image as possible for accurate reconstruction, the network tries to find the best **embedding** of input images to capture the meaning.
![AutoEncoder Diagram](images/autoencoder_schema.jpg)
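To make the encoder/decoder idea concrete, here is a minimal sketch of such a pair in Keras. The architecture and sizes (for example `latent_dim = 64`) are illustrative assumptions, not the exact models from the lesson notebooks:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

latent_dim = 64  # size of the latent vector (an assumption for illustration)

# Encoder: compress a 28x28 grayscale image into a small latent vector
encoder = models.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(128, activation='relu'),
    layers.Dense(latent_dim, activation='relu'),
])

# Decoder: reconstruct the image from the latent vector
decoder = models.Sequential([
    layers.Dense(128, activation='relu', input_shape=(latent_dim,)),
    layers.Dense(28 * 28, activation='sigmoid'),
    layers.Reshape((28, 28)),
])

# Autoencoder = encoder followed by decoder
autoencoder = models.Sequential([encoder, decoder])
autoencoder.compile(optimizer='adam', loss='mse')

# Self-supervised training: the same images serve as both input and target
(x_train, _), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.astype('float32') / 255.0
autoencoder.fit(x_train, x_train, epochs=5, batch_size=128)
```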
While reconstructing original images does not seem useful in its own right, there are several scenarios where autoencoders prove very useful:
* **Lowering the dimension of images for visualization** or **training image embeddings**. Autoencoders usually give better results than PCA, because they take into account the spatial nature of images and their hierarchical features.
* **Denoising**, i.e. removing noise from the image. Because noise carries a lot of useless information, the autoencoder cannot fit it all into the relatively small latent space, and thus it captures only the important parts of the image. When training denoisers, we start with the original images, and use images with artificially added noise as the autoencoder input (a minimal sketch follows after this list).
* **Super-resolution**, increasing image resolution. We start with high-resolution images as targets, and use lower-resolution versions as the autoencoder input.
* **Generative models**. Once we train the autoencoder, the decoder part can be used to create new objects starting from random latent vectors.
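As an example of the denoising scenario, here is a minimal sketch that reuses the `autoencoder` and `x_train` from the snippet above; the noise level is an arbitrary assumption:

```python
import numpy as np

# Create noisy versions of the training images; the clean images are the targets
noise_factor = 0.3  # arbitrary noise level for illustration
x_noisy = x_train + noise_factor * np.random.normal(size=x_train.shape)
x_noisy = np.clip(x_noisy, 0.0, 1.0)

# Train with noisy input and clean output: the small latent space
# forces the network to keep the image content and drop the noise
autoencoder.fit(x_noisy, x_train, epochs=5, batch_size=128)
```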
## Variational Autoencoders (VAE)
A VAE is an autoencoder that learns to predict the *statistical distribution* of the latent parameters, the so-called **latent distribution**. Instead of mapping each input to a single latent vector, the encoder predicts the parameters of a distribution from which latent vectors are sampled.
To summarize:
* From the input vector, we predict `z_mean` and `z_log_sigma` (instead of predicting the standard deviation itself, we predict its logarithm)
* We sample a vector `sample` from the distribution N(z<sub>mean</sub>,exp(z<sub>log\_sigma</sub>))
* The decoder tries to decode the original image using `sample` as an input vector
<img src="images/vae.png" width="50%">
> Image from [this blog post](https://ijdykeman.github.io/ml/2016/12/21/cvae.html) by Isaak Dykeman
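The sampling step in the middle bullet above is usually implemented with the so-called reparameterization trick: draw standard normal noise and shift/scale it by the predicted parameters, so gradients can flow through `z_mean` and `z_log_sigma`. A minimal sketch, assuming `z_log_sigma` is the logarithm of the standard deviation:

```python
import tensorflow as tf

def sample_latent(z_mean, z_log_sigma):
    # Reparameterization trick: z = mean + sigma * epsilon, epsilon ~ N(0, I).
    # Sampling via a deterministic transform keeps the operation differentiable.
    epsilon = tf.random.normal(shape=tf.shape(z_mean))
    return z_mean + tf.exp(z_log_sigma) * epsilon
```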
Variational auto-encoders use a complex loss function that consists of two parts:
* **Reconstruction loss** is the loss function that shows how close a reconstructed image is to the target (it can be Mean Squared Error, or MSE). It is the same loss function as in normal autoencoders.
* **KL loss**, which ensures that the latent variable distributions stay close to a normal distribution. It is based on the notion of [Kullback-Leibler divergence](https://www.countbayesie.com/blog/2017/5/9/kullback-leibler-divergence-explained), a metric that estimates how similar two statistical distributions are.
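A sketch of how these two parts can be combined, using the closed-form KL divergence between N(z_mean, exp(z_log_sigma)²) and the standard normal. It assumes flattened image tensors, and the equal weighting of the two terms is a simplification that varies between implementations:

```python
def vae_loss(original, reconstructed, z_mean, z_log_sigma):
    # Reconstruction loss: how close the reconstruction is to the target (MSE here)
    reconstruction_loss = tf.reduce_sum(
        tf.square(original - reconstructed), axis=-1)
    # KL loss: closed-form KL divergence between N(z_mean, exp(z_log_sigma)^2)
    # and the standard normal N(0, 1), summed over latent dimensions
    kl_loss = -0.5 * tf.reduce_sum(
        1 + 2 * z_log_sigma - tf.square(z_mean) - tf.exp(2 * z_log_sigma),
        axis=-1)
    return tf.reduce_mean(reconstruction_loss + kl_loss)
```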
One important advantage of VAEs is that they allow us to generate new images relatively easily, because we know the distribution from which to sample latent vectors. For example, if we train a VAE with a 2D latent vector on MNIST, we can then vary the components of the latent vector to get different digits:
<img src="images/vaemnist.png" width="50%"/>
<img alt="vaemnist" src="images/vaemnist.png" width="50%"/>
> Image generated by Dmitry Soshnikov
Observe how the images blend into one another as we take latent vectors from different portions of the latent parameter space. We can also visualize this space in 2D:
<img src="images/vaemnist-diag.png" width="50%"/>
<img alt="vaemnist cluster" src="images/vaemnist-diag.png" width="50%"/>
> Image generated by Dmitry Soshnikov
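A sketch of how such a picture can be produced, assuming a trained `decoder` with a 2D latent space (unlike the 64-dimensional sketch earlier) that maps an (n, 2) batch of latent vectors to (n, 28, 28) images; the grid range is an illustrative assumption:

```python
import numpy as np

# Decode a regular grid of 2D latent vectors into digit images
grid = np.linspace(-2.0, 2.0, 15)  # illustrative range of latent values
rows = []
for y in grid:
    batch = np.array([[x, y] for x in grid])           # one row of latent vectors
    digits = decoder.predict(batch, verbose=0)         # decoded digit images
    rows.append(np.concatenate(list(digits), axis=1))  # stitch images horizontally
figure = np.concatenate(rows, axis=0)                  # full grid as one image
```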
## Continue to Notebooks
Learn more about autoencoders in these corresponding notebooks:
* [Autoencoders in TensorFlow](AutoencodersTF.ipynb)
* [Autoencoders in PyTorch](AutoEncodersPyTorch.ipynb)
## Properties of Autoencoders
* **Data Specific** - they only work well with the type of images they have been trained on. For example, if we train a super-resolution network on flowers, it will not work well on portraits. This is because the network produces a higher resolution image by drawing on fine details learned from the training dataset.
* **Lossy** - the reconstructed image is not the same as the original image. The nature of the loss is defined by the *loss function* used during training.
* Works on **unlabeled data**
## Conclusion
In this lesson, you learned about the various types of autoencoders available to the AI scientist. You learned how to build them, and how to use them to reconstruct images. You also learned about the VAE and how to use it to generate new images.
## 🚀 Challenge
In this lesson, you learned about using autoencoders for images. But they can also be used for music! Check out the Magenta project's [MusicVAE](https://magenta.tensorflow.org/music-vae), which uses autoencoders to learn to reconstruct music. Do some [experiments](https://colab.research.google.com/github/magenta/magenta-demos/blob/master/colab-notebooks/Multitrack_MusicVAE.ipynb) with this library to see what you can create.
## [Post-lecture quiz](https://black-ground-0cc93280f.1.azurestaticapps.net/quiz/208)
## Review & Self Study
For reference, read more about autoencoders in these resources:
* [Building Autoencoders in Keras](https://blog.keras.io/building-autoencoders-in-keras.html)
* [Blog post on NeuroHive](https://neurohive.io/ru/osnovy-data-science/variacionnyj-avtojenkoder-vae/)
* [Variational Autoencoders Explained](https://kvfrans.com/variational-autoencoders-explained/)
* [Conditional Variational Autoencoders](https://ijdykeman.github.io/ml/2016/12/21/cvae.html)
## Assignment
At the end of [this notebook using TensorFlow](AutoencodersTF.ipynb), you will find a 'task' - use this as your assignment.