# Overfitting

Overfitting is an extremely important concept in machine learning, and it is essential to get it right!
Consider the following problem of approximating 5 dots (represented by `x` on the graphs below):
![linear](../images/overfit1.jpg) | ![overfit](../images/overfit2.jpg)
-------------------------|--------------------------
**Linear model, 2 parameters** | **Non-linear model, 7 parameters**
Training error = 5.3 | Training error = 0
Validation error = 5.1 | Validation error = 20
* On the left, we see a good straight-line approximation. Because the number of parameters is adequate, the model gets the idea behind the point distribution right.
* On the right, the model is too powerful. Because we only have 5 points and the model has 7 parameters, it can adjust in such a way as to pass through all the points, making the training error 0. However, this prevents the model from understanding the correct pattern behind the data, thus the validation error is very high.
It is very important to strike a correct balance between the richness of the model (number of parameters) and the number of training samples.
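To make this concrete, here is a minimal sketch of the same experiment using `numpy.polyfit`. The data values are made up for illustration, and we use a degree-4 polynomial (5 parameters, just enough to pass through all 5 points exactly) in place of the 7-parameter model above, since that keeps the fit well-conditioned:

```python
import numpy as np

# Five training points (illustrative values, not the exact data behind
# the plots above) plus two held-out validation points.
x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
x_val = np.array([0.5, 2.5])
y_val = np.array([1.6, 3.5])

# Degree 1 = 2 parameters (the linear model); degree 4 = 5 parameters,
# enough to interpolate all five training points.
for degree in (1, 4):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    model = np.poly1d(coeffs)
    train_mse = np.mean((model(x_train) - y_train) ** 2)
    val_mse = np.mean((model(x_val) - y_val) ** 2)
    print(f"degree {degree}: train MSE = {train_mse:.3f}, val MSE = {val_mse:.3f}")
```

The degree-4 fit drives the training error to (nearly) zero while the validation error grows, which is exactly the overfitting pattern from the table above.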
## Why overfitting occurs
## How to detect overfitting
As you can see from the graph above, overfitting can be detected by a very low training error combined with a high validation error. Normally during training we will see both training and validation errors start to decrease, and then at some point the validation error might stop decreasing and start to rise. This is a sign of overfitting, and an indicator that we should probably stop training at this point (or at least make a snapshot of the model).
![overfitting](../images/Overfitting.png)
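That stopping rule is easy to automate. Below is a minimal, framework-agnostic sketch of an early-stopping helper (the class name and `patience` parameter are our own invention for illustration): it tracks the best validation error seen so far, snapshots the model when it improves, and signals a stop once the error has failed to improve for a few epochs.

```python
# Minimal early-stopping sketch: call update() once per epoch.
class EarlyStopping:
    def __init__(self, patience=3):
        self.patience = patience        # epochs to wait after the last improvement
        self.best = float("inf")
        self.bad_epochs = 0
        self.best_snapshot = None

    def update(self, val_error, model_state=None):
        if val_error < self.best:       # validation error improved
            self.best = val_error
            self.bad_epochs = 0
            self.best_snapshot = model_state  # "make a snapshot of the model"
            return False                # keep training
        self.bad_epochs += 1
        return self.bad_epochs >= self.patience  # True => stop training

# Usage with made-up validation errors for six epochs:
stopper = EarlyStopping(patience=2)
for epoch, val_err in enumerate([5.0, 4.2, 3.9, 4.1, 4.5, 4.8]):
    if stopper.update(val_err):
        print(f"stopping at epoch {epoch}; best validation error = {stopper.best}")
        break
```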
## How to prevent overfitting
## Overfitting and Bias-Variance Tradeoff
Overfitting is actually a case of a more generic problem in statistics called [Bias-Variance Tradeoff](https://en.wikipedia.org/wiki/Bias%E2%80%93variance_tradeoff). If we consider the possible sources of error in our model, we can see two types of errors:
* **Bias errors** are caused by our algorithm not being able to capture the relationship in the training data correctly. This can result from the model not being powerful enough (**underfitting**).
* **Variance errors** are caused by the model approximating noise in the input data instead of the meaningful relationship (**overfitting**).
During training, the bias error decreases (as our model learns to approximate the data), while the variance error increases. It is important to stop training, either manually (when we detect overfitting) or automatically (by introducing regularization), to prevent overfitting.
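For example, in PyTorch one common automatic form of regularization is weight decay, which adds an L2 penalty on the weights through a single optimizer parameter (the model here is a stand-in for illustration):

```python
import torch

# A stand-in model; weight_decay adds an L2 penalty on the weights,
# nudging them toward zero and limiting variance error.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
```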
## Conclusion
In this lesson, you learned about the differences between the various APIs for the two most popular AI frameworks, TensorFlow and PyTorch. In addition, you learned about a very important topic, overfitting.
## 🚀 Challenge
In the accompanying notebooks, you will find 'tasks' at the bottom; work through the notebooks and complete the tasks.
## [Post-lecture quiz](https://black-ground-0cc93280f.1.azurestaticapps.net/quiz/10)
## Review & Self Study
Do some research on the following topics:
- TensorFlow
- PyTorch
- Overfitting
Ask yourself the following questions:
- What is the difference between TensorFlow and PyTorch?
- What is the difference between overfitting and underfitting?
## [Assignment](lab/README.md)
In this lab, you are asked to solve two classification problems with single- and multi-layer fully-connected networks, using PyTorch or TensorFlow.
* [Instructions](lab/README.md)
* [Notebook](lab/LabFrameworks.ipynb)
# Neural Network Frameworks

## [Pre-lecture quiz](https://black-ground-0cc93280f.1.azurestaticapps.net/quiz/9)

As we have learned already, to be able to train neural networks efficiently we need to do two things:
* To operate on tensors, e.g. to multiply, add, and compute some functions such as sigmoid or softmax
* To compute gradients of all expressions, in order to perform gradient descent optimization
While the `numpy` library can do the first part, we need some mechanism to compute gradients. In [our framework](../04-OwnFramework/OwnFramework.ipynb), which we developed in the previous section, we had to manually program all derivative functions inside the `backward` method, which performs backpropagation. Ideally, a framework should give us the opportunity to compute gradients of *any expression* that we can define.
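As a taste of what this looks like, here is a minimal sketch in PyTorch (introduced below): the framework records the operations used to build an expression and can then differentiate it automatically.

```python
import torch

# Marking x as requiring a gradient makes PyTorch record operations on it.
x = torch.tensor(2.0, requires_grad=True)
y = torch.sigmoid(x * x + 3 * x)  # any expression built from tensor operations
y.backward()                      # backpropagation through the recorded expression
print(x.grad)                     # dy/dx evaluated at x = 2.0
```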
Another important thing is to be able to perform computations on a GPU, or on other specialized compute units such as a [TPU](https://en.wikipedia.org/wiki/Tensor_Processing_Unit). Deep neural network training requires *a lot* of computation, and being able to parallelize those computations on GPUs is very important.
> ✅ The term 'parallelize' means to distribute the computations over multiple devices.
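In PyTorch, for example, moving a computation to the GPU is just a matter of choosing a device; a minimal sketch:

```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
a = torch.randn(1000, 1000, device=device)
b = a @ a  # this matrix multiplication runs on the selected device
```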
Currently, the two most popular neural frameworks are [TensorFlow](http://TensorFlow.org) and [PyTorch](https://pytorch.org/). Both provide a low-level API to operate with tensors on both CPU and GPU. On top of the low-level API, each also offers a higher-level API, called [Keras](https://keras.io/) and [PyTorch Lightning](https://pytorchlightning.ai/) respectively.
Low-Level API | [TensorFlow](http://TensorFlow.org) | [PyTorch](https://pytorch.org/)
--------------|-------------------------------------|--------------------------------
High-level API| [Keras](https://keras.io/) | [PyTorch Lightning](https://pytorchlightning.ai/)
**Low-level APIs** in both frameworks allow you to build a so-called **computational graph**. This graph defines how to compute the output (usually the loss function) with given input parameters, and it can be pushed for computation to the GPU, if one is available. There are functions to differentiate this computational graph and compute gradients, which can then be used for optimizing model parameters.
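As a small illustration of the low-level style, here is a sketch in TensorFlow that builds a one-parameter computation, differentiates it, and takes one gradient-descent step (the numbers are arbitrary):

```python
import tensorflow as tf

w = tf.Variable(1.0)             # a trainable parameter
x = tf.constant(3.0)
with tf.GradientTape() as tape:  # records the computational graph
    loss = (w * x - 2.0) ** 2    # the output we want to minimize
grad = tape.gradient(loss, w)    # differentiate the graph w.r.t. w
w.assign_sub(0.1 * grad)         # one step of gradient descent
```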
**High-level APIs** pretty much consider a neural network to be a **sequence of layers**, and they make constructing most neural networks much easier. Training the model usually requires preparing the data and then calling a `fit` function to do the job.
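In the high-level style, the same ideas condense to a few lines. Here is a sketch using Keras (layer sizes are arbitrary, and `x_train`/`y_train` stand for data you have prepared):

```python
import tensorflow as tf

# A network as a sequence of layers; `fit` runs the whole training loop.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(x_train, y_train, epochs=10)  # x_train / y_train: your prepared data
```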
The high-level API allows you to construct typical neural networks very quickly, without worrying about lots of details. At the same time, the low-level APIs offer much more control over the training process, and thus they are used a lot in research, when you are dealing with new neural network architectures.
It is also important to understand that you can use both APIs together, e.g. you can develop your own network layer architecture using the low-level API, and then use it inside a larger network constructed and trained with the high-level API. Or you can define a network using the high-level API as a sequence of layers, and then use your own low-level training loop to perform optimization. Both APIs use the same basic underlying concepts, and they are designed to work well together.
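For instance, here is a sketch of the second combination in TensorFlow: the model is defined with the high-level Keras API, but one training step is written by hand with the low-level `GradientTape` (the data batches are random stand-ins):

```python
import tensorflow as tf

# Model defined with the high-level API...
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(2,))])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
loss_fn = tf.keras.losses.MeanSquaredError()

x = tf.random.normal((32, 2))  # stand-in batch of inputs
y = tf.random.normal((32, 1))  # stand-in batch of targets

# ...optimized with a hand-written low-level training step.
with tf.GradientTape() as tape:
    loss = loss_fn(y, model(x))
grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))
```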
## Learning
In this course, we offer most of the content both for PyTorch and for TensorFlow. You can choose your preferred framework and only go through the corresponding notebooks. If you are not sure which framework to choose, read some discussions on the internet regarding **PyTorch vs. TensorFlow**. You can also have a look at both frameworks to get a better understanding.
Where possible, we will use the high-level APIs for simplicity. However, we believe it is important to understand how neural networks work from the ground up, so in the beginning we start by working with the low-level API and tensors. If you want to get going fast and do not want to spend a lot of time learning these details, you can skip the low-level notebooks and go straight to the high-level API ones.
## Continue into Notebooks