From c1fd27079318823ca57bf1ee07c2bb2fa275a8ca Mon Sep 17 00:00:00 2001
From: Lateefah Bello <2019cinnamon@gmail.com>
Date: Sun, 1 May 2022 09:00:14 +0100
Subject: [PATCH] Add Lesson 7 and 8 quizzes and fix broken links

---
 etc/quiz-src/questions-en.txt                 | 63 +++++++++++++++++--
 lessons/2-Symbolic/README.md                  |  2 +-
 .../3-NeuralNetworks/03-Perceptron/README.md  |  4 +-
 .../04-OwnFramework/README.md                 | 10 +--
 .../05-Frameworks/IntroPyTorch.ipynb          |  2 +-
 .../05-Frameworks/Overfitting.md              | 11 ++--
 .../3-NeuralNetworks/05-Frameworks/README.md  |  4 +-
 .../05-Frameworks/lab/README.md               |  3 +-
 .../07-ConvNets/CNN_Architectures.md          |  2 +
 .../{ConfNetsTF.ipynb => ConvNetsTF.ipynb}    |  0
 .../4-ComputerVision/07-ConvNets/README.md    |  8 ++-
 .../08-TransferLearning/README.md             | 10 ++-
 .../08-TransferLearning/TrainingTricks.md     | 12 ++--
 .../TransferLearningTF.ipynb                  |  7 ---
 .../08-TransferLearning/lab/OxfordPets.ipynb  |  6 +-
 .../08-TransferLearning/lab/README.md         |  2 -
 16 files changed, 101 insertions(+), 45 deletions(-)
 rename lessons/4-ComputerVision/07-ConvNets/{ConfNetsTF.ipynb => ConvNetsTF.ipynb} (100%)

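Reviewer note: the quiz entries touched in etc/quiz-src/questions-en.txt
appear to follow a simple plain-text layout: a lesson title line, questions
prefixed with "* ", correct answers prefixed with "+ ", and wrong answers
prefixed with "- ". The sketch below is not part of the repository. The names
`Question`, `parse_quizzes`, and `quiz_number` are hypothetical, and the
numbering rule (100 + lesson for pre-lecture, 200 + lesson for post-lecture)
is only inferred from the quiz links changed in this patch. It may help when
checking that each new quiz block parses and matches its README link.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    text: str
    # Each answer is a pair: (answer text, is_correct).
    answers: list[tuple[str, bool]] = field(default_factory=list)


def parse_quizzes(source: str) -> dict[str, list[Question]]:
    """Parse quiz text into {quiz title: [Question, ...]}.

    Assumed layout (inferred from the diff, not from project docs):
    any non-empty line without a "* ", "+ " or "- " prefix is a quiz title.
    """
    quizzes: dict[str, list[Question]] = {}
    current = None
    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines only separate quizzes
        if line.startswith("* "):
            if current is None:
                raise ValueError(f"question before any quiz title: {line!r}")
            current.append(Question(line[2:].strip()))
        elif line.startswith(("+ ", "- ")):
            if not current:
                raise ValueError(f"answer before any question: {line!r}")
            is_correct = line.startswith("+")
            current[-1].answers.append((line[2:].strip(), is_correct))
        else:
            current = quizzes.setdefault(line, [])
    return quizzes


def quiz_number(lesson: int, post: bool) -> int:
    """Quiz id used in README links, as inferred from this patch."""
    return (200 if post else 100) + lesson


if __name__ == "__main__":
    sample = (
        "Lesson 8E Pre-trained Networks and Transfer Learning: Post Quiz\n"
        "* Dropout layers act as a ____ technique\n"
        "- gradient boosting\n"
        "- training\n"
        "+ regularization\n"
    )
    for title, questions in parse_quizzes(sample).items():
        print(title, "->", len(questions), "question(s)")
    print("post-lecture quiz id for lesson 8:", quiz_number(8, post=True))
```

Treating every unprefixed line as a title keeps the sketch short; a stricter
checker could also require exactly one "+ " answer per question.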
diff --git a/etc/quiz-src/questions-en.txt b/etc/quiz-src/questions-en.txt
index 7f5483e..c8f51da 100644
--- a/etc/quiz-src/questions-en.txt
+++ b/etc/quiz-src/questions-en.txt
@@ -1,4 +1,4 @@
-Lesson 1B Introduction to AI - Pre Quiz
+Lesson 1B Introduction to AI: Pre Quiz
 * A famous 19th century proto-computer engineer was
 - Charles Barkley
 + Charles Babbage
@@ -11,7 +11,7 @@ Lesson 1B Introduction to AI - Pre Quiz
 - true, they are usually considered to be 'intelligent
 + false, but they are increasingly able to pass Turing tests as they become more sophisticated.
 
-Lesson 1E Introduction to AI - Post-Quiz
+Lesson 1E Introduction to AI: Post-Quiz
 * A top-down approach to AI is a model of reasoning called
 - strategic reasoning
 + symbolic reasoning
@@ -76,7 +76,7 @@ Lesson 3E Introduction to Neural Networks - Perceptron: Post-Quiz
 + weights
 - gradient
 
-Lesson 4B Neural Networks - Pre Quiz
+Lesson 4B Neural Networks: Pre Quiz
 * The quality of prediction is measured by Loss function 
 + True
 - False
@@ -89,7 +89,7 @@ Lesson 4B Neural Networks - Pre Quiz
 - multiple propagation
 - front propagation
 
-Lesson 4E Neural Networks - Post Quiz
+Lesson 4E Neural Networks: Post Quiz
 * We use ____ for regression loss functions
 - absolute error
 - mean squared error
@@ -105,7 +105,7 @@ Lesson 4E Neural Networks - Post Quiz
 + True
 - False
 
-Lesson 5B Frameworks - Pre Quiz
+Lesson 5B Frameworks: Pre Quiz
 * Deep Neural Network training requires a lot of computations
 + True
 - False
@@ -118,7 +118,7 @@ Lesson 5B Frameworks - Pre Quiz
 + algorithm
 - computer
 
-Lesson 5E Frameworks - Post Quiz
+Lesson 5E Frameworks: Post Quiz
 * After compiling our model object, we train by calling ____ function
 + fit
 - train
@@ -133,3 +133,54 @@ Lesson 5E Frameworks - Post Quiz
 * Pred is the values predicted by the network
 + True
 - False
+
+Lesson 7B Convolutional Neural Networks: Pre Quiz
+* To extract patterns from images, we use ____
++ convolutional filters
+- extractor
+- filters
+* Which of these is not a CNN architecture?
+- ResNet
+- MobileNet
++ TensorFlow
+* CNNs are mostly used for computer vision tasks.
++ true
+- false
+
+Lesson 7E Convolutional Neural Networks: Post Quiz
+* Which pooling layer is used to "scale down" the size of the image?
+- average pooling
+- max pooling
++ a and b
+* Convolutional networks generalize much better
++ True
+- False
+* To train our neural network, we need to convert images to tensors
++ true
+- false
+
+Lesson 8B Pre-trained Networks and Transfer Learning: Pre Quiz
+* The transfer learning approach uses untrained models for classification
+- true
++ false
+* Which of these is not a normalization technique?
++ height normalization
+- weight normalization
+- layer normalization
+* We choose Stochastic Gradient Descent (SGD) in deep learning because classical gradient descent can be ____
+- fast
++ slow
+
+Lesson 8E Pre-trained Networks and Transfer Learning: Post Quiz
+* Dropout layers act as a ____ technique
+- gradient boosting
+- training
++ regularization
+* Freezing the weights of a convolutional feature extractor can be done by ____
+- setting `requires_grad` property to `False`
+- setting `trainable` property to `False`
++ a and b
+* Batch normalization brings the values that flow through the ____ into the right interval
+- algorithms
+- batches
++ neural network
\ No newline at end of file
diff --git a/lessons/2-Symbolic/README.md b/lessons/2-Symbolic/README.md
index 421c0b6..98165d6 100644
--- a/lessons/2-Symbolic/README.md
+++ b/lessons/2-Symbolic/README.md
@@ -6,7 +6,7 @@
 
 The quest for artificial intelligence is based on a search for knowledge, to make sense of the world similar to how humans do. But how can you go about doing this?
 
-## [Pre-lecture quiz](https://black-ground-0cc93280f.1.azurestaticapps.net/quiz/201)
+## [Pre-lecture quiz](https://black-ground-0cc93280f.1.azurestaticapps.net/quiz/102)
 
 In the early days of AI, the top-down approach to creating intelligent systems (discussed in the previous lesson) was popular. The idea was to extract the knowledge from people into some machine-readable form, and then use it to automatically solve problems. This approach was based on two big ideas:
 
diff --git a/lessons/3-NeuralNetworks/03-Perceptron/README.md b/lessons/3-NeuralNetworks/03-Perceptron/README.md
index d2d7790..deaa95b 100644
--- a/lessons/3-NeuralNetworks/03-Perceptron/README.md
+++ b/lessons/3-NeuralNetworks/03-Perceptron/README.md
@@ -1,6 +1,6 @@
 # Introduction to Neural Networks: Perceptron
 
-## [Pre-lecture quiz](https://black-ground-0cc93280f.1.azurestaticapps.net/quiz/301)
+## [Pre-lecture quiz](https://black-ground-0cc93280f.1.azurestaticapps.net/quiz/103)
 
 One of the first attempts to implement something similar to a modern neural network was done by Frank Rosenblatt from Cornell Aeronautical Laboratory in 1957. It was a hardware implementation called "Mark-1", designed to recognize primitive geometric figures, such as triangles, squares and circles.
 
@@ -76,7 +76,7 @@ In this lesson, you learned about a perceptron, which is a binary classification
 
 If you'd like to try to build your own perceptron, try [this lab on Microsoft Learn](https://docs.microsoft.com/en-us/azure/machine-learning/component-reference/two-class-averaged-perceptron?WT.mc_id=academic-57639-dmitryso) which uses the [Azure ML designer](https://docs.microsoft.com/en-us/azure/machine-learning/concept-designer?WT.mc_id=academic-57639-dmitryso).
 
-## [Post-lecture quiz](https://black-ground-0cc93280f.1.azurestaticapps.net/quiz/302)
+## [Post-lecture quiz](https://black-ground-0cc93280f.1.azurestaticapps.net/quiz/203)
 
 ## Review & Self Study
 
diff --git a/lessons/3-NeuralNetworks/04-OwnFramework/README.md b/lessons/3-NeuralNetworks/04-OwnFramework/README.md
index 434195e..01ba217 100644
--- a/lessons/3-NeuralNetworks/04-OwnFramework/README.md
+++ b/lessons/3-NeuralNetworks/04-OwnFramework/README.md
@@ -10,7 +10,7 @@ In this section we will extend this model into a more flexible framework, allowi
 
 We will also develop our own modular framework in Python that will allow us to construct different neural network architectures.
 
-## [Pre-lecture quiz](https://black-ground-0cc93280f.1.azurestaticapps.net/quiz/401)
+## [Pre-lecture quiz](https://black-ground-0cc93280f.1.azurestaticapps.net/quiz/104)
 
 ## Formalization of Machine Learning
 
@@ -64,15 +64,15 @@ Note that the left-most part of all those expressions is the same, and thus we c
 
 ## Conclusion
 
-In this lesson, we have built our own neural network library, and we have used it for a simple two-dimensional classification task. 
+In this lesson, we have built our own neural network library, and we have used it for a simple two-dimensional classification task.
 
 ## 🚀 Challenge
 
-In the accompanying notebook, you will implement your own framework for building and training multi-layered perceptrons. You will be able to see in detail how modern neural networks operate. 
+In the accompanying notebook, you will implement your own framework for building and training multi-layered perceptrons. You will be able to see in detail how modern neural networks operate.
 
 Proceed to the [OwnFramework](OwnFramework.ipynb) notebook and work through it.
 
-## [Post-lecture quiz](https://black-ground-0cc93280f.1.azurestaticapps.net/quiz/402)
+## [Post-lecture quiz](https://black-ground-0cc93280f.1.azurestaticapps.net/quiz/204)
 
 ## Review & Self Study
 
@@ -83,4 +83,4 @@ Backpropagation is a common algorithm used in AI and ML, worth studying [in more
 In this lab, you are asked to use the framework you constructed in this lesson to solve MNIST handwritten digit classification.
 
 * [Instructions](lab/README.md)
-* [Notebook](lab/MyFW_MNIST.ipynb)
\ No newline at end of file
+* [Notebook](lab/MyFW_MNIST.ipynb)
diff --git a/lessons/3-NeuralNetworks/05-Frameworks/IntroPyTorch.ipynb b/lessons/3-NeuralNetworks/05-Frameworks/IntroPyTorch.ipynb
index 299757d..06cf206 100644
--- a/lessons/3-NeuralNetworks/05-Frameworks/IntroPyTorch.ipynb
+++ b/lessons/3-NeuralNetworks/05-Frameworks/IntroPyTorch.ipynb
@@ -1026,7 +1026,7 @@
       "source": [
         "## Training One-Layer Perceptron\n",
         "\n",
-        "Let's use Tensorflow gradient computing machinery to train one-layer perceptron.\n",
+        "Let's use PyTorch's gradient computing machinery to train a one-layer perceptron.\n",
         "\n",
         "Our neural network will have 2 inputs and 1 output. The weight matrix $W$ will have size $2\\times1$, and bias vector $b$ -- $1$.\n",
         "\n",
diff --git a/lessons/3-NeuralNetworks/05-Frameworks/Overfitting.md b/lessons/3-NeuralNetworks/05-Frameworks/Overfitting.md
index f49989a..7c0c877 100644
--- a/lessons/3-NeuralNetworks/05-Frameworks/Overfitting.md
+++ b/lessons/3-NeuralNetworks/05-Frameworks/Overfitting.md
@@ -25,7 +25,7 @@ It is very important to strike a correct balance between the richness of the mod
 
 As you can see from the graph above, overfitting can be detected by a very low training error, and a high validation error. Normally during training we will see both training and validation errors starting to decrease, and then at some point validation error might stop decreasing and start rising. This will be a sign of overfitting, and the indicator that we should probably stop training at this point (or at least make a snapshot of the model).
 
-![overfitting]("../images/Overfitting.png")
+![overfitting](../images/Overfitting.png)
 
 ## How to prevent overfitting
 
@@ -38,10 +38,11 @@ If you can see that overfitting occurs, you can do one of the following:
 ## Overfitting and Bias-Variance Tradeoff
 
 Overfitting is actually a case of a more generic problem in statistics called [Bias-Variance Tradeoff](https://en.wikipedia.org/wiki/Bias%E2%80%93variance_tradeoff). If we consider the possible sources of error in our model, we can see two types of errors:
+
 * **Bias errors** are caused by our algorithm not being able to capture the relationship between training data correctly. It can result from the fact that our model is not powerful enough (**underfitting**).
 * **Variance errors**, which are caused by the model approximating noise in the input data instead of meaningful relationship (**overfitting**).
 
-During training, bias error decreases (as our model learns to approximate the data), and variance error increases. It is important to stop training - either manually (when we detect overfitting) or automatically (by introducing regularization) - to prevent overfitting. 
+During training, bias error decreases (as our model learns to approximate the data), and variance error increases. It is important to stop training - either manually (when we detect overfitting) or automatically (by introducing regularization) - to prevent overfitting.
 
 ## Conclusion
 
@@ -51,16 +52,18 @@ In this lesson, you learned about the differences between the various APIs for t
 
 In the accompanying notebooks, you will find 'tasks' at the bottom; work through the notebooks and complete the tasks.
 
-## [Post-lecture quiz](https://black-ground-0cc93280f.1.azurestaticapps.net/quiz/502)
+## [Post-lecture quiz](https://black-ground-0cc93280f.1.azurestaticapps.net/quiz/205)
 
 ## Review & Self Study
 
 Do some research on the following topics:
+
 - TensorFlow
 - PyTorch
 - Overfitting
 
-Ask yourself the following questions: 
+Ask yourself the following questions:
+
 - What is the difference between TensorFlow and PyTorch?
 - What is the difference between overfitting and underfitting?
 
diff --git a/lessons/3-NeuralNetworks/05-Frameworks/README.md b/lessons/3-NeuralNetworks/05-Frameworks/README.md
index 14c6b0a..2bd34c5 100644
--- a/lessons/3-NeuralNetworks/05-Frameworks/README.md
+++ b/lessons/3-NeuralNetworks/05-Frameworks/README.md
@@ -5,7 +5,7 @@ As we have learned already, to be able to train neural networks efficiently we n
 * To operate on tensors, eg. to multiply, add, and compute some functions such as sigmoid or softmax
 * To compute gradients of all expressions, in order to perform gradient descent optimization
 
-## [Pre-lecture quiz](https://black-ground-0cc93280f.1.azurestaticapps.net/quiz/501)
+## [Pre-lecture quiz](https://black-ground-0cc93280f.1.azurestaticapps.net/quiz/105)
 
 While the `numpy` library can do the first part, we need some mechanism to compute gradients. In [our framework](../04-OwnFramework/OwnFramework.ipynb) that we have developed in the previous section we had to manually program all derivative functions inside the `backward` method, which does backpropagation. Ideally, a framework should give us the opportunity to compute gradients of *any expression* that we can define.
 
@@ -23,7 +23,7 @@ High-level API| [Keras](https://keras.io/) | [PyTorch Lightning](https://pytorch
 
 **High-level APIs** pretty much consider neural networks as a **sequence of layers**, and make constructing most of the neural networks much easier. Training the model usually requires preparing the data and then calling a `fit` function to do the job.
 
-The high-level API allows you to construct typical neural networks very quickly without worrying about lots of details. At the same time, low-level API offer much more control over the training process, and thus they are used a lot in research, when you are dealing with new neural network architectures. 
+The high-level API allows you to construct typical neural networks very quickly without worrying about lots of details. At the same time, low-level APIs offer much more control over the training process, and thus they are used a lot in research, when you are dealing with new neural network architectures.
 
 It is also important to understand that you can use both APIs together, eg. you can develop your own network layer architecture using low-level API, and then use it inside the larger network constructed and trained with the high-level API. Or you can define a network using the high-level API as a sequence of layers, and then use your own low-level training loop to perform optimization. Both APIs use the same basic underlying concepts, and they are designed to work well together.
 
diff --git a/lessons/3-NeuralNetworks/05-Frameworks/lab/README.md b/lessons/3-NeuralNetworks/05-Frameworks/lab/README.md
index ffd8874..23f8f40 100644
--- a/lessons/3-NeuralNetworks/05-Frameworks/lab/README.md
+++ b/lessons/3-NeuralNetworks/05-Frameworks/lab/README.md
@@ -4,7 +4,7 @@ Lab Assignment from [AI for Beginners Curriculum](https://github.com/microsoft/a
 
 ## Task
 
-Solve two classification problems using single- and multi-layered fully-connected networks using PyTorch or TensorFlow:
+Solve two classification problems using single-layered and multi-layered fully-connected networks in PyTorch or TensorFlow:
 
 1. **[Iris classification](https://en.wikipedia.org/wiki/Iris_flower_data_set)** problem - an example of problem with tabular input data, which can be handled by classical machine learning. You goal would be to classify irises into 3 classes, based on 4 numeric parameters.
 1. **MNIST** handwritten digit classification problem which we have seen before.
@@ -14,4 +14,3 @@ Try different network architectures to achieve the best accuracy you can get.
 ## Stating Notebook
 
 Start the lab by opening [LabFrameworks.ipynb](LabFrameworks.ipynb)
-
diff --git a/lessons/4-ComputerVision/07-ConvNets/CNN_Architectures.md b/lessons/4-ComputerVision/07-ConvNets/CNN_Architectures.md
index 7108a79..85d9397 100644
--- a/lessons/4-ComputerVision/07-ConvNets/CNN_Architectures.md
+++ b/lessons/4-ComputerVision/07-ConvNets/CNN_Architectures.md
@@ -41,3 +41,5 @@ Here is [a good blog post](https://medium.com/analytics-vidhya/talented-mr-1x1-c
 MobileNet is a family of models with reduced size, suitable for mobile devices. Use them if you are short in resources, and can sacrifice a little bit of accuracy. The main idea behind them is so-called **depthwise separable convolution**, which allows representing convolution filters by a composition of spatial convolutions and 1x1 convolution over depth channels. This significantly reduces the number of parameters, making the network smaller in size, and also easier to train with less data.
 
 Here is [a good blog post on MobileNet](https://medium.com/analytics-vidhya/image-classification-with-mobilenet-cc6fbb2cd470).
+
+## [Post-lecture quiz](https://black-ground-0cc93280f.1.azurestaticapps.net/quiz/207)
diff --git a/lessons/4-ComputerVision/07-ConvNets/ConfNetsTF.ipynb b/lessons/4-ComputerVision/07-ConvNets/ConvNetsTF.ipynb
similarity index 100%
rename from lessons/4-ComputerVision/07-ConvNets/ConfNetsTF.ipynb
rename to lessons/4-ComputerVision/07-ConvNets/ConvNetsTF.ipynb
diff --git a/lessons/4-ComputerVision/07-ConvNets/README.md b/lessons/4-ComputerVision/07-ConvNets/README.md
index 35cbddf..0375548 100644
--- a/lessons/4-ComputerVision/07-ConvNets/README.md
+++ b/lessons/4-ComputerVision/07-ConvNets/README.md
@@ -2,7 +2,9 @@
 
 We have seen before that neural networks are quite good at dealing with images, and even one-layer perceptron is able to recognize handwritten digits from MNIST dataset with reasonable accuracy. However, MNIST dataset is very special, and all digits are centered inside the image, which makes the task simpler.
 
-In real life, we want to be able to recognize objects on the picture regardless of their exact location in the image. Computer vision is different from generic classification, because when we are trying to find a certain object in the picture, we are scanning the image looking for some specific **patterns** and their combinations. For example, when looking for a cat, we first may look for horizontal lines, which can form whiskers, and then certain combination of whiskers can tell us that it is actually a picture of a cat. Relative position and presence of certain patterns is important, and not their exact position on the image. 
+## [Pre-lecture quiz](https://black-ground-0cc93280f.1.azurestaticapps.net/quiz/107)
+
+In real life, we want to be able to recognize objects in the picture regardless of their exact location in the image. Computer vision is different from generic classification, because when we are trying to find a certain object in the picture, we are scanning the image looking for some specific **patterns** and their combinations. For example, when looking for a cat, we first may look for horizontal lines, which can form whiskers, and then a certain combination of whiskers can tell us that it is actually a picture of a cat. The relative position and presence of certain patterns are important, not their exact position in the image.
 
 To extract patterns, we will use the notion of **convolutional filters**. As you know, an image is represented by a 2D-matrix, or 3D-tensor with color depth. Applying a filter means that we take relatively small **filter kernel** matrix, and for each pixel in the original image we compute the weighted average with neighboring points. We can view this like a small window sliding over the whole image, and averaging out all pixels according to the weights in the filter kernel matrix.
 
@@ -20,6 +22,7 @@ However, while we can design the filters to extract some patterns manually, we c
 ## Main ideas behind CNN
 
 The way CNNs work is based on the following important ideas:
+
 * Convolutional filters can extract patterns
 * We can design the network in such a way that filters are trained automatically
 * We can use the same approach to find patterns in high-level features, not only in the original image. Thus CNN feature extraction work on a hierarchy of features, starting from low-level pixel combinations, up to higher level combination of picture parts.
@@ -52,6 +55,7 @@ As an example, let's look at the architecture of VGG-16, a network that achieved
 ## [Lab](lab/README.md)
 
 In the lab, you are tasked with classification of different cats and dogs breeds. Images are more complex than MNIST dataset and of higher dimensions, and there are more than 10 classes.
+
 ## CNNs for Other Tasks
 
-While CNNs are most often used for Computer Vision tasks, they are generally good for extracting fix-sized patterns. For example, if we are dealing with sounds, we may also want to use CNNs to look for some specific patterns in audio signal - in which case filters would be 1-dimensional (and this CNN would be called 1D-CNN). Also, sometimes 3D-CNN is used to extract features in multi-dimensional space, such as certain events occurring on video - CNN can capture certain patterns of feature changing over time. 
+While CNNs are most often used for Computer Vision tasks, they are generally good for extracting fixed-size patterns. For example, if we are dealing with sounds, we may also want to use CNNs to look for some specific patterns in an audio signal - in which case filters would be 1-dimensional (and this CNN would be called 1D-CNN). Also, sometimes 3D-CNN is used to extract features in multi-dimensional space, such as certain events occurring in video - CNN can capture certain patterns of features changing over time.
diff --git a/lessons/4-ComputerVision/08-TransferLearning/README.md b/lessons/4-ComputerVision/08-TransferLearning/README.md
index 3f88460..7a65ab3 100644
--- a/lessons/4-ComputerVision/08-TransferLearning/README.md
+++ b/lessons/4-ComputerVision/08-TransferLearning/README.md
@@ -2,6 +2,8 @@
 
 Training CNNs can take a lot of time, and a lot of data is required for that task. However, much of the time is spent to learn the best low-level filters that a network is using to extract patterns from images. A natural question arises - can we use a neural network trained on one dataset and adapt it to classifying different images without full training process?
 
+## [Pre-lecture quiz](https://black-ground-0cc93280f.1.azurestaticapps.net/quiz/108)
+
 This approach is called **transfer learning**, because we transfer some knowledge from one neural network model to another. In transfer learning, we typically start with a pre-trained model, which has been trained on some large image dataset, such as **ImageNet**. Those models can already do a good job extracting different features from generic images, and in many cases just building a classifier on top of those extracted features can yield a good result.
 
 ## Pre-Trained Models as Feature Extractors
@@ -20,7 +22,7 @@ Here are sample features extracted from a picture of a cat by VGG-16 network:
 
 ## Cats vs. Dogs Dataset
 
-In this example, we will use a dataset of [Cats and Dogs](https://www.microsoft.com/en-us/download/details.aspx?id=54765&WT.mc_id=academic-57639-dmitryso), which is very close to a real-life image classification scenario. 
+In this example, we will use a dataset of [Cats and Dogs](https://www.microsoft.com/en-us/download/details.aspx?id=54765&WT.mc_id=academic-57639-dmitryso), which is very close to a real-life image classification scenario.
 
 ## Continue in Notebook
 
@@ -29,7 +31,11 @@ Let's see transfer learning in action in corresponding notebooks:
 * [Transfer Learning - PyTorch](TransferLearningPyTorch.ipynb)
 * [Transfer Learning - TensorFlow](TransferLearningTF.ipynb)
 
+## [Post-lecture quiz](https://black-ground-0cc93280f.1.azurestaticapps.net/quiz/208)
+
 ## [Lab](lab/README.md)
 
 In this lab, we will use real-life [Oxford-IIIT](https://www.robots.ox.ac.uk/~vgg/data/pets/) pets dataset with 35 breeds of cats and dogs, and we will build a transfer learning classifier.
- 
\ No newline at end of file
+
+> ✅ Todo: there is an unlinked file:
+> [TrainingTricks.md](TrainingTricks.md)
diff --git a/lessons/4-ComputerVision/08-TransferLearning/TrainingTricks.md b/lessons/4-ComputerVision/08-TransferLearning/TrainingTricks.md
index 5aeea22..efbebfa 100644
--- a/lessons/4-ComputerVision/08-TransferLearning/TrainingTricks.md
+++ b/lessons/4-ComputerVision/08-TransferLearning/TrainingTricks.md
@@ -38,8 +38,9 @@ Here is the [original paper](https://arxiv.org/pdf/1502.03167.pdf) on batch norm
 While this may sound like a strange idea, you can see the effect of dropout on training MNIST digit classifier in [`Dropout.ipynb`](Dropout.ipynb) notebook. It speeds up training and allows us to achieve higher accuracy in less training epochs.
 
 This effect can be explained in several ways:
+
  * It can be considered to be a random shocking factor to the model, which takes optimiation out of local minimum
- * It can be considered as *implicit model averaging*, because we can say that during dropout we are training slightly different model 
+ * It can be considered as *implicit model averaging*, because we can say that during dropout we are training a slightly different model
 
 > *Some people say that when a drunk person tries to learn something, he will remember this better next morning, comparing to a sober person, because a brain with some malfunctioning neurons tries to adapt better to gasp the meaning. We never tested ourselves if this is true of not*
 
@@ -52,7 +53,7 @@ One of the very important aspect of deep learning is too be able to prevent [ove
 There are several ways to prevent overfitting:
 
  * Early stopping -- continuously monitor error on validation set and stopping training when validation error starts to increase.
- * Explicit Weight Decay / Regularization -- adding an extra penalty to the loss function for high absolute values of weights, which prevents the model of getting very unstable results 
+ * Explicit Weight Decay / Regularization -- adding an extra penalty to the loss function for high absolute values of weights, which prevents the model from producing very unstable results
  * Model Averaging -- training several models and then averaging the result. This helps to minimize the variance.
  * Dropout (Implicit Model Averaging)
 
@@ -83,7 +84,7 @@ w<sup>t+1</sup> = w<sup>t</sup> - &eta;(&nabla;&lagran;/||&nabla;&lagran;||), wh
 
 This algorithm is called **Adagrad**. Another algorithms that use the same idea: **RMSProp**, **Adam**
 
-> **Adam** is considered to be a very efficient algorithm for many applications, so if you are not sure which one to use - use Adam. 
+> **Adam** is considered to be a very efficient algorithm for many applications, so if you are not sure which one to use - use Adam.
 
 ### Gradient clipping
 
@@ -93,12 +94,11 @@ Gradient clipping is an extension the idea above. When the ||&nabla;&lagran;|| &
 
 Training success often depends on the learning rate parameter &eta;. It is logical to assume that larger values of &eta; result in faster training, which is something we typically want in the beginning of the training, and then smaller value of &eta; allow us to fine-tune the network. Thus, in most of the cases we want to decrease &eta; in the process of the training.
 
-This can be done by multiplying &eta; by some number (eg. 0.98) after each epoch of the training, or by using more complicated **learning rate schedule**. 
-
+This can be done by multiplying &eta; by some number (e.g. 0.98) after each epoch of the training, or by using a more complicated **learning rate schedule**.
 
 ## Different Network Architectures
 
-Selecting right network architecture for your problem can be tricky. Normally, we would take an architecture that has proven to work for our specific task (or similar one). Here is a [good overview](https://www.topbots.com/a-brief-history-of-neural-network-architectures/) or neural network architectures for computer vision. 
+Selecting the right network architecture for your problem can be tricky. Normally, we would take an architecture that has proven to work for our specific task (or a similar one). Here is a [good overview](https://www.topbots.com/a-brief-history-of-neural-network-architectures/) of neural network architectures for computer vision.
 
 > It is important to select an architecture that will be powerful enough for the number of training samples that we have. Selecting too powerful model can result in [overfitting](../../3-NeuralNetworks/05-Frameworks/Overfitting.md)
 
diff --git a/lessons/4-ComputerVision/08-TransferLearning/TransferLearningTF.ipynb b/lessons/4-ComputerVision/08-TransferLearning/TransferLearningTF.ipynb
index 53e6bee..b21ac82 100644
--- a/lessons/4-ComputerVision/08-TransferLearning/TransferLearningTF.ipynb
+++ b/lessons/4-ComputerVision/08-TransferLearning/TransferLearningTF.ipynb
@@ -1459,13 +1459,6 @@
     "\n",
     "You can see that more complex tasks that we are solving now require higher computational power, and cannot be easily solved on the CPU. In the next unit, we will try to use more lightweight implementation to train the same model using lower compute resources, which results in just slightly lower accuracy. "
    ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": []
   }
  ],
  "metadata": {
diff --git a/lessons/4-ComputerVision/08-TransferLearning/lab/OxfordPets.ipynb b/lessons/4-ComputerVision/08-TransferLearning/lab/OxfordPets.ipynb
index 3a5eb81..af3ca89 100644
--- a/lessons/4-ComputerVision/08-TransferLearning/lab/OxfordPets.ipynb
+++ b/lessons/4-ComputerVision/08-TransferLearning/lab/OxfordPets.ipynb
@@ -294,7 +294,7 @@
     "\n",
     "To improve the accuracy, let's use pre-trained neural network as feature extractor. Feel free to experiment with VGG-16/VGG-19 models, ResNet50, etc.\n",
     "\n",
-    "> Since this training is slower, you may start with training the model for the small number of epochs, eg. 3. You can alsways resume training to further improve accuracy if needed.\n",
+    "> Since this training is slower, you may start by training the model for a small number of epochs, e.g. 3. You can always resume training to further improve accuracy if needed.\n",
     "\n",
     "We need to normalize our data differently for transfer learning, thus we will reload the dataset again using different set of transforms:"
    ]
@@ -381,9 +381,9 @@
    "source": [
     "It seems much better now!\n",
     "\n",
-    "## Optional: Calculate Top3 Accuracy\n",
+    "## Optional: Calculate Top 3 Accuracy\n",
     "\n",
-    "We can also computer Top3 accuracy using the same code as in the previous exercise.\n"
+    "We can also compute Top 3 accuracy using the same code as in the previous exercise.\n"
    ]
   },
   {
diff --git a/lessons/4-ComputerVision/08-TransferLearning/lab/README.md b/lessons/4-ComputerVision/08-TransferLearning/lab/README.md
index 44f4659..1b7931c 100644
--- a/lessons/4-ComputerVision/08-TransferLearning/lab/README.md
+++ b/lessons/4-ComputerVision/08-TransferLearning/lab/README.md
@@ -22,8 +22,6 @@ To download the dataset, use this code snippet:
 
 Start the lab by opening [OxfordPets.ipynb](OxfordPets.ipynb)
 
-
 ## Takeaway
 
 Transfer learning and pre-trained networks allow us to solve real-world image classification problems relatively easily. However, pre-trained networks work well on images of similar kind, and if we start classifying very different images (eg. medical images), we are likely to get much worse results.
-
-- 
GitLab