diff --git a/5-NLP/14-Embeddings/EmbeddingsPyTorch.ipynb b/5-NLP/14-Embeddings/EmbeddingsPyTorch.ipynb
index c77795cff2e06467e8e78fe3801f20c5c267223c..c70f64af61a629cdcd0eaafca6a90c2e9b193d73 100644
--- a/5-NLP/14-Embeddings/EmbeddingsPyTorch.ipynb
+++ b/5-NLP/14-Embeddings/EmbeddingsPyTorch.ipynb
@@ -686,6 +686,11 @@
     "\n",
     "The pretrained embeddings above represent both of these meanings of the word 'play' in the same embedding. To overcome this limitation, we need to build embeddings based on the **language model**, which is trained on a large corpus of text, and *knows* how words can be put together in different contexts. Discussing contextual embeddings is out of scope for this tutorial, but we will come back to them when talking about language models in the next unit.\n"
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": []
   }
  ],
  "metadata": {
diff --git a/5-NLP/14-Embeddings/README.md b/5-NLP/14-Embeddings/README.md
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..b94abb0f3e7f4a0c88ea9a26494804b2b57fb434 100644
--- a/5-NLP/14-Embeddings/README.md
+++ b/5-NLP/14-Embeddings/README.md
@@ -0,0 +1,41 @@
+# Embeddings
+
+When training classifiers based on BoW or TF/IDF, we operated on high-dimensional bag-of-words vectors of length `vocab_size`, explicitly converting low-dimensional positional representations into sparse one-hot representations. This one-hot representation is not memory-efficient; in addition, each word is treated independently of every other word, so one-hot encoded vectors do not express any semantic similarity between words.
+
+The idea of **embedding** is to represent words by lower-dimensional dense vectors that somehow reflect the semantic meaning of a word. We will later discuss how to build meaningful word embeddings, but for now let's just think of embeddings as a way to lower the dimensionality of a word vector.
+
+So, an embedding layer takes a word as input, and produces an output vector of a specified `embedding_size`. In a sense, it is very similar to a `Linear` layer, but instead of taking a one-hot encoded vector, it takes a word number as input, allowing us to avoid creating large one-hot encoded vectors.
+
+By using an embedding layer as the first layer in our classifier network, we can switch from the bag-of-words model to an **embedding bag** model, where we first convert each word in our text into its corresponding embedding, and then compute some aggregate function over all those embeddings, such as `sum`, `average` or `max`.
+
+![Image showing an embedding classifier for five sequence words.](images/embedding-classifier-example.png)
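+
+As a rough sketch of what such a classifier could look like in PyTorch (the sizes and the `mean` aggregation below are arbitrary example choices, not the exact code from the notebooks):
+
+```python
+import torch
+
+vocab_size, embed_dim, num_classes = 10000, 64, 4
+
+# EmbeddingBag combines the embedding lookup and the aggregation (here 'mean') in one layer
+model = torch.nn.Sequential(
+    torch.nn.EmbeddingBag(vocab_size, embed_dim, mode='mean'),
+    torch.nn.Linear(embed_dim, num_classes)
+)
+
+sentence = torch.tensor([[12, 7, 2041, 5, 9]])  # a 'sentence' is just a tensor of word numbers
+print(model(sentence).shape)                    # torch.Size([1, 4]) - one score per class
+```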
+
+## Continue in Notebooks
+
+* [Embeddings with PyTorch](EmbeddingsPyTorch.ipynb)
+* [Embeddings with TensorFlow](EmbeddingsTF.ipynb)
+
+## Semantic Embeddings: Word2Vec
+
+While the embedding layer learns to map words to vector representations, these representations do not necessarily have much semantic meaning. It would be nice to learn vector representations in which similar words or synonyms correspond to vectors that are close to each other in terms of some vector distance (e.g. Euclidean distance).
+
+To do that, we need to pre-train our embedding model on a large collection of text in a specific way. One of the first ways to train semantic embeddings is called [Word2Vec](https://en.wikipedia.org/wiki/Word2vec). It is based on two main architectures that are used to produce a distributed representation of words:
+
+ - **Continuous bag-of-words** (CBoW) — in this architecture, we train the model to predict a word from surrounding context. Given the ngram $(W_{-2},W_{-1},W_0,W_1,W_2)$, the goal of the model is to predict $W_0$ from $(W_{-2},W_{-1},W_1,W_2)$.
+ - **Continuous skip-gram** is opposite to CBoW. The model uses the current word to predict the surrounding window of context words.
+
+CBoW is faster, while skip-gram is slower, but does a better job of representing infrequent words.
+
+![Image showing both CBoW and Skip-Gram algorithms to convert words to vectors.](./images/example-algorithms-for-converting-words-to-vectors.png)
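+
+For example, both architectures can be trained on your own tokenized corpus with the `gensim` library. Below is a minimal sketch, assuming gensim ≥ 4.0 and a hypothetical toy corpus `sentences` (a real corpus would need to be much larger to produce meaningful vectors):
+
+```python
+from gensim.models import Word2Vec
+
+sentences = [['i', 'like', 'to', 'play', 'football'],
+             ['we', 'went', 'to', 'a', 'play', 'yesterday']]
+
+# sg=0 selects CBoW, sg=1 selects skip-gram
+model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)
+print(model.wv.most_similar('play', topn=3))
+```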
+
+Word2Vec pre-trained embeddings (as well as other similar models, such as GloVe) can also be used in place of the embedding layer in neural networks. However, we need to deal with vocabularies, because the vocabulary used to pre-train Word2Vec/GloVe is likely to differ from the vocabulary in our text corpus. Have a look at the notebooks to see how this problem can be resolved.
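+
+One common way to resolve it - sketched below under the assumption that `w2v` is a gensim `KeyedVectors` object and `vocab` is our own list of words (both hypothetical here, not names from the notebooks) - is to copy the pre-trained vectors into the weight matrix of the embedding layer for the words both vocabularies share, and keep random vectors for the rest:
+
+```python
+import numpy as np
+import torch
+
+# hypothetical inputs: w2v (gensim KeyedVectors), vocab (list of words in our corpus)
+embed_dim = w2v.vector_size
+weights = np.random.normal(scale=0.1, size=(len(vocab), embed_dim)).astype(np.float32)
+
+found = 0
+for i, word in enumerate(vocab):
+    if word in w2v:            # out-of-vocabulary words keep their random vector
+        weights[i] = w2v[word]
+        found += 1
+print(f'Initialized {found} out of {len(vocab)} words from pre-trained embeddings')
+
+embedding = torch.nn.Embedding.from_pretrained(torch.tensor(weights), freeze=False)
+```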
+
+## Contextual Embeddings
+
+One key limitation of traditional pretrained embedding representations such as Word2Vec is the problem of word sense disambiguation. While pretrained embeddings can capture some of the meaning of words in context, every possible meaning of a word is encoded into the same embedding. This can cause problems in downstream models, since many words, such as the word 'play', have different meanings depending on the context they are used in.
+
+For example, the word 'play' has quite different meanings in these two sentences:
+- I went to a **play** at the theater.
+- John wants to **play** with his friends.
+
+The pretrained embeddings above represent both of these meanings of the word 'play' in the same embedding. To overcome this limitation, we need to build embeddings based on the **language model**, which is trained on a large corpus of text, and *knows* how words can be put together in different contexts. Discussing contextual embeddings is out of scope for this tutorial, but we will come back to them when talking about language models later in the course.
diff --git a/5-NLP/16-RNN/README.md b/5-NLP/16-RNN/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..0f403d5fce71553aecb08ffc24e3879dabda4979
--- /dev/null
+++ b/5-NLP/16-RNN/README.md
@@ -0,0 +1,58 @@
+# Recurrent Neural Networks
+
+In the previous sections, we have been using rich semantic representations of text, and a simple linear classifier on top of the embeddings. What this architecture does is capture the aggregated meaning of the words in a sentence, but it does not take into account the **order** of the words, because the aggregation operation on top of the embeddings removes this information from the original text. Because these models are unable to model word ordering, they cannot solve more complex or ambiguous tasks such as text generation or question answering.
+
+To capture the meaning of a text sequence, we need to use another neural network architecture, which is called a **recurrent neural network**, or RNN. In an RNN, we pass our sentence through the network one symbol at a time, and the network produces some **state**, which we then pass to the network again with the next symbol.
+
+![RNN](./images/rnn.png)
+
+Given the input sequence of tokens X<sub>0</sub>,...,X<sub>n</sub>, the RNN creates a sequence of neural network blocks, and trains this sequence end-to-end using backpropagation. Each network block takes a pair (X<sub>i</sub>,S<sub>i</sub>) as an input, and produces S<sub>i+1</sub> as a result. The final state S<sub>n</sub> (or output Y<sub>n</sub>) goes into a linear classifier to produce the result. All network blocks share the same weights, and are trained end-to-end using one backpropagation pass.
+
+Because state vectors S<sub>0</sub>,...,S<sub>n</sub> are passed through the network, the RNN is able to learn sequential dependencies between words. For example, when the word *not* appears somewhere in the sequence, it can learn to negate certain elements within the state vector.
+
+> Since the weights of all RNN blocks in the picture are shared, the same picture can be represented as one block (on the right) with a recurrent feedback loop, which passes the output state of the network back to the input.
+
+## Anatomy of an RNN Cell
+
+Let's see how a simple RNN cell is organized. It accepts the previous state S<sub>i-1</sub> and the current symbol X<sub>i</sub> as inputs, and produces the output state S<sub>i</sub> (and, sometimes, we are also interested in some other output Y<sub>i</sub>, as in the case of generative networks).
+
+A simple RNN cell has two weight matrices inside: one transforms the input symbol (let's call it W), and another one transforms the input state (call it H). In this case the output of the network is calculated as &sigma;(W&times;X<sub>i</sub>+H&times;S<sub>i-1</sub>+b), where &sigma; is the activation function and b is an additional bias.
+
+![RNN Cell Anatomy](images/rnn-anatomy.png)
+
+In many cases, input tokens are passed through an embedding layer before entering the RNN to lower the dimensionality. In this case, if the dimension of the input vectors is *emb_size* and the state vector is *hid_size*, the size of W is *emb_size*&times;*hid_size*, and the size of H is *hid_size*&times;*hid_size*.
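+
+To make these shapes concrete, here is a minimal sketch using PyTorch's `RNNCell` (the sizes 64 and 32 are arbitrary examples, not values prescribed by this unit):
+
+```python
+import torch
+
+emb_size, hid_size = 64, 32
+cell = torch.nn.RNNCell(emb_size, hid_size)   # one step of a simple RNN
+
+x = torch.randn(8, emb_size)     # a batch of 8 embedded tokens
+s = torch.zeros(8, hid_size)     # initial state
+s_next = cell(x, s)              # activation(W*x + H*s + b)
+
+# PyTorch stores W and H transposed, as (hid_size, emb_size) and (hid_size, hid_size)
+print(cell.weight_ih.shape)      # torch.Size([32, 64])
+print(cell.weight_hh.shape)      # torch.Size([32, 32])
+print(s_next.shape)              # torch.Size([8, 32])
+```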
+
+## Long Short Term Memory (LSTM)
+
+One of the main problems of classical RNNs is the so-called **vanishing gradients** problem. Because RNNs are trained end-to-end in one backpropagation pass, they have a hard time propagating error to the first layers of the network, and thus the network cannot learn relationships between distant tokens. One of the ways to avoid this problem is to introduce **explicit state management** by using so-called **gates**. The two best-known architectures of this kind are **Long Short Term Memory** (LSTM) and **Gated Recurrent Unit** (GRU).
+
+![Image showing an example long short term memory cell](./images/long-short-term-memory-cell.svg)
+
+An LSTM network is organized in a manner similar to an RNN, but there are two states that are passed from layer to layer: the actual state C, and the hidden vector H. At each unit, the hidden vector H<sub>i</sub> is concatenated with the input X<sub>i</sub>, and together they control what happens to the state C via **gates**. Each gate is a neural network with sigmoid activation (output in the range [0,1]), which can be thought of as a bitwise mask when multiplied by the state vector. There are the following gates (from left to right in the picture above):
+* **forget gate** takes the hidden vector and determines which components of the vector C we need to forget, and which to pass through.
+* **input gate** takes some information from the input and the hidden vector, and inserts it into the state.
+* **output gate** transforms the state via a linear layer with *tanh* activation, then selects some of its components using the hidden vector H<sub>i</sub> to produce the new state C<sub>i+1</sub>.
+
+Components of the state C can be thought of as flags that can be switched on and off. For example, when we encounter the name *Alice* in the sequence, we may want to assume that it refers to a female character, and raise the flag in the state that says we have a female noun in the sentence. When we further encounter the phrase *and Tom*, we will raise the flag that says we have a plural noun. Thus by manipulating the state we can supposedly keep track of the grammatical properties of sentence parts.
+
+> **Note**: A great resource for understanding the internals of LSTMs is the article [Understanding LSTM Networks](https://colah.github.io/posts/2015-08-Understanding-LSTMs/) by Christopher Olah.
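+
+In PyTorch, both states are visible when calling the `LSTM` layer, which returns the per-step outputs together with the final hidden and cell states. A minimal sketch (sizes are arbitrary):
+
+```python
+import torch
+
+lstm = torch.nn.LSTM(input_size=64, hidden_size=32, batch_first=True)
+x = torch.randn(8, 20, 64)      # batch of 8 sequences of 20 embedded tokens
+out, (h, c) = lstm(x)
+
+print(out.shape)   # torch.Size([8, 20, 32]) - hidden vector H at every step
+print(h.shape)     # torch.Size([1, 8, 32])  - final hidden vector
+print(c.shape)     # torch.Size([1, 8, 32])  - final cell state C
+```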
+
+## Bidirectional and multilayer RNNs
+
+We have discussed recurrent networks that operate in one direction, from the beginning of a sequence to the end. This looks natural, because it resembles the way we read and listen to speech. However, since in many practical cases we have random access to the input sequence, it might make sense to run the recurrent computation in both directions. Such networks are called **bidirectional** RNNs. When dealing with a bidirectional network, we need two hidden state vectors, one for each direction.
+
+A recurrent network, one-directional or bidirectional, captures certain patterns within a sequence, and can store them in the state vector or pass them to the output. As with convolutional networks, we can build another recurrent layer on top of the first one to capture higher-level patterns, built from the low-level patterns extracted by the first layer. This leads us to the notion of a **multi-layer RNN**, which consists of two or more recurrent networks, where the output of the previous layer is passed to the next layer as input.
+
+![Image showing a Multilayer long-short-term-memory- RNN](./images/multi-layer-lstm.jpg)
+
+*Picture from [this wonderful post](https://towardsdatascience.com/from-a-lstm-cell-to-a-multilayer-lstm-network-with-pytorch-2899eb5696f3) by Fernando López*
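+
+As a rough sketch of how this affects tensor shapes in PyTorch (sizes are arbitrary): a bidirectional two-layer LSTM doubles the feature dimension of the output and returns one final hidden vector per layer and direction.
+
+```python
+import torch
+
+lstm = torch.nn.LSTM(input_size=64, hidden_size=32, num_layers=2,
+                     bidirectional=True, batch_first=True)
+x = torch.randn(8, 20, 64)
+out, (h, c) = lstm(x)
+
+print(out.shape)   # torch.Size([8, 20, 64]) - 2*hidden_size, both directions concatenated
+print(h.shape)     # torch.Size([4, 8, 32])  - num_layers * num_directions final hidden vectors
+```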
+
+## Continue to Notebooks
+
+* [RNNs with PyTorch](RNNPyTorch.ipynb)
+* [RNNs with TensorFlow](RNNTF.ipynb)
+
+## RNNs for other tasks
+
+In this unit, we have seen that RNNs can be used for sequence classification, but in fact, they can handle many more tasks, such as text generation, machine translation, and more. We will consider those tasks in the next unit.
+
diff --git a/5-NLP/16-RNN/RNNPyTorch.ipynb b/5-NLP/16-RNN/RNNPyTorch.ipynb
new file mode 100644
index 0000000000000000000000000000000000000000..2f7f6d07007c5bb78c4fac049c6975adfba824a1
--- /dev/null
+++ b/5-NLP/16-RNN/RNNPyTorch.ipynb
@@ -0,0 +1,486 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Recurrent neural networks\n",
+    "\n",
+    "In the previous module, we have been using rich semantic representations of text, and a simple linear classifier on top of the embeddings. What this architecture does is to capture aggregated meaning of words in a sentence, but it does not take into account the **order** of words, because aggregation operation on top of embeddings removed this information from the original text. Because these models are unable to model word ordering, they cannot solve more complex or ambiguous tasks such as text generation or question answering.\n",
+    "\n",
+    "To capture the meaning of text sequence, we need to use another neural network architecture, which is called a **recurrent neural network**, or RNN. In RNN, we pass our sentence through the network one symbol at a time, and the network produces some **state**, which we then pass to the network again with the next symbol.\n",
+    "\n",
+    "<img alt=\"RNN\" src=\"images/rnn.png\" width=\"60%\"/>\n",
+    "\n",
+    "Given the input sequence of tokens $X_0,\\dots,X_n$, RNN creates a sequence of neural network blocks, and trains this sequence end-to-end using back propagation. Each network block takes a pair $(X_i,S_i)$ as an input, and produces $S_{i+1}$ as a result. Final state $S_n$ or output $X_n$ goes into a linear classifier to produce the result. All network blocks share the same weights, and are trained end-to-end using one backpropagation pass.\n",
+    "\n",
+    "Because state vectors $S_0,\\dots,S_n$ are passed through the network, it is able to learn the sequential dependencies between words. For example, when the word *not* appears somewhere in the sequence, it can learn to negate certain elements within the state vector, resulting in negation.  \n",
+    "\n",
+    "> Since weights of all RNN blocks on the picture are shared, the same picture can be represented as one block (on the right) with a recurrent feedback loop, which passes output state of the network back to the input.\n",
+    "\n",
+    "Let's see how recurrent neural networks can help us classify our news dataset."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Loading dataset...\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "d:\\WORK\\ai-for-beginners\\5-NLP\\16-RNN\\data\\train.csv: 29.5MB [00:01, 28.3MB/s]                            \n",
+      "d:\\WORK\\ai-for-beginners\\5-NLP\\16-RNN\\data\\test.csv: 1.86MB [00:00, 9.72MB/s]                          \n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Building vocab...\n"
+     ]
+    }
+   ],
+   "source": [
+    "import torch\n",
+    "import torchtext\n",
+    "from torchnlp import *\n",
+    "train_dataset, test_dataset, classes, vocab = load_dataset()\n",
+    "vocab_size = len(vocab)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Simple RNN classifier\n",
+    "\n",
+    "In case of simple RNN, each recurrent unit is a simple linear network, which takes concatenated input vector and state vector, and produce a new state vector. PyTorch represents this unit with `RNNCell` class, and a networks of such cells - as `RNN` layer.\n",
+    "\n",
+    "To define an RNN classifier, we will first apply an embedding layer to lower the dimensionality of input vocabulary, and then have RNN layer on top of it: "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "class RNNClassifier(torch.nn.Module):\n",
+    "    def __init__(self, vocab_size, embed_dim, hidden_dim, num_class):\n",
+    "        super().__init__()\n",
+    "        self.hidden_dim = hidden_dim\n",
+    "        self.embedding = torch.nn.Embedding(vocab_size, embed_dim)\n",
+    "        self.rnn = torch.nn.RNN(embed_dim,hidden_dim,batch_first=True)\n",
+    "        self.fc = torch.nn.Linear(hidden_dim, num_class)\n",
+    "\n",
+    "    def forward(self, x):\n",
+    "        batch_size = x.size(0)\n",
+    "        x = self.embedding(x)\n",
+    "        x,h = self.rnn(x)\n",
+    "        return self.fc(x.mean(dim=1))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "> **Note:** We use untrained embedding layer here for simplicity, but for even better results we can use pre-trained embedding layer with Word2Vec or GloVe embeddings, as described in the previous unit. For better understanding, you might want to adapt this code to work with pre-trained embeddings.\n",
+    "\n",
+    "In our case, we will use padded data loader, so each batch will have a number of padded sequences of the same length. RNN layer will take the sequence of embedding tensors, and produce two outputs: \n",
+    "* $x$ is a sequence of RNN cell outputs at each step\n",
+    "* $h$ is a final hidden state for the last element of the sequence\n",
+    "\n",
+    "We then apply a fully-connected linear classifier to get the number of class.\n",
+    "\n",
+    "> **Note:** RNNs are quite difficult to train, because once the RNN cells are unrolled along the sequence length, the resulting number of layers involved in back propagation is quite large. Thus we need to select small learning rate, and train the network on larger dataset to produce good results. It can take quite a long time, so using GPU is preferred."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "3200: acc=0.3090625\n",
+      "6400: acc=0.38921875\n",
+      "9600: acc=0.4590625\n",
+      "12800: acc=0.511953125\n",
+      "16000: acc=0.5506875\n",
+      "19200: acc=0.57921875\n",
+      "22400: acc=0.6070089285714285\n",
+      "25600: acc=0.6304296875\n",
+      "28800: acc=0.6484027777777778\n",
+      "32000: acc=0.66509375\n",
+      "35200: acc=0.6790056818181818\n",
+      "38400: acc=0.6929166666666666\n",
+      "41600: acc=0.7035817307692308\n",
+      "44800: acc=0.7137276785714286\n",
+      "48000: acc=0.72225\n",
+      "51200: acc=0.73001953125\n",
+      "54400: acc=0.7372794117647059\n",
+      "57600: acc=0.7436631944444444\n",
+      "60800: acc=0.7503947368421052\n",
+      "64000: acc=0.75634375\n",
+      "67200: acc=0.7615773809523809\n",
+      "70400: acc=0.7662642045454545\n",
+      "73600: acc=0.7708423913043478\n",
+      "76800: acc=0.7751822916666666\n",
+      "80000: acc=0.7790625\n",
+      "83200: acc=0.7825\n",
+      "86400: acc=0.7858564814814815\n",
+      "89600: acc=0.7890513392857142\n",
+      "92800: acc=0.7920474137931034\n",
+      "96000: acc=0.7952708333333334\n",
+      "99200: acc=0.7982258064516129\n",
+      "102400: acc=0.80099609375\n",
+      "105600: acc=0.8037594696969697\n",
+      "108800: acc=0.8060569852941176\n"
+     ]
+    }
+   ],
+   "source": [
+    "train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=16, collate_fn=padify, shuffle=True)\n",
+    "net = RNNClassifier(vocab_size,64,32,len(classes)).to(device)\n",
+    "train_epoch(net,train_loader, lr=0.001)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Long Short Term Memory (LSTM)\n",
+    "\n",
+    "One of the main problems of classical RNNs is so-called **vanishing gradients** problem. Because RNNs are trained end-to-end in one back-propagation pass, it is having hard times propagating error to the first layers of the network, and thus the network cannot learn relationships between distant tokens. One of the ways to avoid this problem is to introduce **explicit state management** by using so called **gates**. There are two most known architectures of this kind: **Long Short Term Memory** (LSTM) and **Gated Relay Unit** (GRU).\n",
+    "\n",
+    "![Image showing an example long short term memory cell](./images/long-short-term-memory-cell.svg)\n",
+    "\n",
+    "LSTM Network is organized in a manner similar to RNN, but there are two states that are being passed from layer to layer: actual state $c$, and hidden vector $h$. At each unit, hidden vector $h_i$ is concatenated with input $x_i$, and they control what happens to the state $c$ via **gates**. Each gate is a neural network with sigmoid activation (output in the range $[0,1]$), which can be thought of as bitwise mask when multiplied by the state vector. There are the following gates (from left to right on the picture above):\n",
+    "* **forget gate** takes hidden vector and determines, which components of the vector $c$ we need to forget, and which to pass through. \n",
+    "* **input gate** takes some information from the input and hidden vector, and inserts it into state.\n",
+    "* **output gate** transforms state via some linear layer with $\\tanh$ activation, then selects some of its components using hidden vector $h_i$ to produce new state $c_{i+1}$.\n",
+    "\n",
+    "Components of the state $c$ can be thought of as some flags that can be switched on and off. For example, when we encounter a name *Alice* in the sequence, we may want to assume that it refers to female character, and raise the flag in the state that we have female noun in the sentence. When we further encounter phrases *and Tom*, we will raise the flag that we have plural noun. Thus by manipulating state we can supposedly keep track of grammatical properties of sentence parts.\n",
+    "\n",
+    "> **Note**: A great resource for understanding internals of LSTM is this great article [Understanding LSTM Networks](https://colah.github.io/posts/2015-08-Understanding-LSTMs/) by Christopher Olah.\n",
+    "\n",
+    "While internal structure of LSTM cell may look complex, PyTorch hides this implementation inside `LSTMCell` class, and provides `LSTM` object to represent the whole LSTM layer. Thus, implementation of LSTM classifier will be pretty similar to the simple RNN which we have seen above:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "class LSTMClassifier(torch.nn.Module):\n",
+    "    def __init__(self, vocab_size, embed_dim, hidden_dim, num_class):\n",
+    "        super().__init__()\n",
+    "        self.hidden_dim = hidden_dim\n",
+    "        self.embedding = torch.nn.Embedding(vocab_size, embed_dim)\n",
+    "        self.embedding.weight.data = torch.randn_like(self.embedding.weight.data)-0.5\n",
+    "        self.rnn = torch.nn.LSTM(embed_dim,hidden_dim,batch_first=True)\n",
+    "        self.fc = torch.nn.Linear(hidden_dim, num_class)\n",
+    "\n",
+    "    def forward(self, x):\n",
+    "        batch_size = x.size(0)\n",
+    "        x = self.embedding(x)\n",
+    "        x,(h,c) = self.rnn(x)\n",
+    "        return self.fc(h[-1])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now let's train our network. Note that training LSTM is also quite slow, and you may not seem much raise in accuracy in the beginning of training. Also, you may need to play with `lr` learning rate parameter to find the learning rate that results in reasonable training speed, and yet does not cause "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "3200: acc=0.259375\n",
+      "6400: acc=0.25859375\n",
+      "9600: acc=0.26177083333333334\n",
+      "12800: acc=0.2784375\n",
+      "16000: acc=0.313\n",
+      "19200: acc=0.3528645833333333\n",
+      "22400: acc=0.3965625\n",
+      "25600: acc=0.4385546875\n",
+      "28800: acc=0.4752777777777778\n",
+      "32000: acc=0.505375\n",
+      "35200: acc=0.5326704545454546\n",
+      "38400: acc=0.5557552083333334\n",
+      "41600: acc=0.5760817307692307\n",
+      "44800: acc=0.5954910714285714\n",
+      "48000: acc=0.6118333333333333\n",
+      "51200: acc=0.62681640625\n",
+      "54400: acc=0.6404779411764706\n",
+      "57600: acc=0.6520138888888889\n",
+      "60800: acc=0.662828947368421\n",
+      "64000: acc=0.673546875\n",
+      "67200: acc=0.6831547619047619\n",
+      "70400: acc=0.6917897727272727\n",
+      "73600: acc=0.6997146739130434\n",
+      "76800: acc=0.707109375\n",
+      "80000: acc=0.714075\n",
+      "83200: acc=0.7209134615384616\n",
+      "86400: acc=0.727037037037037\n",
+      "89600: acc=0.7326674107142858\n",
+      "92800: acc=0.7379633620689655\n",
+      "96000: acc=0.7433645833333333\n",
+      "99200: acc=0.7479032258064516\n",
+      "102400: acc=0.752119140625\n",
+      "105600: acc=0.7562405303030303\n",
+      "108800: acc=0.76015625\n",
+      "112000: acc=0.7641339285714286\n",
+      "115200: acc=0.7677777777777778\n",
+      "118400: acc=0.7711233108108108\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "(0.03487814127604167, 0.7728)"
+      ]
+     },
+     "execution_count": 5,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "net = LSTMClassifier(vocab_size,64,32,len(classes)).to(device)\n",
+    "train_epoch(net,train_loader, lr=0.001)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Packed sequences\n",
+    "\n",
+    "In our example, we had to pad all sequences in the minibatch with zero vectors. While it results in some memory waste, with RNNs it is more critical that additional RNN cells are created for the padded input items, which take part in training, yet do not carry any important input information. It would be much better to train RNN only to the actual sequence size.\n",
+    "\n",
+    "To do that, a special format of padded sequence storage is introduced in PyTorch. Suppose we have input padded minibatch which looks like this:\n",
+    "```\n",
+    "[[1,2,3,4,5],\n",
+    " [6,7,8,0,0],\n",
+    " [9,0,0,0,0]]\n",
+    "```\n",
+    "Here 0 represents padded values, and the actual length vector of input sequences is `[5,3,1]`.\n",
+    "\n",
+    "In order to effectively train RNN with padded sequence, we want to begin training first group of RNN cells with large minibatch (`[1,6,9]`), but then end processing of third sequence, and continue training with shorted minibatches (`[2,7]`, `[3,8]`), and so on. Thus, packed sequence is represented as one vector - in our case `[1,6,9,2,7,3,8,4,5]`, and length vector (`[5,3,1]`), from which we can easily reconstruct the original padded minibatch.\n",
+    "\n",
+    "To produce packed sequence, we can use `torch.nn.utils.rnn.pack_padded_sequence` function. All recurrent layers, including RNN, LSTM and GRU, support packed sequences as input, and produce packed output, which can be decoded using `torch.nn.utils.rnn.pad_packed_sequence`.\n",
+    "\n",
+    "To be able to produce packed sequence, we need to pass length vector to the network, and thus we need a different function to prepare minibatches:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def pad_length(b):\n",
+    "    # build vectorized sequence\n",
+    "    v = [encode(x[1]) for x in b]\n",
+    "    # compute max length of a sequence in this minibatch and length sequence itself\n",
+    "    len_seq = list(map(len,v))\n",
+    "    l = max(len_seq)\n",
+    "    return ( # tuple of three tensors - labels, padded features, length sequence\n",
+    "        torch.LongTensor([t[0]-1 for t in b]),\n",
+    "        torch.stack([torch.nn.functional.pad(torch.tensor(t),(0,l-len(t)),mode='constant',value=0) for t in v]),\n",
+    "        torch.tensor(len_seq)\n",
+    "    )\n",
+    "\n",
+    "train_loader_len = torch.utils.data.DataLoader(train_dataset, batch_size=16, collate_fn=pad_length, shuffle=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Actual network would be very similar to `LSTMClassifier` above, but `forward` pass will receive both padded minibatch and the vector of sequence lengths. After computing the embedding, we compute packed sequence, pass it to LSTM layer, and then unpack the result back.\n",
+    "\n",
+    "> **Note**: We actually do not use unpacked result `x`, because we use output from the hidden layers in the following computations. Thus, we can remove the unpacking altogether from this code. The reason we place it here is for you to be able to modify this code easily, in case you should need to use network output in further computations."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "class LSTMPackClassifier(torch.nn.Module):\n",
+    "    def __init__(self, vocab_size, embed_dim, hidden_dim, num_class):\n",
+    "        super().__init__()\n",
+    "        self.hidden_dim = hidden_dim\n",
+    "        self.embedding = torch.nn.Embedding(vocab_size, embed_dim)\n",
+    "        self.embedding.weight.data = torch.randn_like(self.embedding.weight.data)-0.5\n",
+    "        self.rnn = torch.nn.LSTM(embed_dim,hidden_dim,batch_first=True)\n",
+    "        self.fc = torch.nn.Linear(hidden_dim, num_class)\n",
+    "\n",
+    "    def forward(self, x, lengths):\n",
+    "        batch_size = x.size(0)\n",
+    "        x = self.embedding(x)\n",
+    "        pad_x = torch.nn.utils.rnn.pack_padded_sequence(x,lengths,batch_first=True,enforce_sorted=False)\n",
+    "        pad_x,(h,c) = self.rnn(pad_x)\n",
+    "        x, _ = torch.nn.utils.rnn.pad_packed_sequence(pad_x,batch_first=True)\n",
+    "        return self.fc(h[-1])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now let's do the training:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "3200: acc=0.285625\n",
+      "6400: acc=0.33359375\n",
+      "9600: acc=0.3876041666666667\n",
+      "12800: acc=0.44078125\n",
+      "16000: acc=0.4825\n",
+      "19200: acc=0.5235416666666667\n",
+      "22400: acc=0.5559821428571429\n",
+      "25600: acc=0.58609375\n",
+      "28800: acc=0.6116666666666667\n",
+      "32000: acc=0.63340625\n",
+      "35200: acc=0.6525284090909091\n",
+      "38400: acc=0.668515625\n",
+      "41600: acc=0.6822596153846154\n",
+      "44800: acc=0.6948214285714286\n",
+      "48000: acc=0.7052708333333333\n",
+      "51200: acc=0.71521484375\n",
+      "54400: acc=0.7239889705882353\n",
+      "57600: acc=0.7315277777777778\n",
+      "60800: acc=0.7388486842105263\n",
+      "64000: acc=0.74571875\n",
+      "67200: acc=0.7518303571428572\n",
+      "70400: acc=0.7576988636363636\n",
+      "73600: acc=0.7628940217391305\n",
+      "76800: acc=0.7681510416666667\n",
+      "80000: acc=0.7728125\n",
+      "83200: acc=0.7772235576923077\n",
+      "86400: acc=0.7815393518518519\n",
+      "89600: acc=0.7857700892857142\n",
+      "92800: acc=0.7895043103448276\n",
+      "96000: acc=0.7930520833333333\n",
+      "99200: acc=0.7959072580645161\n",
+      "102400: acc=0.798994140625\n",
+      "105600: acc=0.802064393939394\n",
+      "108800: acc=0.8051378676470589\n",
+      "112000: acc=0.8077857142857143\n",
+      "115200: acc=0.8104600694444445\n",
+      "118400: acc=0.8128293918918919\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "(0.029785829671223958, 0.8138166666666666)"
+      ]
+     },
+     "execution_count": 8,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "net = LSTMPackClassifier(vocab_size,64,32,len(classes)).to(device)\n",
+    "train_epoch_emb(net,train_loader_len, lr=0.001,use_pack_sequence=True)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "> **Note:** You may have noticed the parameter `use_pack_sequence` that we pass to the training function. Currently, `pack_padded_sequence` function requires length sequence tensor to be on CPU device, and thus training function needs to avoid moving the length sequence data to GPU when training. You can look into implementation of `train_emb` function in the [`torchnlp.py`](torchnlp.py) file."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Bidirectional and multilayer RNNs\n",
+    "\n",
+    "In our examples, all recurrent networks operated in one direction, from beginning of a sequence to the end. It looks natural, because it resembles the way we read and listen to speech. However, since in many practical cases we have random access to the input sequence, it might make sense to run recurrent computation in both directions. Such networks are call **bidirectional** RNNs, and they can be created by passing `bidirectional=True` parameter to RNN/LSTM/GRU constructor.\n",
+    "\n",
+    "When dealing with bidirectional network, we would need two hidden state vectors, one for each direction. PyTorch encodes those vectors as one vector of twice larger size, which is quite convenient, because you would normally pass the resulting hidden state to fully-connected linear layer, and you would just need to take this increase in size into account when creating the layer.\n",
+    "\n",
+    "Recurrent network, one-directional or bidirectional, captures certain patterns within a sequence, and can store them into state vector or pass into output. As with convolutional networks, we can build another recurrent layer on top of the first one to capture higher level patterns, build from low-level patterns extracted by the first layer. This leads us to the notion of **multi-layer RNN**, which consists of two or more recurrent networks, where output of the previous layer is passed to the next layer as input.\n",
+    "\n",
+    "![Image showing a Multilayer long-short-term-memory- RNN](images/multi-layer-lstm.jpg)\n",
+    "\n",
+    "*Picture from [this wonderful post](https://towardsdatascience.com/from-a-lstm-cell-to-a-multilayer-lstm-network-with-pytorch-2899eb5696f3) by Fernando López*\n",
+    "\n",
+    "PyTorch makes constructing such networks an easy task, because you just need to pass `num_layers` parameter to RNN/LSTM/GRU constructor to build several layers of recurrence automatically. This would also mean that the size of hidden/state vector would increase proportionally, and you would need to take this into account when handling the output of recurrent layers."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## RNNs for other tasks\n",
+    "\n",
+    "In this unit, we have seen that RNNs can be used for sequence classification, but in fact, they can handle many more tasks, such as text generation, machine translation, and more. We will consider those tasks in the next unit."
+   ]
+  }
+ ],
+ "metadata": {
+  "interpreter": {
+   "hash": "0cb620c6d4b9f7a635928804c26cf22403d89d98d79684e4529119355ee6d5a5"
+  },
+  "kernelspec": {
+   "display_name": "py37_pytorch",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.12"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/5-NLP/16-RNN/RNNTF.ipynb b/5-NLP/16-RNN/RNNTF.ipynb
new file mode 100644
index 0000000000000000000000000000000000000000..c461cedc415eeee861d394c91c3bce324c93af4b
--- /dev/null
+++ b/5-NLP/16-RNN/RNNTF.ipynb
@@ -0,0 +1,443 @@
+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "source": [
+        "# Recurrent neural networks\n",
+        "\n",
+        "In the previous module, we covered rich semantic representations of text. The architecture we've been using captures the aggregated meaning of words in a sentence, but it does not take into account the **order** of the words, because the aggregation operation that follows the embeddings removes this information from the original text. Because these models are unable to represent word ordering, they cannot solve more complex or ambiguous tasks such as text generation or question answering.\n",
+        "\n",
+        "To capture the meaning of a text sequence, we'll use a neural network architecture called **recurrent neural network**, or RNN. When using an RNN, we pass our sentence through the network one token at a time, and the network produces some **state**, which we then pass to the network again with the next token.\n",
+        "\n",
+        "![Image showing an example recurrent neural network generation.](images/rnn.png)\n",
+        "\n",
+        "Given the input sequence of tokens $X_0,\\dots,X_n$, the RNN creates a sequence of neural network blocks, and trains this sequence end-to-end using backpropagation. Each network block takes a pair $(X_i,S_i)$ as an input, and produces $S_{i+1}$ as a result. The final state $S_n$ or output $Y_n$ goes into a linear classifier to produce the result. All network blocks share the same weights, and are trained end-to-end using one backpropagation pass.\n",
+        "\n",
+        "> The figure above shows recurrent neural network in the unrolled form (on the left), and in more compact recurrent representation (on the right). It is important to realize that all RNN Cells have the same **shareable weights**.\n",
+        "\n",
+        "Because state vectors $S_0,\\dots,S_n$ are passed through the network, the RNN is able to learn sequential dependencies between words. For example, when the word *not* appears somewhere in the sequence, it can learn to negate certain elements within the state vector.\n",
+        "\n",
+        "Inside, each RNN cell contains two weight matrices: $W_H$ and $W_I$, and bias $b$. At each RNN step, given input $X_i$ and input state $S_i$, output state is calculated as $S_{i+1} = f(W_H\\times S_i + W_I\\times X_i+b)$, where $f$ is an activation function (often $\\tanh$).\n",
+        "\n",
+        "> For problems like text generation (that we will cover in the next unit) or machine translation we also want to get some output value at each RNN step. In this case, there is also another matrix $W_O$, and output is caluclated as $Y_i=f(W_O\\times S_i+b_O)$.\n",
+        "\n",
+        "Let's see how recurrent neural networks can help us classify our news dataset.\n",
+        "\n",
+        "> For the sandbox environment, we need to run the following cell to make sure the required library is installed, and data is prefetched. If you are running locally, you can skip the following cell."
+      ],
+      "metadata": {}
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "import sys\n",
+        "!{sys.executable} -m pip install --quiet tensorflow_datasets==4.4.0\n",
+        "!cd ~ && wget -q -O - https://mslearntensorflowlp.blob.core.windows.net/data/tfds-ag-news.tgz | tar xz"
+      ],
+      "outputs": [],
+      "execution_count": 1,
+      "metadata": {}
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "import tensorflow as tf\n",
+        "from tensorflow import keras\n",
+        "import tensorflow_datasets as tfds\n",
+        "import numpy as np\n",
+        "\n",
+        "# We are going to be training pretty large models. In order not to face errors, we need\n",
+        "# to set tensorflow option to grow GPU memory allocation when required\n",
+        "physical_devices = tf.config.list_physical_devices('GPU') \n",
+        "if len(physical_devices)>0:\n",
+        "    tf.config.experimental.set_memory_growth(physical_devices[0], True)\n",
+        "\n",
+        "ds_train, ds_test = tfds.load('ag_news_subset').values()"
+      ],
+      "outputs": [],
+      "execution_count": 2,
+      "metadata": {}
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "When training large models, GPU memory allocation may become a problem. We also may need to experiment with different minibatch sizes, so that the data fits into our GPU memory, yet the training is fast enough. If you are running this code on your own GPU machine, you may experiment with adjusting minibatch size to speed up training.\r\n",
+        "\r\n",
+        "> **Note**: Certain versions of NVidia drivers are known not to release the memory after training the model. We are running several examples in this notebooks, and it might cause memory to be exhausted in certain setups, especially if you are doing your own experiments as part of the same notebook. If you encounter some weird errors when starting to train the model, you may want to restart notebook kernel."
+      ],
+      "metadata": {
+        "nteract": {
+          "transient": {
+            "deleting": false
+          }
+        }
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "batch_size = 16\r\n",
+        "embed_size = 64"
+      ],
+      "outputs": [],
+      "execution_count": 3,
+      "metadata": {
+        "collapsed": true,
+        "jupyter": {
+          "source_hidden": false,
+          "outputs_hidden": false
+        },
+        "nteract": {
+          "transient": {
+            "deleting": false
+          }
+        }
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Simple RNN classifier\n",
+        "\n",
+        "In the case of a simple RNN, each recurrent unit is a simple linear network, which takes in an input vector and state vector, and produces a new state vector. In Keras, this can be represented by the `SimpleRNN` layer.\n",
+        "\n",
+        "While we can pass one-hot encoded tokens to the RNN layer directly, this is not a good idea because of their high dimensionality. Therefore, we will use an embedding layer to lower the dimensionality of word vectors, followed by an RNN layer, and finally a `Dense` classifier.\n",
+        "\n",
+        "> **Note**: In cases where the dimensionality isn't so high, for example when using character-level tokenization, it might make sense to pass one-hot encoded tokens directly into the RNN cell."
+      ],
+      "metadata": {}
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "vocab_size = 20000\n",
+        "\n",
+        "vectorizer = keras.layers.experimental.preprocessing.TextVectorization(\n",
+        "    max_tokens=vocab_size,\n",
+        "    input_shape=(1,))\n",
+        "\n",
+        "model = keras.models.Sequential([\n",
+        "    vectorizer,\n",
+        "    keras.layers.Embedding(vocab_size, embed_size),\n",
+        "    keras.layers.SimpleRNN(16),\n",
+        "    keras.layers.Dense(4,activation='softmax')\n",
+        "])\n",
+        "\n",
+        "model.summary()"
+      ],
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "Model: \"sequential\"\n",
+            "_________________________________________________________________\n",
+            "Layer (type)                 Output Shape              Param #   \n",
+            "=================================================================\n",
+            "text_vectorization (TextVect (None, None)              0         \n",
+            "_________________________________________________________________\n",
+            "embedding (Embedding)        (None, None, 64)          1280000   \n",
+            "_________________________________________________________________\n",
+            "simple_rnn (SimpleRNN)       (None, 16)                1296      \n",
+            "_________________________________________________________________\n",
+            "dense (Dense)                (None, 4)                 68        \n",
+            "=================================================================\n",
+            "Total params: 1,281,364\n",
+            "Trainable params: 1,281,364\n",
+            "Non-trainable params: 0\n",
+            "_________________________________________________________________\n"
+          ]
+        }
+      ],
+      "execution_count": 4,
+      "metadata": {}
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "> **Note:** We use an untrained embedding layer here for simplicity, but for better results we can use a pretrained embedding layer using Word2Vec, as described in the previous unit. It would be a good exercise for you to adapt this code to work with pretrained embeddings.\n",
+        "\n",
+        "Now let's train our RNN. RNNs in general are quite difficult to train, because once the RNN cells are unrolled along the sequence length, the resulting number of layers involved in backpropagation is quite large. Thus we need to select a smaller learning rate, and train the network on a larger dataset to produce good results. This can take quite a long time, so using a GPU is preferred.\n",
+        "\n",
+        "To speed things up, we will only train the RNN model on news titles, omitting the description. You can try training with description and see if you can get the model to train."
+      ],
+      "metadata": {}
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "def extract_title(x):\n",
+        "    return x['title']\n",
+        "\n",
+        "def tupelize_title(x):\n",
+        "    return (extract_title(x),x['label'])\n",
+        "\n",
+        "print('Training vectorizer')\n",
+        "vectorizer.adapt(ds_train.take(2000).map(extract_title))"
+      ],
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "Training vectorizer\n"
+          ]
+        }
+      ],
+      "execution_count": 5,
+      "metadata": {
+        "scrolled": true
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "model.compile(loss='sparse_categorical_crossentropy',metrics=['acc'], optimizer='adam')\n",
+        "model.fit(ds_train.map(tupelize_title).batch(batch_size),validation_data=ds_test.map(tupelize_title).batch(batch_size))"
+      ],
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "7500/7500 [==============================] - 82s 11ms/step - loss: 0.6629 - acc: 0.7623 - val_loss: 0.5559 - val_acc: 0.7995\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\n"
+          ]
+        },
+        {
+          "output_type": "execute_result",
+          "execution_count": 6,
+          "data": {
+            "text/plain": "<tensorflow.python.keras.callbacks.History at 0x7f3e0030d350>"
+          },
+          "metadata": {}
+        }
+      ],
+      "execution_count": 6,
+      "metadata": {}
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "> **Note** that accuracy is likely to be lower here, because we are training only on news titles."
+      ],
+      "metadata": {
+        "nteract": {
+          "transient": {
+            "deleting": false
+          }
+        }
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Revisiting variable sequences \n",
+        "\n",
+        "Remember that the `TextVectorization` layer will automatically pad sequences of variable length in a minibatch with pad tokens. It turns out that those tokens also take part in training, and they can complicate convergence of the model.\n",
+        "\n",
+        "There are several approaches we can take to minimize the amount of padding. One of them is to reorder the dataset by sequence length and group all sequences by size. This can be done using the `tf.data.experimental.bucket_by_sequence_length` function (see [documentation](https://www.tensorflow.org/api_docs/python/tf/data/experimental/bucket_by_sequence_length)). \n",
+        "\n",
+        "Another approach is to use **masking**. In Keras, some layers support additional input that shows which tokens should be taken into account when training. To incorporate masking into our model, we can either include a separate `Masking` layer ([docs](https://keras.io/api/layers/core_layers/masking/)), or we can specify the `mask_zero=True` parameter of our `Embedding` layer.\n",
+        "\n",
+        "> **Note**: This training will take around 5 minutes to complete one epoch on the whole dataset. Feel free to interrupt training at any time if you run out of patience. What you can also do is limit the amount of data used for training, by adding `.take(...)` clause after `ds_train` and `ds_test` datasets."
+      ],
+      "metadata": {}
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "def extract_text(x):\n",
+        "    return x['title']+' '+x['description']\n",
+        "\n",
+        "def tupelize(x):\n",
+        "    return (extract_text(x),x['label'])\n",
+        "\n",
+        "model = keras.models.Sequential([\n",
+        "    vectorizer,\n",
+        "    keras.layers.Embedding(vocab_size,embed_size,mask_zero=True),\n",
+        "    keras.layers.SimpleRNN(16),\n",
+        "    keras.layers.Dense(4,activation='softmax')\n",
+        "])\n",
+        "\n",
+        "model.compile(loss='sparse_categorical_crossentropy',metrics=['acc'], optimizer='adam')\n",
+        "model.fit(ds_train.map(tupelize).batch(batch_size),validation_data=ds_test.map(tupelize).batch(batch_size))"
+      ],
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "7500/7500 [==============================] - 371s 49ms/step - loss: 0.5401 - acc: 0.8079 - val_loss: 0.3780 - val_acc: 0.8822\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\n"
+          ]
+        },
+        {
+          "output_type": "execute_result",
+          "execution_count": 7,
+          "data": {
+            "text/plain": "<tensorflow.python.keras.callbacks.History at 0x7f3dec118850>"
+          },
+          "metadata": {}
+        }
+      ],
+      "execution_count": 7,
+      "metadata": {}
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Now that we're using masking, we can train the model on the whole dataset of titles and descriptions.\r\n",
+        "\r\n",
+        "> **Note**: Have you noticed that we have been using vectorizer trained on the news titles, and not the whole body of the article? Potentially, this can cause some of the the tokens to be ignored, so it is better to re-train the vectorizer. However, it might only have very small effect, so we will stick to the previous pre-trained vectorizer for the sake of simplicity."
+      ],
+      "metadata": {}
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## LSTM: Long short-term memory\n",
+        "\n",
+        "One of the main problems of RNNs is **vanishing gradients**. RNNs can be pretty long, and may have a hard time propagating the gradients all the way back to the first layer of the network during backpropagation. When this happens, the network cannot learn relationships between distant tokens. One way to avoid this problem is to introduce **explicit state management** by using **gates**. The two most common architectures that introduce gates are **long short-term memory** (LSTM) and **gated relay unit** (GRU). We'll cover LSTMs here.\n",
+        "\n",
+        "![Image showing an example long short term memory cell](images/long-short-term-memory-cell.svg)\n",
+        "\n",
+        "An LSTM network is organized in a manner similar to an RNN, but there are two states that are passed from layer to layer: the actual state $c$, and the hidden vector $h$. At each unit, the hidden vector $h_{t-1}$ is combined with input $x_t$, and together they control what happens to the state $c_t$ and output $h_{t}$ through **gates**. Each gate has sigmoid activation (output in the range $[0,1]$), which can be thought of as a bitwise mask when multiplied by the state vector. LSTMs have the following gates (from left to right on the picture above):\n",
+        "* **forget gate** which determines which components of the vector $c_{t-1}$ we need to forget, and which to pass through. \n",
+        "* **input gate** which determines how much information from the input vector and previous hidden vector should be incorporated into the state vector.\n",
+        "* **output gate** which takes the new state vector and decides which of its components will be used to produce the new hidden vector $h_t$.\n",
+        "\n",
+        "The components of the state $c$ can be thought of as flags that can be switched on and off. For example, when we encounter the name *Alice* in the sequence, we guess that it refers to a woman, and raise the flag in the state that says we have a female noun in the sentence. When we further encounter the words *and Tom*, we will raise the flag that says we have a plural noun. Thus by manipulating state we can keep track of the grammatical properties of the sentence.\n",
+        "\n",
+        "> **Note**: Here's a great resource for understanding the internals of LSTMs: [Understanding LSTM Networks](https://colah.github.io/posts/2015-08-Understanding-LSTMs/) by Christopher Olah.\n",
+        "\n",
+        "While the internal structure of an LSTM cell may look complex, Keras hides this implementation inside the `LSTM` layer, so the only thing we need to do in the example above is to replace the recurrent layer:"
+      ],
+      "metadata": {}
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "model = keras.models.Sequential([\n",
+        "    vectorizer,\n",
+        "    keras.layers.Embedding(vocab_size, embed_size),\n",
+        "    keras.layers.LSTM(8),\n",
+        "    keras.layers.Dense(4,activation='softmax')\n",
+        "])\n",
+        "\n",
+        "model.compile(loss='sparse_categorical_crossentropy',metrics=['acc'], optimizer='adam')\n",
+        "model.fit(ds_train.map(tupelize).batch(8),validation_data=ds_test.map(tupelize).batch(8))"
+      ],
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "15000/15000 [==============================] - 188s 13ms/step - loss: 0.5692 - acc: 0.7916 - val_loss: 0.3441 - val_acc: 0.8870\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\n"
+          ]
+        },
+        {
+          "output_type": "execute_result",
+          "execution_count": 8,
+          "data": {
+            "text/plain": "<tensorflow.python.keras.callbacks.History at 0x7f3d6af5c350>"
+          },
+          "metadata": {}
+        }
+      ],
+      "execution_count": 8,
+      "metadata": {}
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "> **Note** that training LSTMs is also quite slow, and you may not seem much increase in accuracy in the beginning of training. You may need to continue training for some time to achieve good accuracy."
+      ],
+      "metadata": {}
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Bidirectional and multilayer RNNs\n",
+        "\n",
+        "In our examples so far, the recurrent networks operate from the beginning of a sequence until the end. This feels natural to us because it follows the same direction in which we read or listen to speech. However, for scenarios which require random access of the input sequence, it makes more sense to run the recurrent computation in both directions. RNNs that allow computations in both directions are called **bidirectional** RNNs, and they can be created by wrapping the recurrent layer with a special `Bidirectonal` layer.\n",
+        "\n",
+        "> **Note**: The `Bidirectional` layer makes two copies of the layer within it, and sets the `go_backwards` property of one of those copies to `True`, making it go in the opposite direction along the sequence.\n",
+        "\n",
+        "Recurrent networks, unidirectional or bidirectional, capture patterns within a sequence, and store them into state vectors or return them as output. As with convolutional networks, we can build another recurrent layer following the first one to capture higher level patterns, built from lower level patterns extracted by the first layer. This leads us to the notion of a **multi-layer RNN**, which consists of two or more recurrent networks, where the output of the previous layer is passed to the next layer as input.\n",
+        "\n",
+        "![Image showing a Multilayer long-short-term-memory- RNN](images/multi-layer-lstm.jpg)\n",
+        "\n",
+        "*Picture from [this wonderful post](https://towardsdatascience.com/from-a-lstm-cell-to-a-multilayer-lstm-network-with-pytorch-2899eb5696f3) by Fernando López.*\n",
+        "\n",
+        "Keras makes constructing these networks an easy task, because you just need to add more recurrent layers to the model. For all layers except the last one, we need to specify `return_sequences=True` parameter, because we need the layer to return all intermediate states, and not just the final state of the recurrent computation.\n",
+        "\n",
+        "Let's build a two-layer bidirectional LSTM for our classification problem.\n",
+        "\n",
+        "> **Note** this code again takes quite a long time to complete, but it gives us highest accuracy we have seen so far. So maybe it is worth waiting and seeing the result."
+      ],
+      "metadata": {}
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "model = keras.models.Sequential([\n",
+        "    vectorizer,\n",
+        "    keras.layers.Embedding(vocab_size, 128, mask_zero=True),\n",
+        "    keras.layers.Bidirectional(keras.layers.LSTM(64,return_sequences=True)),\n",
+        "    keras.layers.Bidirectional(keras.layers.LSTM(64)),    \n",
+        "    keras.layers.Dense(4,activation='softmax')\n",
+        "])\n",
+        "\n",
+        "model.compile(loss='sparse_categorical_crossentropy',metrics=['acc'], optimizer='adam')\n",
+        "model.fit(ds_train.map(tupelize).batch(batch_size),\n",
+        "          validation_data=ds_test.map(tupelize).batch(batch_size))"
+      ],
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "5044/7500 [===================>..........] - ETA: 2:33 - loss: 0.3709 - acc: 0.8706\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r5045/7500 [===================>..........] - ETA: 2:33 - loss: 0.3709 - acc: 0.8706"
+          ]
+        }
+      ],
+      "execution_count": 9,
+      "metadata": {}
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## RNNs for other tasks\n",
+        "\n",
+        "Up until now, we've focused on using RNNs to classify sequences of text. But they can handle many more tasks, such as text generation and machine translation &mdash; we'll consider those tasks in the next unit."
+      ],
+      "metadata": {}
+    }
+  ],
+  "metadata": {
+    "kernelspec": {
+      "name": "conda-env-py37_tensorflow-py",
+      "language": "python",
+      "display_name": "py37_tensorflow"
+    },
+    "language_info": {
+      "name": "python",
+      "version": "3.7.9",
+      "mimetype": "text/x-python",
+      "codemirror_mode": {
+        "name": "ipython",
+        "version": 3
+      },
+      "pygments_lexer": "ipython3",
+      "nbconvert_exporter": "python",
+      "file_extension": ".py"
+    },
+    "kernel_info": {
+      "name": "conda-env-py37_tensorflow-py"
+    },
+    "nteract": {
+      "version": "nteract-front-end@1.0.0"
+    }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 4
+}
\ No newline at end of file
diff --git a/5-NLP/16-RNN/images/long-short-term-memory-cell.svg b/5-NLP/16-RNN/images/long-short-term-memory-cell.svg
new file mode 100644
index 0000000000000000000000000000000000000000..7b66a2c37a0871e5529c563bc63c305ee8ce5a6c
--- /dev/null
+++ b/5-NLP/16-RNN/images/long-short-term-memory-cell.svg
@@ -0,0 +1,1334 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!-- Created with Inkscape (http://www.inkscape.org/) -->
+
+<svg
+   xmlns:dc="http://purl.org/dc/elements/1.1/"
+   xmlns:cc="http://creativecommons.org/ns#"
+   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
+   xmlns:svg="http://www.w3.org/2000/svg"
+   xmlns="http://www.w3.org/2000/svg"
+   xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
+   xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
+   width="190mm"
+   height="130mm"
+   viewBox="0 0 190 130"
+   version="1.1"
+   id="svg8"
+   sodipodi:docname="lstm-cell.svg"
+   inkscape:version="0.92.3 (2405546, 2018-03-11)"
+   inkscape:export-filename="/home/users_home/Documents/GIT/FINAL_DL/Linear-Attention-Recurrent-Neural-Network/inkscape_drawings/png_exported/lstm-cell.png"
+   inkscape:export-xdpi="299.91537"
+   inkscape:export-ydpi="299.91537">
+  <title
+     id="title1438">LSTM Cell</title>
+  <defs
+     id="defs2">
+    <inkscape:perspective
+       sodipodi:type="inkscape:persp3d"
+       inkscape:vp_x="-405.59802 : 195.57742 : 1"
+       inkscape:vp_y="0 : 999.99994 : 0"
+       inkscape:vp_z="303.85994 : 427.17668 : 1"
+       inkscape:persp3d-origin="105 : 98.999994 : 1"
+       id="perspective4559" />
+    <inkscape:perspective
+       sodipodi:type="inkscape:persp3d"
+       inkscape:vp_x="19.240475 : 198.74312 : 1"
+       inkscape:vp_y="0 : 129.29024 : 0"
+       inkscape:vp_z="110.96647 : 228.68665 : 1"
+       inkscape:persp3d-origin="85.255818 : 186.2566 : 1"
+       id="perspective4559-3" />
+    <filter
+       style="color-interpolation-filters:sRGB"
+       inkscape:label="Blur"
+       id="filter6288">
+      <feGaussianBlur
+         stdDeviation="67.2 10"
+         result="blur"
+         id="feGaussianBlur6286" />
+    </filter>
+    <filter
+       inkscape:collect="always"
+       style="color-interpolation-filters:sRGB"
+       id="filter6777"
+       x="-0.02178747"
+       width="1.0435749"
+       y="-0.04814931"
+       height="1.0962986">
+      <feGaussianBlur
+         inkscape:collect="always"
+         stdDeviation="0.90655222"
+         id="feGaussianBlur6779" />
+    </filter>
+    <filter
+       inkscape:collect="always"
+       style="color-interpolation-filters:sRGB"
+       id="filter8042"
+       x="-0.42381817"
+       width="1.8476363"
+       y="-0.71723074"
+       height="2.4344616">
+      <feGaussianBlur
+         inkscape:collect="always"
+         stdDeviation="0.25697656"
+         id="feGaussianBlur8044" />
+    </filter>
+    <filter
+       inkscape:collect="always"
+       style="color-interpolation-filters:sRGB"
+       id="filter10264"
+       x="-0.0079611773"
+       width="1.0159224"
+       y="-0.024356319"
+       height="1.0487126">
+      <feGaussianBlur
+         inkscape:collect="always"
+         stdDeviation="0.20177668"
+         id="feGaussianBlur10266" />
+    </filter>
+    <filter
+       inkscape:collect="always"
+       style="color-interpolation-filters:sRGB"
+       id="filter10272"
+       x="-0.009238895"
+       width="1.0184778"
+       y="-0.017114902"
+       height="1.0342298">
+      <feGaussianBlur
+         inkscape:collect="always"
+         stdDeviation="0.28481399"
+         id="feGaussianBlur10274" />
+    </filter>
+    <clipPath
+       clipPathUnits="userSpaceOnUse"
+       id="clipPath10338">
+      <rect
+         style="opacity:1;fill:#1d635e;fill-opacity:0.06880733;stroke:#ffffff;stroke-width:0.5;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:0.96788988"
+         id="rect10340"
+         width="46.299076"
+         height="115.36229"
+         x="270.74799"
+         y="29.903158" />
+    </clipPath>
+    <clipPath
+       clipPathUnits="userSpaceOnUse"
+       id="clipPath10376">
+      <rect
+         style="opacity:1;fill:#1d635e;fill-opacity:0.06880733;stroke:#ffffff;stroke-width:0.5;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:0.96788988"
+         id="rect10378"
+         width="25.940634"
+         height="17.601774"
+         x="3.1655979"
+         y="71.54734"
+         transform="scale(-1,1)" />
+    </clipPath>
+    <filter
+       inkscape:collect="always"
+       style="color-interpolation-filters:sRGB"
+       id="filter8042-1"
+       x="-0.42381817"
+       width="1.8476363"
+       y="-0.71723074"
+       height="2.4344616">
+      <feGaussianBlur
+         inkscape:collect="always"
+         stdDeviation="0.25697656"
+         id="feGaussianBlur8044-9" />
+    </filter>
+    <clipPath
+       clipPathUnits="userSpaceOnUse"
+       id="clipPath10679">
+      <rect
+         style="opacity:1;fill:#808080;fill-opacity:0.06880733;stroke:#4d4d4d;stroke-width:0.5;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:0.96788988"
+         id="rect10681"
+         width="25.406197"
+         height="17.531759"
+         x="3.4345946"
+         y="71.547333"
+         transform="scale(-1,1)" />
+    </clipPath>
+    <filter
+       inkscape:collect="always"
+       style="color-interpolation-filters:sRGB"
+       id="filter12229"
+       x="-0.12156857"
+       width="1.2431371"
+       y="-0.12156857"
+       height="1.2431371">
+      <feGaussianBlur
+         inkscape:collect="always"
+         stdDeviation="0.43953438"
+         id="feGaussianBlur12231" />
+    </filter>
+    <filter
+       inkscape:collect="always"
+       style="color-interpolation-filters:sRGB"
+       id="filter12233"
+       x="-0.052935258"
+       width="1.1058705"
+       y="-0.10762672"
+       height="1.2152534">
+      <feGaussianBlur
+         inkscape:collect="always"
+         stdDeviation="0.43953438"
+         id="feGaussianBlur12235" />
+    </filter>
+    <filter
+       inkscape:collect="always"
+       style="color-interpolation-filters:sRGB"
+       id="filter12541"
+       x="-0.0144"
+       width="1.0288"
+       y="-0.0144"
+       height="1.0288">
+      <feGaussianBlur
+         inkscape:collect="always"
+         stdDeviation="0.05206358"
+         id="feGaussianBlur12543" />
+    </filter>
+    <filter
+       inkscape:collect="always"
+       style="color-interpolation-filters:sRGB"
+       id="filter12787"
+       x="-0.0082712928"
+       width="1.0165426"
+       y="-0.055590123"
+       height="1.1111802">
+      <feGaussianBlur
+         inkscape:collect="always"
+         stdDeviation="0.67542246"
+         id="feGaussianBlur12789" />
+    </filter>
+  </defs>
+  <sodipodi:namedview
+     id="base"
+     pagecolor="#ffffff"
+     bordercolor="#666666"
+     borderopacity="1.0"
+     inkscape:pageopacity="0.0"
+     inkscape:pageshadow="2"
+     inkscape:zoom="1"
+     inkscape:cx="433.45381"
+     inkscape:cy="188.63838"
+     inkscape:document-units="mm"
+     inkscape:current-layer="layer1"
+     showgrid="false"
+     inkscape:window-width="1920"
+     inkscape:window-height="1023"
+     inkscape:window-x="1920"
+     inkscape:window-y="0"
+     inkscape:window-maximized="1" />
+  <metadata
+     id="metadata5">
+    <rdf:RDF>
+      <cc:Work
+         rdf:about="">
+        <dc:format>image/svg+xml</dc:format>
+        <dc:type
+           rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
+        <dc:title>LSTM Cell</dc:title>
+        <cc:license
+           rdf:resource="http://creativecommons.org/licenses/by/4.0/" />
+        <dc:creator>
+          <cc:Agent>
+            <dc:title>Guillaume Chevalier</dc:title>
+          </cc:Agent>
+        </dc:creator>
+        <dc:date>13 May 2017</dc:date>
+      </cc:Work>
+      <cc:License
+         rdf:about="http://creativecommons.org/licenses/by/4.0/">
+        <cc:permits
+           rdf:resource="http://creativecommons.org/ns#Reproduction" />
+        <cc:permits
+           rdf:resource="http://creativecommons.org/ns#Distribution" />
+        <cc:requires
+           rdf:resource="http://creativecommons.org/ns#Notice" />
+        <cc:requires
+           rdf:resource="http://creativecommons.org/ns#Attribution" />
+        <cc:permits
+           rdf:resource="http://creativecommons.org/ns#DerivativeWorks" />
+      </cc:License>
+    </rdf:RDF>
+  </metadata>
+  <g
+     inkscape:label="Layer 1"
+     inkscape:groupmode="layer"
+     id="layer1"
+     transform="translate(0,-167)">
+    <g
+       id="g1436"
+       transform="matrix(0.89305455,0,0,1.0058054,1.0658078,-1.5875497)">
+      <g
+         id="g1431">
+        <rect
+           ry="2.9579358"
+           y="261.14444"
+           x="6.6429443"
+           height="29.160107"
+           width="195.98071"
+           id="rect12545"
+           style="fill:#f2f2f2;fill-opacity:1;stroke-width:0.23194653;filter:url(#filter12787)"
+           transform="matrix(0.95880034,0,0,0.95880034,4.3481892,9.0950066)" />
+      </g>
+      <rect
+         ry="2.8360698"
+         y="259.48038"
+         x="10.717446"
+         height="27.958721"
+         width="187.90637"
+         id="rect5514-0"
+         style="fill:#f2f2f2;fill-opacity:1;stroke-width:0.22239041" />
+    </g>
+    <rect
+       style="opacity:1;fill:#ffffff;fill-opacity:0.98165134;stroke:#ffffff;stroke-width:0.5;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:0.96788988;filter:url(#filter8042-1)"
+       id="rect6284-0"
+       width="1.4552083"
+       height="0.85989583"
+       x="53.280468"
+       y="68.763802"
+       transform="matrix(0,1.1584917,-1.1584917,0,127.6196,134.4837)" />
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:12.26070404px;line-height:1.25;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.30651757"
+       x="186.17693"
+       y="176.74184"
+       id="text6336"><tspan
+         sodipodi:role="line"
+         id="tspan6334"
+         x="186.17693"
+         y="187.58969"
+         style="stroke-width:0.30651757" /></text>
+    <g
+       id="g6473"
+       transform="matrix(1.1584917,0,0,1.1584917,14.071849,136.59698)" />
+    <rect
+       y="46.896858"
+       x="59.559067"
+       height="45.18705"
+       width="99.861305"
+       id="rect6623"
+       style="fill:#4ec2a7;fill-opacity:1;stroke-width:0.20610677;filter:url(#filter6777)"
+       ry="4.5836725"
+       transform="matrix(1.0031719,0,0,1.2674399,-15.227239,128.62865)" />
+    <rect
+       style="fill:#4ec2a7;fill-opacity:1;stroke-width:0.23240399"
+       id="rect5514"
+       width="100.17805"
+       height="57.271866"
+       x="44.520744"
+       y="188.06775"
+       ry="5.8095293" />
+    <g
+       id="g11423"
+       transform="matrix(1.1584917,0,0,1.1584917,-81.653426,135.33973)">
+      <path
+         sodipodi:nodetypes="cc"
+         inkscape:connector-curvature="0"
+         id="path11415"
+         d="m 101.69547,53.23758 c 1.67044,0.0668 16.88275,0 16.88275,0"
+         style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1" />
+      <g
+         id="g11421">
+        <path
+           sodipodi:nodetypes="ccc"
+           style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+           d="m 117.58225,53.22346 -1.14593,-1.14593 z"
+           id="path11417"
+           inkscape:connector-curvature="0" />
+        <path
+           sodipodi:nodetypes="ccc"
+           inkscape:connector-curvature="0"
+           id="path11419"
+           d="m 117.58225,53.22346 -1.14593,1.14594 z"
+           style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1" />
+      </g>
+    </g>
+    <g
+       id="g7124"
+       style="filter:url(#filter10272)"
+       transform="matrix(1.1584917,0,0,1.1584917,-41.244631,135.33973)">
+      <g
+         id="g7090"
+         transform="translate(-6.7154443,17.341029)">
+        <rect
+           y="46.023808"
+           x="150.56232"
+           height="8.0319939"
+           width="12.904612"
+           id="rect7084"
+           style="fill:#e5ff5b;fill-opacity:1;stroke-width:0.25856197"
+           ry="4.0159969" />
+        <text
+           id="text7088"
+           y="51.551083"
+           x="151.8725"
+           style="font-style:normal;font-weight:normal;font-size:10.58333302px;line-height:1.25;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#4d4d4d;fill-opacity:1;stroke:none;stroke-width:0.26458332"
+           xml:space="preserve"><tspan
+             style="font-size:4.58611107px;fill:#4d4d4d;stroke-width:0.26458332"
+             y="51.551083"
+             x="151.8725"
+             id="tspan7086"
+             sodipodi:role="line">tanh</tspan></text>
+      </g>
+      <g
+         id="g7098"
+         transform="translate(-0.57877604,9.4479778)">
+        <circle
+           r="3.5907738"
+           cy="43.566963"
+           cx="86.934525"
+           id="circle7092"
+           style="fill:#e5ff5b;fill-opacity:1;stroke-width:0.26458332" />
+        <text
+           id="text7096"
+           y="44.898705"
+           x="85.632172"
+           style="font-style:normal;font-weight:normal;font-size:10.58333302px;line-height:1.25;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#4d4d4d;fill-opacity:1;stroke:none;stroke-width:0.26458332"
+           xml:space="preserve"><tspan
+             style="font-size:4.58611107px;fill:#4d4d4d;stroke-width:0.26458332"
+             y="44.898705"
+             x="85.632172"
+             id="tspan7094"
+             sodipodi:role="line">x</tspan></text>
+      </g>
+      <g
+         id="g7106"
+         transform="translate(0,9.4479778)">
+        <circle
+           style="fill:#e5ff5b;fill-opacity:1;stroke-width:0.26458332"
+           id="circle7100"
+           cx="121.17175"
+           cy="43.566963"
+           r="3.5907738" />
+        <text
+           xml:space="preserve"
+           style="font-style:normal;font-weight:normal;font-size:10.58333302px;line-height:1.25;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#4d4d4d;fill-opacity:1;stroke:none;stroke-width:0.26458332"
+           x="119.8694"
+           y="44.898705"
+           id="text7104"><tspan
+             sodipodi:role="line"
+             id="tspan7102"
+             x="119.8694"
+             y="44.898705"
+             style="font-size:4.58611107px;fill:#4d4d4d;stroke-width:0.26458332">x</tspan></text>
+      </g>
+      <g
+         id="g7114"
+         transform="translate(63.364357,42.205518)">
+        <circle
+           style="fill:#e5ff5b;fill-opacity:1;stroke-width:0.26458332"
+           id="circle7108"
+           cx="86.934525"
+           cy="43.566963"
+           r="3.5907738" />
+        <text
+           xml:space="preserve"
+           style="font-style:normal;font-weight:normal;font-size:10.58333302px;line-height:1.25;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#4d4d4d;fill-opacity:1;stroke:none;stroke-width:0.26458332"
+           x="85.632172"
+           y="44.898705"
+           id="text7112"><tspan
+             sodipodi:role="line"
+             id="tspan7110"
+             x="85.632172"
+             y="44.898705"
+             style="font-size:4.58611107px;fill:#4d4d4d;stroke-width:0.26458332">x</tspan></text>
+      </g>
+      <g
+         transform="translate(34.253762,19.725045)"
+         id="g7122">
+        <circle
+           r="3.5907738"
+           cy="43.566963"
+           cx="86.934525"
+           id="circle7116"
+           style="fill:#e5ff5b;fill-opacity:1;stroke-width:0.26458332" />
+        <text
+           id="text7120"
+           y="44.898705"
+           x="85.632172"
+           style="font-style:normal;font-weight:normal;font-size:10.58333302px;line-height:1.25;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#4d4d4d;fill-opacity:1;stroke:none;stroke-width:0.26458332"
+           xml:space="preserve"><tspan
+             style="font-size:4.58611107px;fill:#4d4d4d;stroke-width:0.26458332"
+             y="44.898705"
+             x="85.632172"
+             id="tspan7118"
+             sodipodi:role="line">x</tspan></text>
+      </g>
+    </g>
+    <g
+       id="g7039"
+       style="filter:url(#filter10264)"
+       transform="matrix(1.1584917,0,0,1.1584917,-41.244631,135.33973)">
+      <g
+         id="g7013">
+        <rect
+           y="69.578537"
+           x="97.360367"
+           height="8.0023499"
+           width="12.879682"
+           id="rect7007"
+           style="fill:#ff9955;stroke-width:0.22517994" />
+        <text
+           xml:space="preserve"
+           style="font-style:normal;font-weight:normal;font-size:10.58333302px;line-height:1.25;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#4d4d4d;fill-opacity:1;stroke:none;stroke-width:0.26458332"
+           x="100.35973"
+           y="76.502632"
+           id="text7011"><tspan
+             sodipodi:role="line"
+             id="tspan7009"
+             x="100.35973"
+             y="76.502632"
+             style="fill:#4d4d4d;stroke-width:0.26458332">σ</tspan></text>
+      </g>
+      <g
+         id="g7021"
+         transform="translate(-4.1691348,11.88013)">
+        <rect
+           y="69.578537"
+           x="132.10547"
+           height="8.0023499"
+           width="12.879682"
+           id="rect7015"
+           style="fill:#ff9955;stroke-width:0.22517994" />
+        <text
+           xml:space="preserve"
+           style="font-style:normal;font-weight:normal;font-size:10.58333302px;line-height:1.25;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#4d4d4d;fill-opacity:1;stroke:none;stroke-width:0.26458332"
+           x="135.10483"
+           y="76.502632"
+           id="text7019"><tspan
+             sodipodi:role="line"
+             id="tspan7017"
+             x="135.10483"
+             y="76.502632"
+             style="fill:#4d4d4d;stroke-width:0.26458332">σ</tspan></text>
+      </g>
+      <g
+         id="g7029">
+        <rect
+           style="fill:#ff9955;stroke-width:0.22517994"
+           id="rect7023"
+           width="12.879682"
+           height="8.0023499"
+           x="79.987823"
+           y="69.578537" />
+        <text
+           id="text7027"
+           y="76.502632"
+           x="82.98719"
+           style="font-style:normal;font-weight:normal;font-size:10.58333302px;line-height:1.25;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#4d4d4d;fill-opacity:1;stroke:none;stroke-width:0.26458332"
+           xml:space="preserve"><tspan
+             style="fill:#4d4d4d;stroke-width:0.26458332"
+             y="76.502632"
+             x="82.98719"
+             id="tspan7025"
+             sodipodi:role="line">σ</tspan></text>
+      </g>
+      <g
+         id="g7037">
+        <rect
+           style="fill:#ff9955;stroke-width:0.22517994"
+           id="rect7031"
+           width="12.879682"
+           height="8.0023499"
+           x="114.73292"
+           y="69.578537" />
+        <text
+           id="text7035"
+           y="75.166283"
+           x="116.39593"
+           style="font-style:normal;font-weight:normal;font-size:10.58333302px;line-height:1.25;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#4d4d4d;fill-opacity:1;stroke:none;stroke-width:0.26458332"
+           xml:space="preserve"><tspan
+             style="font-size:4.58611107px;fill:#4d4d4d;stroke-width:0.26458332"
+             y="75.166283"
+             x="116.39593"
+             id="tspan7033"
+             sodipodi:role="line">tanh</tspan></text>
+      </g>
+    </g>
+    <g
+       transform="matrix(1.1584917,0,0,1.1584917,-81.704958,111.03958)"
+       style="stroke:#4d4d4d;stroke-width:0.5;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+       id="g6126">
+      <path
+         sodipodi:nodetypes="cc"
+         inkscape:connector-curvature="0"
+         id="path6118"
+         d="m 121.20363,100.28111 c 0.0668,-1.670436 0,-22.70473 0,-22.70473"
+         style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1" />
+      <g
+         style="stroke:#4d4d4d;stroke-width:0.5;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+         id="g6124"
+         transform="translate(35.057279)">
+        <path
+           style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+           d="m 86.14629,77.57638 1.145937,1.145937 z"
+           id="path6120"
+           inkscape:connector-curvature="0" />
+        <path
+           inkscape:connector-curvature="0"
+           id="path6122"
+           d="m 86.14629,77.57638 -1.145937,1.145937 z"
+           style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1" />
+      </g>
+    </g>
+    <g
+       transform="matrix(-1.1584917,0,0,-1.1584917,273.16562,320.39542)"
+       style="stroke:#4d4d4d;stroke-width:0.5;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+       id="g6116">
+      <path
+         sodipodi:nodetypes="cc"
+         inkscape:connector-curvature="0"
+         id="path6108"
+         d="m 121.20363,89.600335 c 0.0668,-1.670433 0,-12.023955 0,-12.023955"
+         style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1" />
+      <g
+         style="stroke:#4d4d4d;stroke-width:0.5;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+         id="g6114"
+         transform="translate(35.057279)">
+        <path
+           style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+           d="m 86.14629,77.57638 1.145937,1.145937 z"
+           id="path6110"
+           inkscape:connector-curvature="0" />
+        <path
+           inkscape:connector-curvature="0"
+           id="path6112"
+           d="m 86.14629,77.57638 -1.145937,1.145937 z"
+           style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1" />
+      </g>
+    </g>
+    <g
+       id="g6053"
+       style="stroke:#4d4d4d;stroke-width:0.5;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+       transform="matrix(-1.1584917,0,0,-1.1584917,273.16562,298.6098)">
+      <path
+         style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+         d="m 123.40192,87.510461 c -2.4589,-0.01002 -2.19829,-10.082909 -2.19829,-10.082909"
+         id="path6045"
+         inkscape:connector-curvature="0"
+         sodipodi:nodetypes="cc" />
+      <g
+         transform="translate(35.057279)"
+         id="g6051"
+         style="stroke:#4d4d4d;stroke-width:0.5;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1">
+        <path
+           inkscape:connector-curvature="0"
+           id="path6047"
+           d="m 86.14629,77.57638 1.145937,1.145937 z"
+           style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1" />
+        <path
+           style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+           d="m 86.14629,77.57638 -1.145937,1.145937 z"
+           id="path6049"
+           inkscape:connector-curvature="0" />
+      </g>
+    </g>
+    <g
+       transform="matrix(1.1584917,0,0,1.1584917,-41.244631,111.02632)"
+       style="stroke:#4d4d4d;stroke-width:0.5;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+       id="g6043">
+      <path
+         sodipodi:nodetypes="cc"
+         inkscape:connector-curvature="0"
+         id="path6035"
+         d="m 121.20363,87.448815 c 0.0668,-1.670433 0,-9.872435 0,-9.872435"
+         style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1" />
+      <g
+         style="stroke:#4d4d4d;stroke-width:0.5;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+         id="g6041"
+         transform="translate(35.057279)">
+        <path
+           style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+           d="m 86.14629,77.57638 1.145937,1.145937 z"
+           id="path6037"
+           inkscape:connector-curvature="0" />
+        <path
+           inkscape:connector-curvature="0"
+           id="path6039"
+           d="m 86.14629,77.57638 -1.145937,1.145937 z"
+           style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1" />
+      </g>
+    </g>
+    <path
+       style="fill:none;stroke:#4d4d4d;stroke-width:0.57924587;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+       d="M 36.170736,234.63556 H 116.17405"
+       id="path5722"
+       inkscape:connector-curvature="0"
+       sodipodi:nodetypes="cc" />
+    <g
+       style="stroke:#4d4d4d;stroke-width:0.5;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+       id="g5802"
+       transform="matrix(1.1584917,0,0,1.1584917,-41.244631,135.33973)">
+      <path
+         sodipodi:nodetypes="ccc"
+         inkscape:connector-curvature="0"
+         id="path5743"
+         d="m 82.538164,85.711397 c 0,0 3.54131,0.133636 3.608128,-1.536801 0.0668,-1.670433 0,-6.598216 0,-6.598216"
+         style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1" />
+      <g
+         style="stroke:#4d4d4d;stroke-width:0.5;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+         id="g5766">
+        <path
+           style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+           d="m 86.14629,77.57638 1.145937,1.145937 z"
+           id="path5745"
+           inkscape:connector-curvature="0" />
+        <path
+           inkscape:connector-curvature="0"
+           id="path5762"
+           d="m 86.14629,77.57638 -1.145937,1.145937 z"
+           style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1" />
+      </g>
+    </g>
+    <g
+       style="stroke:#4d4d4d;stroke-width:0.5;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+       id="g5796"
+       transform="matrix(1.1584917,0,0,1.1584917,-41.244631,135.33973)">
+      <path
+         style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+         d="m 100.39753,85.711397 c 0,0 3.54132,0.133636 3.60814,-1.536801 0.0668,-1.670433 0,-6.598216 0,-6.598216"
+         id="path5741"
+         inkscape:connector-curvature="0"
+         sodipodi:nodetypes="ccc" />
+      <g
+         style="stroke:#4d4d4d;stroke-width:0.5;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+         transform="translate(17.859371)"
+         id="g5772">
+        <path
+           inkscape:connector-curvature="0"
+           id="path5768"
+           d="m 86.14629,77.57638 1.145937,1.145937 z"
+           style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1" />
+        <path
+           style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+           d="m 86.14629,77.57638 -1.145937,1.145937 z"
+           id="path5770"
+           inkscape:connector-curvature="0" />
+      </g>
+    </g>
+    <g
+       style="stroke:#4d4d4d;stroke-width:0.5;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+       id="g5790"
+       transform="matrix(1.1584917,0,0,1.1584917,-41.244631,135.33973)">
+      <path
+         sodipodi:nodetypes="ccc"
+         inkscape:connector-curvature="0"
+         id="path5724"
+         d="m 117.59549,85.711397 c 0,0 3.54132,0.133636 3.60814,-1.536801 0.0668,-1.670433 0,-6.598216 0,-6.598216"
+         style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1" />
+      <g
+         style="stroke:#4d4d4d;stroke-width:0.5;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+         id="g5778"
+         transform="translate(35.057279)">
+        <path
+           style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+           d="m 86.14629,77.57638 1.145937,1.145937 z"
+           id="path5774"
+           inkscape:connector-curvature="0" />
+        <path
+           inkscape:connector-curvature="0"
+           id="path5776"
+           d="m 86.14629,77.57638 -1.145937,1.145937 z"
+           style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1" />
+      </g>
+    </g>
+    <g
+       transform="matrix(1.1584917,0,0,1.1584917,-7.5322135,166.16175)"
+       id="g5924-5"
+       style="stroke:#4d4d4d;stroke-width:0.5;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1">
+      <path
+         sodipodi:nodetypes="cc"
+         inkscape:connector-curvature="0"
+         id="path5724-1-1-7"
+         d="m 103.74646,58.938513 c 0,0 0.03,0.07902 13.88106,0.171766"
+         style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1" />
+      <path
+         sodipodi:nodetypes="ccc"
+         style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+         d="m 117.62752,59.110339 -1.14593,-1.14593 z"
+         id="path5774-4-1-8"
+         inkscape:connector-curvature="0" />
+      <path
+         sodipodi:nodetypes="ccc"
+         inkscape:connector-curvature="0"
+         id="path5776-0-8-5"
+         d="m 117.62752,59.110339 -1.14593,1.14594 z"
+         style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1" />
+    </g>
+    <g
+       id="g6186"
+       transform="matrix(1.1584917,0,0,1.1584917,-41.244631,135.33973)">
+      <path
+         style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+         d="m 89.559244,53.23758 c 1.670436,0.0668 29.018976,0 29.018976,0"
+         id="path6118-1"
+         inkscape:connector-curvature="0"
+         sodipodi:nodetypes="cc" />
+      <g
+         id="g6180">
+        <path
+           inkscape:connector-curvature="0"
+           id="path5774-4-1-6"
+           d="m 117.58225,53.22346 -1.14593,-1.14593 z"
+           style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+           sodipodi:nodetypes="ccc" />
+        <path
+           style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+           d="m 117.58225,53.22346 -1.14593,1.14594 z"
+           id="path5776-0-8-0"
+           inkscape:connector-curvature="0"
+           sodipodi:nodetypes="ccc" />
+      </g>
+    </g>
+    <g
+       id="g5924"
+       style="stroke:#4d4d4d;stroke-width:0.5;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+       transform="matrix(1.1584917,0,0,1.1584917,-41.244631,134.11367)">
+      <path
+         sodipodi:nodetypes="cc"
+         inkscape:connector-curvature="0"
+         id="path5724-1-1"
+         d="m 103.62258,73.599644 c -0.17343,-9.264879 0.15387,-9.325178 14.00497,-9.232434"
+         style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1" />
+      <path
+         sodipodi:nodetypes="ccc"
+         style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+         d="m 117.57735,64.36727 -1.14593,-1.14593 z"
+         id="path5774-4-1"
+         inkscape:connector-curvature="0" />
+      <path
+         sodipodi:nodetypes="ccc"
+         inkscape:connector-curvature="0"
+         id="path5776-0-8"
+         d="m 117.57735,64.36727 -1.14593,1.14594 z"
+         style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1" />
+    </g>
+    <g
+       id="g5977"
+       style="stroke:#4d4d4d;stroke-width:0.5;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+       transform="matrix(1.1584917,0,0,1.1584917,-41.244631,122.92152)">
+      <path
+         style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+         d="m 121.20363,86.324336 c 0.0668,-1.670433 0,-8.747956 0,-8.747956"
+         id="path5969"
+         inkscape:connector-curvature="0"
+         sodipodi:nodetypes="cc" />
+      <g
+         transform="translate(35.057279)"
+         id="g5975"
+         style="stroke:#4d4d4d;stroke-width:0.5;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1">
+        <path
+           inkscape:connector-curvature="0"
+           id="path5971"
+           d="m 86.14629,77.57638 1.145937,1.145937 z"
+           style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1" />
+        <path
+           style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+           d="m 86.14629,77.57638 -1.145937,1.145937 z"
+           id="path5973"
+           inkscape:connector-curvature="0" />
+      </g>
+    </g>
+    <g
+       id="g6839"
+       transform="matrix(1.1584917,0,0,1.1584917,-41.244631,135.33973)">
+      <g
+         id="g5833">
+        <rect
+           style="fill:#ff9955;stroke-width:0.22517994"
+           id="rect5541"
+           width="12.879682"
+           height="8.0023499"
+           x="97.360367"
+           y="69.578537" />
+        <text
+           id="text5577"
+           y="76.502632"
+           x="100.35973"
+           style="font-style:normal;font-weight:normal;font-size:10.58333302px;line-height:1.25;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#4d4d4d;fill-opacity:1;stroke:none;stroke-width:0.26458332"
+           xml:space="preserve"><tspan
+             style="fill:#4d4d4d;stroke-width:0.26458332"
+             y="76.502632"
+             x="100.35973"
+             id="tspan5575"
+             sodipodi:role="line">σ</tspan></text>
+      </g>
+      <g
+         transform="translate(-4.1691348,11.88013)"
+         id="g5843">
+        <rect
+           style="fill:#ff9955;stroke-width:0.22517994"
+           id="rect5545"
+           width="12.879682"
+           height="8.0023499"
+           x="132.10547"
+           y="69.578537" />
+        <text
+           id="text5585"
+           y="76.502632"
+           x="135.10483"
+           style="font-style:normal;font-weight:normal;font-size:10.58333302px;line-height:1.25;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#4d4d4d;fill-opacity:1;stroke:none;stroke-width:0.26458332"
+           xml:space="preserve"><tspan
+             style="fill:#4d4d4d;stroke-width:0.26458332"
+             y="76.502632"
+             x="135.10483"
+             id="tspan5583"
+             sodipodi:role="line">σ</tspan></text>
+      </g>
+      <g
+         id="g5828">
+        <rect
+           y="69.578537"
+           x="79.987823"
+           height="8.0023499"
+           width="12.879682"
+           id="rect5535"
+           style="fill:#ff9955;stroke-width:0.22517994" />
+        <text
+           xml:space="preserve"
+           style="font-style:normal;font-weight:normal;font-size:10.58333302px;line-height:1.25;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#4d4d4d;fill-opacity:1;stroke:none;stroke-width:0.26458332"
+           x="82.98719"
+           y="76.502632"
+           id="text5555"><tspan
+             sodipodi:role="line"
+             id="tspan5553"
+             x="82.98719"
+             y="76.502632"
+             style="fill:#4d4d4d;stroke-width:0.26458332">σ</tspan></text>
+      </g>
+      <g
+         id="g5838">
+        <rect
+           y="69.578537"
+           x="114.73292"
+           height="8.0023499"
+           width="12.879682"
+           id="rect5543"
+           style="fill:#ff9955;stroke-width:0.22517994" />
+        <text
+           xml:space="preserve"
+           style="font-style:normal;font-weight:normal;font-size:10.58333302px;line-height:1.25;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#4d4d4d;fill-opacity:1;stroke:none;stroke-width:0.26458332"
+           x="116.39593"
+           y="75.166283"
+           id="text5581"><tspan
+             sodipodi:role="line"
+             id="tspan5579"
+             x="116.39593"
+             y="75.166283"
+             style="font-size:4.58611107px;fill:#4d4d4d;stroke-width:0.26458332">tanh</tspan></text>
+      </g>
+    </g>
+    <g
+       id="g6332"
+       transform="matrix(1.1584917,0,0,1.1584917,-41.244631,135.33973)">
+      <g
+         transform="translate(53.317378,26.652963)"
+         id="g6106"
+         style="stroke:#4d4d4d;stroke-width:0.5;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1">
+        <g
+           id="g6215">
+          <path
+             style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+             d="m 100.00468,59.055443 c 0,0 3.77178,-0.03791 17.62284,0.05484"
+             id="path6100"
+             inkscape:connector-curvature="0"
+             sodipodi:nodetypes="cc" />
+          <path
+             inkscape:connector-curvature="0"
+             id="path6102"
+             d="m 117.62752,59.110339 -1.14593,-1.14593 z"
+             style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+             sodipodi:nodetypes="ccc" />
+          <path
+             style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+             d="m 117.80425,58.931539 -1.31336,1.323706 z"
+             id="path6104"
+             inkscape:connector-curvature="0"
+             sodipodi:nodetypes="ccc" />
+        </g>
+        <g
+           transform="translate(0,-32.279181)"
+           id="g6223">
+          <path
+             sodipodi:nodetypes="cc"
+             inkscape:connector-curvature="0"
+             id="path6217"
+             d="m 71.239797,59.055443 c 0,0 32.536663,-0.03791 46.387723,0.05484"
+             style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1" />
+          <path
+             sodipodi:nodetypes="ccc"
+             style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+             d="m 117.62752,59.110339 -1.14593,-1.14593 z"
+             id="path6219"
+             inkscape:connector-curvature="0" />
+          <path
+             sodipodi:nodetypes="ccc"
+             inkscape:connector-curvature="0"
+             id="path6221"
+             d="m 117.80425,58.931539 -1.31336,1.323706 z"
+             style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1" />
+        </g>
+      </g>
+      <g
+         id="g6313">
+        <rect
+           style="opacity:1;fill:#ffffff;fill-opacity:0.98165134;stroke:#ffffff;stroke-width:0.5;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:0.96788988;filter:url(#filter8042)"
+           id="rect6284"
+           width="1.4552083"
+           height="0.85989583"
+           x="164.43854"
+           y="53.021091" />
+        <g
+           id="g6246"
+           style="stroke:#4d4d4d;stroke-width:0.5;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+           transform="translate(43.92084)">
+          <g
+             id="g6252"
+             transform="translate(0,-39.290625)">
+            <path
+               sodipodi:nodetypes="ccc"
+               inkscape:connector-curvature="0"
+               id="path6238"
+               d="m 117.59937,125.00205 c 0,0 3.57051,0.0262 3.60426,-3.88497 0.0668,-1.67044 0,-38.778197 0,-38.778197"
+               style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1" />
+            <g
+               style="stroke:#4d4d4d;stroke-width:0.5;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+               id="g6244"
+               transform="translate(35.057279)">
+              <path
+                 style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+                 d="m 85.972657,82.163183 1.31957,1.321637 z"
+                 id="path6240"
+                 inkscape:connector-curvature="0"
+                 sodipodi:nodetypes="ccc" />
+              <path
+                 inkscape:connector-curvature="0"
+                 id="path6242"
+                 d="M 86.14629,82.338883 85.000353,83.48482 Z"
+                 style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+                 sodipodi:nodetypes="ccc" />
+            </g>
+          </g>
+        </g>
+      </g>
+    </g>
+    <text
+       id="text5581-6-3"
+       y="181.18173"
+       x="148.30687"
+       style="font-style:normal;font-weight:normal;font-size:12.26070404px;line-height:1.25;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#4d4d4d;fill-opacity:1;stroke:none;stroke-width:0.30651757"
+       xml:space="preserve"><tspan
+         style="font-size:5.31297159px;fill:#4d4d4d;stroke-width:0.30651757"
+         y="181.18173"
+         x="148.30687"
+         id="tspan5579-7-2"
+         sodipodi:role="line">h<tspan
+   style="font-size:2.45214057px;stroke-width:0.30651757"
+   id="tspan10610">t</tspan></tspan></text>
+    <text
+       id="text5581-6-3-5"
+       y="198.77341"
+       x="157.97829"
+       style="font-style:normal;font-weight:normal;font-size:12.26070404px;line-height:1.25;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#4d4d4d;fill-opacity:1;stroke:none;stroke-width:0.30651757"
+       xml:space="preserve"><tspan
+         style="font-size:5.31297159px;fill:#4d4d4d;stroke-width:0.30651757"
+         y="198.77341"
+         x="157.97829"
+         id="tspan5579-7-2-1"
+         sodipodi:role="line">c<tspan
+   style="font-size:2.45214057px;stroke-width:0.30651757"
+   id="tspan10606">t</tspan></tspan></text>
+    <text
+       id="text5581-6-3-3"
+       y="235.88847"
+       x="157.97829"
+       style="font-style:normal;font-weight:normal;font-size:12.26070404px;line-height:1.25;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#4d4d4d;fill-opacity:1;stroke:none;stroke-width:0.30651757"
+       xml:space="preserve"><tspan
+         style="font-size:5.31297159px;fill:#4d4d4d;stroke-width:0.30651757"
+         y="235.88847"
+         x="157.97829"
+         id="tspan5579-7-2-2"
+         sodipodi:role="line">h<tspan
+   style="font-size:2.45214057px;stroke-width:0.30651757"
+   id="tspan10608">t</tspan></tspan></text>
+    <g
+       id="g1082"
+       transform="translate(-45.508335)">
+      <path
+         style="fill:none;stroke:#4d4d4d;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+         d="M 57.617755,94.747095 C 57.397553,87.961141 61.279538,84.580018 69.003485,84.62599"
+         id="path5724-1-1-0"
+         inkscape:connector-curvature="0"
+         sodipodi:nodetypes="cc"
+         transform="matrix(1.1584917,0,0,1.1584917,14.071849,136.59698)" />
+      <text
+         xml:space="preserve"
+         style="font-style:normal;font-weight:normal;font-size:12.26070404px;line-height:1.25;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#4d4d4d;fill-opacity:1;stroke:none;stroke-width:0.30651757"
+         x="79.295021"
+         y="251.12712"
+         id="text5581-6-3-0"><tspan
+           sodipodi:role="line"
+           id="tspan5579-7-2-4"
+           x="79.295021"
+           y="251.12712"
+           style="font-size:5.31297159px;fill:#4d4d4d;stroke-width:0.30651757">x<tspan
+   id="tspan10612"
+   style="font-size:2.45214057px;stroke-width:0.30651757">t</tspan></tspan></text>
+    </g>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:12.26070404px;line-height:1.25;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#4d4d4d;fill-opacity:1;stroke:none;stroke-width:0.30651757"
+       x="28.036568"
+       y="198.30452"
+       id="text10321-2"><tspan
+         sodipodi:role="line"
+         id="tspan10319-8"
+         x="28.036568"
+         y="198.30452"
+         style="font-size:5.31297159px;fill:#4d4d4d;stroke-width:0.30651757">c<tspan
+   style="font-size:2.45214057px;stroke-width:0.30651757"
+   id="tspan10323-6">t-1</tspan></tspan></text>
+    <g
+       id="g11344"
+       transform="matrix(1.1584917,0,0,1.1584917,-41.244631,135.33973)">
+      <g
+         id="g11310"
+         transform="translate(-6.7154443,17.341029)">
+        <rect
+           y="46.023808"
+           x="150.56232"
+           height="8.0319939"
+           width="12.904612"
+           id="rect11304"
+           style="fill:#e5ff5b;fill-opacity:1;stroke-width:0.25856197"
+           ry="4.0159969" />
+        <text
+           id="text11308"
+           y="51.551083"
+           x="156.86485"
+           style="font-style:normal;font-weight:normal;font-size:10.58333302px;line-height:1.25;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#4d4d4d;fill-opacity:1;stroke:none;stroke-width:0.26458332"
+           xml:space="preserve"><tspan
+             style="font-size:4.58611107px;text-align:center;text-anchor:middle;fill:#4d4d4d;stroke-width:0.26458332"
+             y="51.551083"
+             x="156.86485"
+             id="tspan11306"
+             sodipodi:role="line">tanh</tspan></text>
+      </g>
+      <g
+         id="g11318"
+         transform="translate(-0.57877604,9.4479778)">
+        <circle
+           r="3.5907738"
+           cy="43.566963"
+           cx="86.934525"
+           id="circle11312"
+           style="fill:#e5ff5b;fill-opacity:1;stroke-width:0.26458332" />
+        <text
+           id="text11316"
+           y="44.898705"
+           x="85.632172"
+           style="font-style:normal;font-weight:normal;font-size:10.58333302px;line-height:1.25;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#4d4d4d;fill-opacity:1;stroke:none;stroke-width:0.26458332"
+           xml:space="preserve"><tspan
+             style="font-size:4.58611107px;fill:#4d4d4d;stroke-width:0.26458332"
+             y="44.898705"
+             x="85.632172"
+             id="tspan11314"
+             sodipodi:role="line">x</tspan></text>
+      </g>
+      <g
+         id="g11326"
+         transform="translate(0,9.4479778)">
+        <circle
+           style="fill:#e5ff5b;fill-opacity:1;stroke-width:0.26458332"
+           id="circle11320"
+           cx="121.17175"
+           cy="43.566963"
+           r="3.5907738" />
+        <text
+           xml:space="preserve"
+           style="font-style:normal;font-weight:normal;font-size:10.58333302px;line-height:1.25;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#4d4d4d;fill-opacity:1;stroke:none;stroke-width:0.26458332"
+           x="119.3659"
+           y="45.019833"
+           id="text11324"><tspan
+             sodipodi:role="line"
+             id="tspan11322"
+             x="119.3659"
+             y="45.019833"
+             style="font-size:4.58611107px;fill:#4d4d4d;stroke-width:0.26458332">+</tspan></text>
+      </g>
+      <g
+         id="g11334"
+         transform="translate(63.364357,42.205518)">
+        <circle
+           style="fill:#e5ff5b;fill-opacity:1;stroke-width:0.26458332"
+           id="circle11328"
+           cx="86.934525"
+           cy="43.566963"
+           r="3.5907738" />
+        <text
+           xml:space="preserve"
+           style="font-style:normal;font-weight:normal;font-size:10.58333302px;line-height:1.25;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#4d4d4d;fill-opacity:1;stroke:none;stroke-width:0.26458332"
+           x="85.632172"
+           y="44.898705"
+           id="text11332"><tspan
+             sodipodi:role="line"
+             id="tspan11330"
+             x="85.632172"
+             y="44.898705"
+             style="font-size:4.58611107px;fill:#4d4d4d;stroke-width:0.26458332">x</tspan></text>
+      </g>
+      <g
+         transform="translate(34.253762,19.725045)"
+         id="g11342">
+        <circle
+           r="3.5907738"
+           cy="43.566963"
+           cx="86.934525"
+           id="circle11336"
+           style="fill:#e5ff5b;fill-opacity:1;stroke-width:0.26458332" />
+        <text
+           id="text11340"
+           y="44.898705"
+           x="85.632172"
+           style="font-style:normal;font-weight:normal;font-size:10.58333302px;line-height:1.25;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#4d4d4d;fill-opacity:1;stroke:none;stroke-width:0.26458332"
+           xml:space="preserve"><tspan
+             style="font-size:4.58611107px;fill:#4d4d4d;stroke-width:0.26458332"
+             y="44.898705"
+             x="85.632172"
+             id="tspan11338"
+             sodipodi:role="line">x</tspan></text>
+      </g>
+    </g>
+    <text
+       id="text5581-6-3-0-0"
+       y="277.1235"
+       x="17.419685"
+       style="font-style:normal;font-weight:normal;font-size:12.26070404px;line-height:1.25;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#4d4d4d;fill-opacity:1;stroke:none;stroke-width:0.30651757"
+       xml:space="preserve"><tspan
+         style="font-size:12.25794315px;fill:#4d4d4d;stroke-width:0.30651757"
+         y="277.1235"
+         x="17.419685"
+         sodipodi:role="line"
+         id="tspan11446">Legend:</tspan></text>
+    <rect
+       y="275.85126"
+       x="80.56076"
+       height="9.8013067"
+       width="19.927786"
+       id="rect5535-2"
+       style="fill:#ff9955;stroke-width:0.30998442;filter:url(#filter12233)"
+       transform="matrix(0.95880034,0,0,0.95880034,4.3481892,9.0950066)" />
+    <text
+       id="text5581-6-3-0-0-4"
+       y="268.51303"
+       x="84.162361"
+       style="font-style:normal;font-weight:normal;font-size:12.26070404px;line-height:1.25;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#4d4d4d;fill-opacity:1;stroke:none;stroke-width:0.30651757"
+       xml:space="preserve"><tspan
+         style="font-size:5.31297159px;fill:#4d4d4d;stroke-width:0.30651757"
+         y="268.51303"
+         x="84.162361"
+         sodipodi:role="line"
+         id="tspan11446-8">Layer</tspan></text>
+    <circle
+       style="fill:#e5ff5b;fill-opacity:1;stroke-width:0.31968862;filter:url(#filter12229)"
+       id="circle11328-5"
+       cx="136.74632"
+       cy="280.75192"
+       r="4.3386316"
+       transform="matrix(0.95880034,0,0,0.95880034,-4.647644,9.0950066)" />
+    <text
+       id="text5581-6-3-0-0-9"
+       y="268.59476"
+       x="109.18328"
+       style="font-style:normal;font-weight:normal;font-size:12.26070404px;line-height:1.25;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#4d4d4d;fill-opacity:1;stroke:none;stroke-width:0.30651757"
+       xml:space="preserve"><tspan
+         style="font-size:5.31297159px;letter-spacing:0px;word-spacing:0.08371533px;fill:#4d4d4d;stroke-width:0.30651757"
+         y="268.59476"
+         x="109.18328"
+         sodipodi:role="line"
+         id="tspan11484">Pointwize op</tspan></text>
+    <g
+       id="g11554"
+       transform="matrix(0.95880034,0,0,0.95880034,-9.5377823,-2.9012372)">
+      <path
+         sodipodi:nodetypes="cc"
+         inkscape:connector-curvature="0"
+         id="path5722-6"
+         d="m 171.29205,295.17188 h 8.68051"
+         style="fill:none;stroke:#4d4d4d;stroke-width:0.60413605;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1" />
+      <path
+         style="fill:none;stroke:#4d4d4d;stroke-width:0.60413605;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+         d="m 171.29205,295.17188 c 0,0 4.27888,0.16147 4.35962,-1.85688 0.0807,-2.01833 0,-3.13152 0,-3.13152"
+         id="path5724-2"
+         inkscape:connector-curvature="0"
+         sodipodi:nodetypes="ccc" />
+      <path
+         sodipodi:nodetypes="ccc"
+         inkscape:connector-curvature="0"
+         id="path5774-1"
+         d="m 175.4398,289.97057 1.58408,1.58718 z"
+         style="fill:none;stroke:#4d4d4d;stroke-width:0.60413605;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1" />
+      <path
+         style="fill:none;stroke:#4d4d4d;stroke-width:0.60413605;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+         d="m 175.65167,290.18348 -1.3846,1.38461 z"
+         id="path5776-8"
+         inkscape:connector-curvature="0" />
+      <path
+         inkscape:connector-curvature="0"
+         id="path5774-4-1-8-1"
+         d="m 180.18547,295.38582 -1.5975,-1.59853 z"
+         style="fill:none;stroke:#4d4d4d;stroke-width:0.60413605;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+         sodipodi:nodetypes="ccc" />
+      <path
+         style="fill:none;stroke:#4d4d4d;stroke-width:0.60413605;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+         d="m 179.97256,295.17188 -1.38459,1.38461 z"
+         id="path5776-0-8-5-3"
+         inkscape:connector-curvature="0"
+         sodipodi:nodetypes="ccc" />
+    </g>
+    <text
+       id="text5581-6-3-0-0-0"
+       y="268.54803"
+       x="152.44072"
+       style="font-style:normal;font-weight:normal;font-size:12.26070404px;line-height:1.25;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#4d4d4d;fill-opacity:1;stroke:none;stroke-width:0.30651757"
+       xml:space="preserve"><tspan
+         style="font-size:5.31297159px;fill:#4d4d4d;stroke-width:0.30651757"
+         y="268.54803"
+         x="152.44072"
+         sodipodi:role="line"
+         id="tspan11446-82">Copy</tspan></text>
+    <rect
+       style="fill:#ff9955;stroke-width:0.29721317"
+       id="rect11641"
+       width="19.106768"
+       height="9.3974962"
+       x="81.589874"
+       y="273.58127" />
+    <circle
+       r="4.3386316"
+       cy="280.75192"
+       cx="136.74632"
+       id="circle12239"
+       style="opacity:0.34800002;fill:#000000;fill-opacity:0.94036698;stroke-width:0.31968862;filter:url(#filter12541)"
+       transform="matrix(0.96462141,0,0,0.96462141,-5.443653,7.4607325)" />
+    <circle
+       r="4.1598816"
+       cy="278.28003"
+       cx="126.46479"
+       id="circle11643"
+       style="fill:#e5ff5b;fill-opacity:1;stroke-width:0.30651757" />
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:12.26070404px;line-height:1.25;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#4d4d4d;fill-opacity:1;stroke:none;stroke-width:0.30651757"
+       x="28.067818"
+       y="235.88847"
+       id="text1045"><tspan
+         sodipodi:role="line"
+         id="tspan1043"
+         x="28.067818"
+         y="235.88847"
+         style="font-size:5.31297159px;fill:#4d4d4d;stroke-width:0.30651757">h<tspan
+   id="tspan1041"
+   style="font-size:2.45214057px;stroke-width:0.30651757">t-1</tspan></tspan></text>
+  </g>
+</svg>
diff --git a/5-NLP/16-RNN/images/multi-layer-lstm.jpg b/5-NLP/16-RNN/images/multi-layer-lstm.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..96e8f18548e37d4f415ff46db0b662d6a8a2f155
Binary files /dev/null and b/5-NLP/16-RNN/images/multi-layer-lstm.jpg differ
diff --git a/5-NLP/16-RNN/images/rnn-anatomy.png b/5-NLP/16-RNN/images/rnn-anatomy.png
new file mode 100644
index 0000000000000000000000000000000000000000..937ccdd65423b9d4202a44d0e995a539aa8f6c74
Binary files /dev/null and b/5-NLP/16-RNN/images/rnn-anatomy.png differ
diff --git a/5-NLP/16-RNN/images/rnn.png b/5-NLP/16-RNN/images/rnn.png
new file mode 100644
index 0000000000000000000000000000000000000000..160b03036d0872cc48f35733708862a2226ffba4
Binary files /dev/null and b/5-NLP/16-RNN/images/rnn.png differ
diff --git a/5-NLP/16-RNN/torchnlp.py b/5-NLP/16-RNN/torchnlp.py
new file mode 100644
index 0000000000000000000000000000000000000000..d6ca5e0c19c08862edc19d7720ae9d66d364b26a
--- /dev/null
+++ b/5-NLP/16-RNN/torchnlp.py
@@ -0,0 +1,104 @@
+import builtins
+import torch
+import torchtext
+import collections
+import os
+
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+vocab = None
+tokenizer = torchtext.data.utils.get_tokenizer('basic_english')
+
+def load_dataset(ngrams=1,min_freq=1):
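+    # load AG_NEWS, build a frequency-based vocabulary from the training split,
+    # and return (train_dataset, test_dataset, classes, vocab)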
+    global vocab, tokenizer
+    print("Loading dataset...")
+    train_dataset, test_dataset = torchtext.datasets.AG_NEWS(root='./data')
+    train_dataset = list(train_dataset)
+    test_dataset = list(test_dataset)
+    classes = ['World', 'Sports', 'Business', 'Sci/Tech']
+    print('Building vocab...')
+    counter = collections.Counter()
+    for (label, line) in train_dataset:
+        counter.update(torchtext.data.utils.ngrams_iterator(tokenizer(line),ngrams=ngrams))
+    vocab = torchtext.vocab.Vocab(counter, min_freq=min_freq)
+    return train_dataset,test_dataset,classes,vocab
+
+def encode(x,voc=None,unk=0,tokenizer=tokenizer):
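+    # convert a text string into a list of token ids using vocab.stoi;
+    # out-of-vocabulary tokens map to the unk index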
+    v = vocab if voc is None else voc
+    return [v.stoi.get(s,unk) for s in tokenizer(x)]
+
+def train_epoch(net,dataloader,lr=0.01,optimizer=None,loss_fn = torch.nn.CrossEntropyLoss(),epoch_size=None, report_freq=200):
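+    # run one training pass over (label, features) minibatches,
+    # printing running accuracy every report_freq minibatches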
+    optimizer = optimizer or torch.optim.Adam(net.parameters(),lr=lr)
+    loss_fn = loss_fn.to(device)
+    net.train()
+    total_loss,acc,count,i = 0,0,0,0
+    for labels,features in dataloader:
+        optimizer.zero_grad()
+        features, labels = features.to(device), labels.to(device)
+        out = net(features)
+        loss = loss_fn(out,labels) #cross_entropy(out,labels)
+        loss.backward()
+        optimizer.step()
+        total_loss+=loss.item()  # accumulate as a float so the autograd graph is not kept across batches
+        _,predicted = torch.max(out,1)
+        acc+=(predicted==labels).sum()
+        count+=len(labels)
+        i+=1
+        if i%report_freq==0:
+            print(f"{count}: acc={acc.item()/count}")
+        if epoch_size and count>epoch_size:
+            break
+    return total_loss/count, acc.item()/count
+
+def padify(b,voc=None,tokenizer=tokenizer):
+    # b is the list of tuples of length batch_size
+    #   - first element of a tuple = label, 
+    #   - second = feature (text sequence)
+    # build vectorized sequence
+    v = [encode(x[1],voc=voc,tokenizer=tokenizer) for x in b]
+    # compute max length of a sequence in this minibatch
+    l = max(map(len,v))
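+    # e.g. encoded lengths [2,4,3] -> l = 4, and shorter sequences are zero-padded to length 4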
+    return ( # tuple of two tensors - labels and features
+        torch.LongTensor([t[0]-1 for t in b]),
+        torch.stack([torch.nn.functional.pad(torch.tensor(t),(0,l-len(t)),mode='constant',value=0) for t in v])
+    )
+
+def offsetify(b,voc=None):
+    # first, compute data tensor from all sequences
+    x = [torch.tensor(encode(t[1],voc=voc)) for t in b]
+    # now, compute the offsets by accumulating the tensor of sequence lengths
+    o = [0] + [len(t) for t in x]
+    o = torch.tensor(o[:-1]).cumsum(dim=0)
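+    # e.g. sequence lengths [2,3,1] -> offsets [0,2,5] into the concatenated text tensor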
+    return ( 
+        torch.LongTensor([t[0]-1 for t in b]), # labels
+        torch.cat(x), # text 
+        o
+    )
+
+def train_epoch_emb(net,dataloader,lr=0.01,optimizer=None,loss_fn = torch.nn.CrossEntropyLoss(),epoch_size=None, report_freq=200,use_pack_sequence=False):
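+    # same loop as train_epoch, but minibatches are (labels, flat text tensor, offsets/lengths)
+    # as produced by offsetify; use_pack_sequence keeps them on CPU, as pack_padded_sequence expects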
+    optimizer = optimizer or torch.optim.Adam(net.parameters(),lr=lr)
+    loss_fn = loss_fn.to(device)
+    net.train()
+    total_loss,acc,count,i = 0,0,0,0
+    for labels,text,off in dataloader:
+        optimizer.zero_grad()
+        labels,text = labels.to(device), text.to(device)
+        if use_pack_sequence:
+            off = off.to('cpu')
+        else:
+            off = off.to(device)
+        out = net(text, off)
+        loss = loss_fn(out,labels) #cross_entropy(out,labels)
+        loss.backward()
+        optimizer.step()
+        total_loss+=loss.item()
+        _,predicted = torch.max(out,1)
+        acc+=(predicted==labels).sum()
+        count+=len(labels)
+        i+=1
+        if i%report_freq==0:
+            print(f"{count}: acc={acc.item()/count}")
+        if epoch_size and count>epoch_size:
+            break
+    return total_loss/count, acc.item()/count
+
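+# --- Minimal usage sketch (illustrative only, not used by the lesson notebooks) ---
+# It assumes the legacy torchtext (<0.10) API relied on above: batches are collated
+# with padify and fed to a simple mean-of-embeddings classifier for one epoch.
+if __name__ == "__main__":
+    from torch.utils.data import DataLoader
+
+    class EmbedClassifier(torch.nn.Module):
+        def __init__(self, vocab_size, embed_dim, num_class):
+            super().__init__()
+            self.embedding = torch.nn.Embedding(vocab_size, embed_dim)
+            self.fc = torch.nn.Linear(embed_dim, num_class)
+        def forward(self, x):
+            # average the word embeddings over the (padded) sequence dimension
+            return self.fc(self.embedding(x).mean(dim=1))
+
+    train_dataset, test_dataset, classes, vocab = load_dataset()
+    train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True, collate_fn=padify)
+    net = EmbedClassifier(len(vocab), 32, len(classes)).to(device)
+    print(train_epoch(net, train_loader, epoch_size=5000))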
diff --git a/5-NLP/README.md b/5-NLP/README.md
index fa9f7b3fedcc1e9c0e58e3cdcde551788a03ab87..89b9f55a726f60f3152f950f8691053ed14a9211 100644
--- a/5-NLP/README.md
+++ b/5-NLP/README.md
@@ -52,4 +52,6 @@ if len(physical_devices)>0:
 * [Representing text as tensors](13-TextRep/README.md)
 * [Word Embeddings](14-Embeddings/README.md)
 * [Language Modeling](15-LanguageModeling/README.md)
-
+* [Recurrent Neural Networks](16-RNN/README.md)
+* [Generative Networks](17-GenerativeNetworks/README.md)
+* [Transformers](18-Transformers/README.md)
diff --git a/README.md b/README.md
index 42d2a503bc715a9ff7302cfda9f509f770c7a0bb..ad51a05f5feea99b9b2689b13cb7299832d1a697 100644
--- a/README.md
+++ b/README.md
@@ -64,9 +64,9 @@ For a gentle introduction to *AI in the Cloud* topic you may consider taking [Ge
 <tr><td>13</td><td>Text Representation. Bow/TF-IDF</td><td><a href="5-NLP/13-TextRep/README.md">Text</a></td><td><a href="5-NLP/13-TextRep/TextRepresentationPyTorch.ipynb">PyTorch</a></td><td><a href="5-NLP/13-TextRep/TextRepresentationTF.ipynb">Tensorflow</a></td><td></td></tr>
 <tr><td>14</td><td>Semantic word embeddings. Word2Vec and GloVe</td><td><a href="5-NLP/14-Embeddings/README.md">Text</a></td><td><a href="5-NLP/14-Embeddings/EmbeddingsPyTorch.ipynb">PyTorch</a></td><td><a href="5-NLP/14-Embeddings/EmbeddingsTF.ipynb">Tensorflow</a></td><td></td></tr>
 <tr><td>15</td><td>Language Modeling. Training your own embeddings</td><td><a href="5-NLP/15-LanguageModeling">Text</a></td><td>PyTorch</td><td>Tensorflow</td><td></td></tr>
-<tr><td>16</td><td>Recurrent Neural Networks</td><td>Text</td><td>PyTorch</td><td>Tensorflow</td><td></td></tr>
-<tr><td>17</td><td>Generative Recurrent Networks</td><td>Text</td><td>PyTorch</td><td>Tensorflow</td><td></td></tr>
-<tr><td>18</td><td>Language Modelling. Transformers. BERT.</td><td>Text</td><td>PyTorch</td><td>Tensorflow</td><td></td></tr>
+<tr><td>16</td><td>Recurrent Neural Networks</td><td><a href="5-NLP/16-RNN/README.md">Text</a></td><td><a href="5-NLP/16-RNN/RNNPyTorch.ipynb">PyTorch</a></td><td><a href="5-NLP/16-RNN/RNNTF.ipynb">Tensorflow</a></td><td></td></tr>
+<tr><td>17</td><td>Generative Recurrent Networks</td><td><a href="5-NLP/17-GenerativeNetworks/README.md">Text</a></td><td><a href="5-NLP/17-GenerativeNetworks/GenerativePyTorch.md">PyTorch</a></td><td><a href="5-NLP/17-GenerativeNetworks/GenerativeTF.md">Tensorflow</a></td><td></td></tr>
+<tr><td>18</td><td>Transformers. BERT.</td><td><a href="5-NLP/18-Transformers/README.md">Text</a></td><td><a href="5-NLP/18-Transformers/TransformersPyTorch.md">PyTorch</a></td><td><a href="5-NLP/18-Transformers/TransformersTF.md">Tensorflow</a></td><td></td></tr>
 <tr><td>19</td><td>Named Entity Recognition</td><td>Text</td><td>PyTorch</td><td>Tensorflow</td><td></td></tr>
 <tr><td>20</td><td>Text Generation using GPT</td><td>Text</td><td>PyTorch</td><td>Tensorflow</td><td></td></tr>
 <tr><td>VI</td><td colspan="4"><b>Other AI Techniques</b></td><td>PAT</td></tr>