Commit ec9ac00b authored by Jen Looper, committed by GitHub

Merge pull request #79 from CinnamonXI/main

Lesson 18 & 21
parents 09c0211e 20a28677
......@@ -12,6 +12,8 @@ import x13 from "./lesson-13.json";
import x14 from "./lesson-14.json";
import x16 from "./lesson-16.json";
import x17 from "./lesson-17.json";
import x18 from "./lesson-18.json";
import x21 from "./lesson-21.json";
import x23 from "./lesson-23.json";
const quiz = { 0 : x1[0], 1 : x2[0], 2 : x3[0], 3 : x4[0], 4 : x5[0], 5 : x7[0], 6 : x8[0], 7 : x9[0], 8 : x10[0], 9 : x12[0], 10 : x13[0], 11 : x14[0], 12 : x16[0], 13 : x17[0], 14 : x23[0] };
const quiz = { 0 : x1[0], 1 : x2[0], 2 : x3[0], 3 : x4[0], 4 : x5[0], 5 : x7[0], 6 : x8[0], 7 : x9[0], 8 : x10[0], 9 : x12[0], 10 : x13[0], 11 : x14[0], 12 : x16[0], 13 : x17[0], 14 : x18[0], 15 : x21[0], 16 : x23[0] };
export default quiz;
\ No newline at end of file
[
{
"title": "AI for Beginners: Quizzes",
"complete": "Congratulations, you completed the quiz!",
"error": "Sorry, try again",
"quizzes": [
{
"id": 118,
"title": "Transformers: Pre Quiz",
"quiz": [
{
"questionText": "Attention mechanism provides a means of _____ the imoact of an inout vector on an output prediction of RNN",
"answerOptions": [
{
"answerText": "weighting",
"isCorrect": true
},
{
"answerText": "training",
"isCorrect": false
},
{
"answerText": "testing",
"isCorrect": false
}
]
},
{
"questionText": "BERT is an acronym for",
"answerOptions": [
{
"answerText": "Bidirectional Encoded Representations From Transformers",
"isCorrect": false
},
{
"answerText": "Bidirectional Encoder Representations From Transformers",
"isCorrect": true
},
{
"answerText": "Bidirectional Encoder Representatives of Transformers",
"isCorrect": false
}
]
},
{
"questionText": "In positional encoding the relative position of the token is represented by number of steps",
"answerOptions": [
{
"answerText": "true",
"isCorrect": true
},
{
"answerText": "false",
"isCorrect": false
}
]
}
]
},
{
"id": 218,
"title": "Transformers: Post Quiz",
"quiz": [
{
"questionText": "Positional embedding _____ the original token and its position within the sequence",
"answerOptions": [
{
"answerText": "seperates",
"isCorrect": false
},
{
"answerText": "compares",
"isCorrect": false
},
{
"answerText": "embeds",
"isCorrect": true
}
]
},
{
"questionText": "Multi-Head Attention is used in transformers to give network the power to capture _____ of dependencies",
"answerOptions": [
{
"answerText": "different types",
"isCorrect": true
},
{
"answerText": "same type",
"isCorrect": false
},
{
"answerText": "none",
"isCorrect": false
}
]
},
{
"questionText": "In transformers attention is used in _____ instances",
"answerOptions": [
{
"answerText": "1",
"isCorrect": false
},
{
"answerText": "2",
"isCorrect": true
},
{
"answerText": "3",
"isCorrect": false
}
]
}
]
}
]
}
]
\ No newline at end of file
[
{
"title": "AI for Beginners: Quizzes",
"complete": "Congratulations, you completed the quiz!",
"error": "Sorry, try again",
"quizzes": [
{
"id": 121,
"title": "Genetic Algorithms: Pre Quiz",
"quiz": [
{
"questionText": "Genetic Algorithms are based on which of the following?",
"answerOptions": [
{
"answerText": "mutations",
"isCorrect": false
},
{
"answerText": "Selection",
"isCorrect": false
},
{
"answerText": "both a and b",
"isCorrect": true
}
]
},
{
"questionText": "Crossover allows us to combine two solutions together to obtain a new valid solution",
"answerOptions": [
{
"answerText": "true",
"isCorrect": true
},
{
"answerText": "false",
"isCorrect": false
}
]
},
{
"questionText": "Valid solutions to genetic algorithm can be represented as _____",
"answerOptions": [
{
"answerText": "genes",
"isCorrect": true
},
{
"answerText": "neurons",
"isCorrect": false
},
{
"answerText": "cells",
"isCorrect": false
}
]
}
]
},
{
"id": 221,
"title": "Genetic Algorithms: Post Quiz",
"quiz": [
{
"questionText": "Genetic Algorithms can solve which of these tasks",
"answerOptions": [
{
"answerText": "Schedule optimization",
"isCorrect": false
},
{
"answerText": "Optimal packing",
"isCorrect": false
},
{
"answerText": "both of a and b",
"isCorrect": true
}
]
},
{
"questionText": "In implementing a genetic algorithm the first step is to randomly select two genes",
"answerOptions": [
{
"answerText": "true",
"isCorrect": false
},
{
"answerText": "false",
"isCorrect": true
}
]
},
{
"questionText": "When using Crossover operation the algorithm randomly selects _____ genes",
"answerOptions": [
{
"answerText": "3",
"isCorrect": false
},
{
"answerText": "1",
"isCorrect": false
},
{
"answerText": "2",
"isCorrect": true
}
]
}
]
}
]
}
]
\ No newline at end of file
......@@ -370,6 +370,59 @@ Lesson 17E Generative networks: Post Quiz
+ sequence-to-sequence
- one-to-many
Lesson 18B Transformers: Pre Quiz
* Attention mechanism provides a means of _____ the impact of an input vector on an output prediction of an RNN
+ weighting
- training
- testing
* BERT is an acronym for
- Bidirectional Encoded Representations From Transformers
+ Bidirectional Encoder Representations From Transformers
- Bidirectional Encoder Representatives of Transformers
* In positional encoding the relative position of the token is represented by the number of steps
+ true
- false
Lesson 18E Transformers: Post Quiz
* Positional embedding _____ the original token and its position within the sequence
- separates
- compares
+ embeds
* Multi-Head Attention is used in transformers to give the network the power to capture _____ of dependencies
+ different types
- same type
- none
* In transformers, attention is used in _____ instances
- 1
+ 2
- 3
Lesson 21B Genetic Algorithms: Pre Quiz
* Genetic Algorithms are based on which of the following?
- mutations
- Selection
+ both a and b
* Crossover allows us to combine two solutions together to obtain a new valid solution
+ true
- false
* Valid solutions to a genetic algorithm can be represented as _____
+ genes
- neurons
- cells
Lesson 21E Genetic Algorithms: Post Quiz
* Genetic Algorithms can solve which of these tasks?
- Schedule optimization
- Optimal packing
+ both a and b
* In implementing a genetic algorithm, the first step is to randomly select two genes
- true
+ false
* When using the Crossover operation, the algorithm randomly selects _____ genes
- 3
- 1
+ 2
Lesson 23B Multi-Agent Modeling: Pre Quiz
* By modeling the behavior of simple agents, we can understand more complex behaviors of a system.
+ true
......
# Language Modeling
Semantic embeddings, such as Word2Vec and GloVe, are in fact a first step towards **language modeling** - creating models that somehow *understand* (or *represent*) the nature of the language.
## [Pre-lecture quiz](tbd)
......
......@@ -81,4 +81,4 @@ Generation with Visual Attention](https://arxiv.org/pdf/1502.03044v2.pdf)
- [Understanding LSTM Networks](https://colah.github.io/posts/2015-08-Understanding-LSTMs/) by Christopher Olah.
## [Assignment: Notebooks](assignment.md)
# Attention mechanisms and transformers
## [Pre-lecture quiz](https://black-ground-0cc93280f.1.azurestaticapps.net/quiz/118)
One of the most important problems in the NLP domain is **machine translation**. In this section, we will focus on machine translation, or, more generally, on any *sequence-to-sequence* task (which is also called **sentence transduction**).
With RNNs, sequence-to-sequence is implemented by two recurrent networks, where one network (the **encoder**) collapses the input sequence into a hidden state, and the other one (the **decoder**) unrolls this hidden state into the translated result. There are a couple of problems with this approach:
* The final state of the encoder network has a hard time remembering the beginning of a sentence, which causes poor model quality on long sentences
* All words in a sequence have the same impact on the result. In reality, specific words in the input sequence often have more impact on the sequential outputs than others.
**Attention mechanisms** provide a means of weighting the contextual impact of each input vector on each output prediction of the RNN. This is implemented by creating shortcuts between the intermediate states of the input RNN and the output RNN. In this manner, when generating the output symbol y<sub>t</sub>, we take into account all input hidden states h<sub>i</sub>, with different weight coefficients &alpha;<sub>t,i</sub>.
![Image showing an encoder/decoder model with an additive attention layer](./images/encoder-decoder-attention.png)
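The weighting itself can be illustrated with a few lines of PyTorch. This is a minimal sketch of a single decoder step with additive (Bahdanau-style) attention; the dimensions and the randomly initialized parameters are purely illustrative and not part of this lesson's notebooks.

```python
import torch
import torch.nn.functional as F

# Illustrative sizes: 10 input positions, hidden size 32
seq_len, d = 10, 32
h = torch.randn(seq_len, d)   # encoder hidden states h_i
s = torch.randn(d)            # current decoder state, just before emitting y_t

# Additive attention: score each h_i against the decoder state
W_h, W_s, v = torch.randn(d, d), torch.randn(d, d), torch.randn(d)
scores = torch.tanh(h @ W_h + s @ W_s) @ v   # one score per input position
alpha = F.softmax(scores, dim=0)             # weights alpha_{t,i}, summing to 1
context = alpha @ h                          # weighted sum of all encoder states
```

In a trained model, `W_h`, `W_s` and `v` are learned parameters, and `context` is fed into the decoder when producing y<sub>t</sub>.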
......@@ -32,15 +34,15 @@ Adoption of attention mechanisms combined with this constraint led to the creati
One of the main ideas behind transformers is to avoid the sequential nature of RNNs and create a model that is parallelizable during training. This is achieved by implementing two ideas:
* positional encoding
* using a self-attention mechanism to capture patterns instead of RNNs (or CNNs) (that is why the paper that introduced transformers is called *[Attention is all you need](https://arxiv.org/abs/1706.03762)*)
### Positional Encoding/Embedding
The idea of positional encoding is the following. When using RNNs, the relative position of the tokens is represented by the number of steps, and thus does not need to be explicitly represented. However, once we switch to attention, we need to know the relative positions of tokens within a sequence. To get positional encoding, we augment our sequence of tokens with a sequence of token positions in the sequence (i.e., a sequence of numbers 0, 1, ...).
We then mix the token position with the token embedding vector. To transform a position (an integer) into a vector, we can use different approaches:
* Trainable embedding, similar to token embedding. This is the approach we consider here. We apply embedding layers on top of both tokens and their positions, resulting in embedding vectors of the same dimensions, which we then add together.
* Fixed position encoding function, as proposed in the original paper.
<img src="images/pos-embedding.png" width="50%"/>
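A minimal PyTorch sketch of the trainable variant follows; the vocabulary size, maximum length and embedding dimension are made-up values, and the layers are untrained.

```python
import torch
import torch.nn as nn

vocab_size, max_len, emb_dim = 10000, 256, 64   # illustrative sizes

token_emb = nn.Embedding(vocab_size, emb_dim)   # embeds token ids
pos_emb = nn.Embedding(max_len, emb_dim)        # embeds positions 0, 1, ...

tokens = torch.randint(0, vocab_size, (1, 20))          # a batch of one 20-token sequence
positions = torch.arange(tokens.size(1)).unsqueeze(0)   # [[0, 1, ..., 19]]

# Both embeddings have the same dimension, so they can simply be added together
x = token_emb(tokens) + pos_emb(positions)              # shape: (1, 20, emb_dim)
```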
......@@ -57,25 +59,26 @@ Next, we need to capture some patterns within our sequence. To do this, transfor
> Image from the [Google Blog](https://research.googleblog.com/2017/08/transformer-novel-neural-network.html)
In transformers, we use **Multi-Head Attention** in order to give the network the power to capture several different types of dependencies, e.g. long-term vs. short-term word relations, co-reference vs. something else, etc.
The [TensorFlow Notebook](TransformersTF.ipynb) contains more details on the implementation of transformer layers.
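As a quick illustration (independent of the linked notebooks, and assuming a recent PyTorch version), the built-in `nn.MultiheadAttention` layer performs this computation directly; the sizes below are arbitrary.

```python
import torch
import torch.nn as nn

emb_dim, num_heads = 64, 8   # illustrative; emb_dim must be divisible by num_heads
mha = nn.MultiheadAttention(embed_dim=emb_dim, num_heads=num_heads, batch_first=True)

x = torch.randn(1, 20, emb_dim)   # (batch, sequence length, embedding dimension)

# Self-attention: query, key and value all come from the same sequence
out, weights = mha(x, x, x)       # out: (1, 20, emb_dim); weights: (1, 20, 20), averaged over heads
```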
### Encoder-Decoder Attention
In transformers, attention is used in two places:
* To capture patterns within the input text using self-attention
* To perform sequence translation - this is the attention layer between the encoder and the decoder.
Encoder-decoder attention is very similar to the attention mechanism used in RNNs, as described at the beginning of this section. This animated diagram explains the role of encoder-decoder attention.
![Animated GIF showing how the evaluations are performed in transformer models.](./images/transformer-animated-explanation.gif)
Since each input position is mapped independently to each output position, transformers can parallelize better than RNNs, which enables much larger and more expressive language models. Each attention head can be used to learn different relationships between words, which improves downstream Natural Language Processing tasks.
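As a rough sketch of how these pieces fit together (again just an illustration with an untrained model and arbitrary sizes), PyTorch's `nn.Transformer` combines encoder self-attention, decoder self-attention and encoder-decoder attention:

```python
import torch
import torch.nn as nn

# Illustrative, untrained model: 2 encoder and 2 decoder layers, embedding size 64
model = nn.Transformer(d_model=64, nhead=8, num_encoder_layers=2,
                       num_decoder_layers=2, batch_first=True)

src = torch.randn(1, 15, 64)   # embedded source sequence (e.g. the input sentence)
tgt = torch.randn(1, 12, 64)   # embedded target sequence (e.g. the translation produced so far)

# Self-attention runs over src and tgt separately, while encoder-decoder attention
# lets every target position attend to every source position
out = model(src, tgt)          # shape: (1, 12, 64), one vector per target position
```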
## BERT
**BERT** (Bidirectional Encoder Representations from Transformers) is a very large multi-layer transformer network, with 12 layers for *BERT-base* and 24 for *BERT-large*. The model is first pre-trained on a large corpus of text data (Wikipedia + books) using unsupervised training (predicting masked words in a sentence). During pre-training, the model absorbs a significant level of language understanding, which can then be leveraged with other datasets using fine-tuning. This process is called **transfer learning**.
![picture from http://jalammar.github.io/illustrated-bert/](images/jalammarBERT-language-modeling-masked-lm.png)
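The masked-word objective can be probed directly with a pre-trained BERT model, for example via the Hugging Face `transformers` library (which is not part of this lesson's notebooks; this is only a sketch, assuming the library and model weights are available):

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

inputs = tokenizer("Paris is the [MASK] of France.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Locate the masked position and take the most likely token for it
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
predicted_id = logits[0, mask_pos].argmax().item()
print(tokenizer.decode([predicted_id]))   # expected to print something like "capital"
```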
......@@ -88,7 +91,11 @@ There are many variations of Transformer architectures including BERT, DistilBER
* [Transformers in PyTorch](TransformersPyTorch.ipynb)
* [Transformers in TensorFlow](TransformersTF.ipynb)
## [Post-lecture quiz](https://black-ground-0cc93280f.1.azurestaticapps.net/quiz/218)
## Related materials
* [Blog post](https://mchromiak.github.io/articles/2017/Sep/12/Transformer-Attention-is-all-you-need/), explaining the classical [Attention is all you need](https://arxiv.org/abs/1706.03762) paper on transformers.
* [A series of blog posts](https://towardsdatascience.com/transformers-explained-visually-part-1-overview-of-functionality-95a6dd460452) on transformers, explaining the architecture in detail.
> ✅ Todo: conclusion, Assignment, challenge.
# Genetic Algorithms
## [Pre-lecture quiz](https://black-ground-0cc93280f.1.azurestaticapps.net/quiz/121)
**Genetic Algorithms** (GA) are based on an **evolutionary approach** to AI, in which methods of evolving a population are used to obtain an optimal solution for a given problem. They were proposed in 1975 by [John Henry Holland](https://en.wikipedia.org/wiki/John_Henry_Holland).
Genetic Algorithms are based on the following ideas:
......@@ -7,7 +9,7 @@ Genetic Algorithms are based on the following ideas:
* Valid solutions to the problem can be represented as **genes**
* **Crossover** allows us to combine two solutions together to obtain a new valid solution
* **Selection** is used to select more optimal solutions using some **fitness function**
* **Mutations** are introduced to destabilize optimization and get us out of the local minimum
If you want to implement a Genetic Algorithm, you need the following:
......@@ -20,7 +22,7 @@ In many cases, crossover and mutation are quite simple algorithms to manipulate
The specific implementation of a genetic algorithm can vary from case to case, but the overall structure is the following (a minimal code sketch of this loop appears below, after the list of typical tasks):
1. Select an initial population G&subset;&Gamma;
2. Randomly select one of the operations that will be performed at this step: crossover or mutation
3. **Crossover**:
* Randomly select two genes g<sub>1</sub>, g<sub>2</sub> &in; G
* Compute crossover g=crossover(g<sub>1</sub>,g<sub>2</sub>)
......@@ -31,12 +33,12 @@ Specific implementation of a genetic algorithm can vary from case to case, but o
## Typical Tasks
Tasks typically solved by GA:
1. Schedule optimization
1. Optimal packing
1. Optimal cutting
1. Speeding up exhaustive search
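For orientation, here is a minimal, generic sketch of the loop described above. The gene representation (a list of integers), the fitness convention (lower is better) and all parameters are illustrative assumptions, not the implementation used in the notebooks.

```python
import random

def crossover(g1, g2):
    # Combine two genes at a random cut point to obtain a new solution
    cut = random.randrange(1, len(g1))
    return g1[:cut] + g2[cut:]

def mutate(g, low=0, high=30):
    # Randomly change one position of the gene
    g = g[:]
    g[random.randrange(len(g))] = random.randint(low, high)
    return g

def evolve(population, fitness, steps=1000):
    # population: list of genes (each a list of integers); fitness: lower is better
    for _ in range(steps):
        if random.random() < 0.5:                 # crossover step
            g1, g2 = random.sample(population, 2)
            child = crossover(g1, g2)
        else:                                     # mutation step
            child = mutate(random.choice(population))
        # Selection: keep the fittest genes, so the population size stays constant
        population = sorted(population + [child], key=fitness)[:len(population)]
    return population[0]                          # the fittest gene found
```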
## Notebooks
Go to the [Genetic.ipynb](Genetic.ipynb) notebook to see two examples of using Genetic Algorithms:
......@@ -44,14 +46,19 @@ Go to [Genetic.ipynb](Genetic.ipynb) notebooks to see two examples of using Gene
1. Fair division of treasure
1. 8 Queen Problem
## [Post-lecture quiz](https://black-ground-0cc93280f.1.azurestaticapps.net/quiz/221)
## Assignment
Your goal is to solve the so-called **Diophantine equation** - an equation with integer roots. For example, consider the equation a+2b+3c+4d=30. You need to find integer roots that satisfy this equation.
Hints:
1. You can consider roots to be in the interval [0;30]
1. As a gene, consider using the list of root values
Use [Diophantine.ipynb](Diophantine.ipynb) as a starting point.
*This assignment is inspired by [this post](https://habr.com/post/128704/).*
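One possible setup (a sketch only, not the notebook's solution): represent a gene as a list of the four candidate roots and minimize the absolute error of the equation as the fitness.

```python
import random

def random_gene():
    # A gene is a list of four candidate roots, each in the interval [0, 30]
    return [random.randint(0, 30) for _ in range(4)]

def fitness(gene):
    a, b, c, d = gene
    # 0 means the gene is an exact solution of a + 2b + 3c + 4d = 30
    return abs(a + 2 * b + 3 * c + 4 * d - 30)

# A population such as [random_gene() for _ in range(20)] can then be evolved
# with crossover, mutation and selection until the fitness reaches 0.
```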
> ✅ Todo: conclusion, challenge, reference.
......@@ -20,6 +20,7 @@ Central to Multi-agent approach is the notion of **Agent** - an entity that live
- **Cognitive agents** involve complex planning and reasoning
Multi-agent systems are nowadays used in a number of applications:
* In games, many non-player characters employ some sort of AI, and can be considered to be intelligent agents
* In video production, rendering complex 3D scenes that involve crowds is typically done using multi-agent simulation
* In systems modeling, the multi-agent approach is used to simulate the behavior of a complex system. For example, multi-agent modeling has been successfully used to predict the spread of COVID-19 worldwide. A similar approach can be used to model traffic in a city and see how it reacts to changes in traffic rules.
......@@ -103,6 +104,7 @@ You can run the flocking example and observe the behavior. You can also adjust p
### Other Models to see
There are a few more interesting models that you can experiment with:
* **Art &rightarrow; Fireworks** shows how a firework can be considered a collective behavior of individual fire streams
* **Social Science &rightarrow; Traffic Basic** and **Social Science &rightarrow; Traffic Grid** show models of city traffic on a 1D and a 2D grid, with or without traffic lights. Each car in the simulation follows these rules:
- If the space in front of it is empty - accelerate (up to a certain max speed)
......@@ -148,4 +150,3 @@ Take this lesson to the real world and try to conceptualize a multi-agent system
Review the use of this type of system in industry. Pick a domain such as manufacturing or the video game industry and discover how multi-agent systems can be used to solve unique problems.
## [NetLogo Assignment](assignment.md)
......@@ -15,9 +15,9 @@ To avoid this accidental or purposeful misuse of AI, Microsoft states the import
* **Fairness** is related to the important problem of *model biases*, which can be caused by using biased data for training. For example, when we try to predict the probability of getting a software developer job for a person, the model is likely to give higher preference to males - just because the training dataset was likely biased towards a male audience. We need to carefully balance training data and investigate the model to avoid biases, and make sure that the model takes into account more relevant features.
* **Reliability and Safety**. By their nature, AI models can make mistakes. A neural network returns probabilities, and we need to take it into account when making decisions. Every model has some precision and recall, and we need to understand that to prevent harm that wrong advice can cause.
* **Privacy and Security** have some AI-specific implications. For example, when we use some data for training a model, this data becomes somehow "integrated" into the model. On one hand, that increases security and privacy, on the other - we need to remember which data the model was trained on.
* **Inclusiveness** means that we are not building AI to replace people, but rather to augment people and make our work more creative. It is also related to fairness, because when dealing with underrepresented communities, most of the datasets we collect are likely to be biased, and we need to make sure that those communities are included and correctly handled by AI.
* **Transparency**. This includes making sure that we are always clear about AI being used. Also, wherever possible, we want to use AI systems that are *interpretable*.
* **Accountability**. When AI models come up with some decisions, it is not always clear who is responsible for those decisions. We need to make sure that we understand where responsibility of AI decisions lies. In most cases we would want to include human beings into the loop of making important decisions, so that actual people are made accountable.
## Tools for Responsible AI
......@@ -27,6 +27,7 @@ Microsoft has developed the [Responsible AI Toolbox](https://github.com/microsof
* Fairness Dashboard (FairLearn)
* Error Analysis Dashboard
* Responsible AI Dashboard that includes
- EconML - a tool for Causal Analysis, which focuses on what-if questions
- DiCE - a tool for Counterfactual Analysis that allows you to see which features need to be changed to affect the decision of the model
......@@ -36,4 +37,4 @@ For more information about AI Ethics, please visit [this lesson](https://github.
Take this [Learn Path](https://docs.microsoft.com/learn/modules/responsible-ai-principles/?WT.mc_id=academic-57639-dmitryso) to learn more about responsible AI.
## [Post-lecture quiz](https://white-water-09ec41f0f.azurestaticapps.net/quiz/6/)
......@@ -3,4 +3,3 @@
After the success of transformer models for solving NLP tasks, there have been many attempts to apply the same or similar architectures to computer vision tasks. Also, there is a growing interest in building models that *combine* vision and natural language capabilities. One such attempt was made by OpenAI and is called CLIP.
## Contrastive Image Pre-Training (CLIP)