spce0038-machine-learning-w.../week1/slides/.ipynb_checkpoints/Lecture03_Scikit-Learn-checkpoint.ipynb

2025-01-24 13:21:11 +00:00
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Lecture 3: Introduction to Scikit-Learn"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"![](https://www.tensorflow.org/images/colab_logo_32px.png)\n",
"[Run in colab](https://colab.research.google.com/drive/1TZW7xcheEHt7DdDraOZUiSG92rqF3TGF)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-10T00:13:23.213712Z",
"iopub.status.busy": "2024-01-10T00:13:23.213476Z",
"iopub.status.idle": "2024-01-10T00:13:23.223868Z",
"shell.execute_reply": "2024-01-10T00:13:23.223286Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Last executed: 2024-01-10 00:13:23\n"
]
}
],
"source": [
"import datetime\n",
"now = datetime.datetime.now()\n",
"print(\"Last executed: \" + now.strftime(\"%Y-%m-%d %H:%M:%S\"))"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Scikit-Learn overview\n",
"\n",
"[Scikit-Learn](http://scikit-learn.org/stable/) is an extremely popular Python machine learning package.\n",
"\n",
"It provides implementations of a wide range of machine learning algorithms."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- Clean, uniform and streamlined API.\n",
"- Useful and complete online documentation.\n",
"- Straightforward to switch models or algorithms."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Two main general concepts:\n",
"- Data representation\n",
"- Estimator API"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Data representations"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Scikit-Learn includes a number of example data-sets"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-10T00:13:23.262339Z",
"iopub.status.busy": "2024-01-10T00:13:23.261807Z",
"iopub.status.idle": "2024-01-10T00:13:23.802820Z",
"shell.execute_reply": "2024-01-10T00:13:23.802100Z"
}
},
"outputs": [],
"source": [
"from sklearn import datasets"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-10T00:13:23.806483Z",
"iopub.status.busy": "2024-01-10T00:13:23.805791Z",
"iopub.status.idle": "2024-01-10T00:13:23.810230Z",
"shell.execute_reply": "2024-01-10T00:13:23.809598Z"
}
},
"outputs": [],
"source": [
"# Type datasets.<TAB> to see more\n",
"#datasets."
]
},
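{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"As a minimal sketch of what a bundled data-set looks like, the snippet below loads the copy of the Iris data that ships with Scikit-Learn via `datasets.load_iris`. The variable names (`iris_bunch`, `features`, `target`) are illustrative assumptions and are not used elsewhere in this lecture.\n",
"\n",
"```python\n",
"from sklearn import datasets\n",
"\n",
"# load_iris returns a Bunch: a dict-like container holding the data arrays\n",
"# together with descriptive metadata.\n",
"iris_bunch = datasets.load_iris()\n",
"\n",
"features = iris_bunch.data    # NumPy array, one row per flower\n",
"target = iris_bunch.target    # integer class label for each row\n",
"\n",
"print(features.shape, target.shape)\n",
"print(iris_bunch.feature_names)\n",
"print(iris_bunch.target_names)\n",
"```\n",
"\n",
"Other loaders in `sklearn.datasets` follow the same pattern, so tab completion on `datasets.load_` is a quick way to explore them."
]
},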
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Data as a table\n",
"\n",
"The best way to think about data in Scikit-Learn is in terms of tables.\n",
"\n",
"Using the [`seaborn`](http://seaborn.pydata.org/) library we can load example data-sets as a Pandas `DataFrame`."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-10T00:13:23.813758Z",
"iopub.status.busy": "2024-01-10T00:13:23.813178Z",
"iopub.status.idle": "2024-01-10T00:13:25.297828Z",
"shell.execute_reply": "2024-01-10T00:13:25.297118Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"pandas.core.frame.DataFrame"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import seaborn as sns\n",
"iris = sns.load_dataset('iris')\n",
"type(iris)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-10T00:13:25.301227Z",
"iopub.status.busy": "2024-01-10T00:13:25.300607Z",
"iopub.status.idle": "2024-01-10T00:13:25.313145Z",
"shell.execute_reply": "2024-01-10T00:13:25.312527Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sepal_length</th>\n",
" <th>sepal_width</th>\n",
" <th>petal_length</th>\n",
" <th>petal_width</th>\n",
" <th>species</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>5.1</td>\n",
" <td>3.5</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>4.9</td>\n",
" <td>3.0</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>4.7</td>\n",
" <td>3.2</td>\n",
" <td>1.3</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4.6</td>\n",
" <td>3.1</td>\n",
" <td>1.5</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5.0</td>\n",
" <td>3.6</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"   sepal_length  sepal_width  petal_length  petal_width species\n",
"0           5.1          3.5           1.4          0.2  setosa\n",
"1           4.9          3.0           1.4          0.2  setosa\n",
"2           4.7          3.2           1.3          0.2  setosa\n",
"3           4.6          3.1           1.5          0.2  setosa\n",
"4           5.0          3.6           1.4          0.2  setosa"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"iris.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Iris data\n",
"\n",
"Here we consider the [Iris flower data](https://en.wikipedia.org/wiki/Iris_flower_data_set).\n",
"\n",
"- Introduced by statistician and biologist Ronald Fisher in a 1936 paper.\n",
"\n",
"- Consists of 50 samples from each of three species of Iris (Iris Setosa, Iris Virginica and Iris Versicolor), 150 samples in total.\n",
"\n",
"- Four features were measured from each sample: the length and the width of the sepals and petals, in centimetres.\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-10T00:13:25.316231Z",
"iopub.status.busy": "2024-01-10T00:13:25.315790Z",
"iopub.status.idle": "2024-01-10T00:13:25.327069Z",
"shell.execute_reply": "2024-01-10T00:13:25.326456Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sepal_length</th>\n",
" <th>sepal_width</th>\n",
" <th>petal_length</th>\n",
" <th>petal_width</th>\n",
" <th>species</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>145</th>\n",
" <td>6.7</td>\n",
" <td>3.0</td>\n",
" <td>5.2</td>\n",
" <td>2.3</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>146</th>\n",
" <td>6.3</td>\n",
" <td>2.5</td>\n",
" <td>5.0</td>\n",
" <td>1.9</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>147</th>\n",
" <td>6.5</td>\n",
" <td>3.0</td>\n",
" <td>5.2</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>148</th>\n",
" <td>6.2</td>\n",
" <td>3.4</td>\n",
" <td>5.4</td>\n",
" <td>2.3</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>149</th>\n",
" <td>5.9</td>\n",
" <td>3.0</td>\n",
" <td>5.1</td>\n",
" <td>1.8</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"     sepal_length  sepal_width  petal_length  petal_width    species\n",
"145           6.7          3.0           5.2          2.3  virginica\n",
"146           6.3          2.5           5.0          1.9  virginica\n",
"147           6.5          3.0           5.2          2.0  virginica\n",
"148           6.2          3.4           5.4          2.3  virginica\n",
"149           5.9          3.0           5.1          1.8  virginica"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"iris.tail()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"#### Parts of a flower\n",
"\n",
"Measured flower [petals](https://en.wikipedia.org/wiki/Petal) and [sepals](https://en.wikipedia.org/wiki/Sepal).\n",
"\n",
"<img src=\"https://raw.githubusercontent.com/astro-informatics/course_mlbd_images/master/Lecture03_Images/Mature_flower_diagram.png\" width=\"1000px\" style=\"display:block; margin:auto\"/>\n",
"\n",
"[Image credit: [Mariana Ruiz](https://en.wikipedia.org/wiki/Sepal#/media/File:Mature_flower_diagram.svg)]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"#### Images of different species\n",
"\n",
"<!--\n",
"<table border=\"0\" cellpadding=\"0\">\n",
" <tr>\n",
" <td><center><img src=\"https://raw.githubusercontent.com/astro-informatics/course_mlbd_images/master/Lecture03_Images/iris_setosa.jpg\" width=\"60%\"/></center></td>\n",
" <td><center><img src=\"https://raw.githubusercontent.com/astro-informatics/course_mlbd_images/master/Lecture03_Images/iris_versicolor.jpg\" width=\"70%\"/></center></td>\n",
" <td><center><img src=\"https://raw.githubusercontent.com/astro-informatics/course_mlbd_images/master/Lecture03_Images/iris_virginica.jpg\" width=\"50%\"/></center></td>\n",
" </tr>\n",
" <tr>\n",
" <td><center>Iris Setosa</center></td>\n",
" <td><center>Iris Versicolor</center></td>\n",
" <td><center>Iris Virginica</center></td> \n",
" </tr>\n",
"</table>\n",
"-->\n",
"\n",
"##### Iris Setosa\n",
"\n",
"<img src=\"https://raw.githubusercontent.com/astro-informatics/course_mlbd_images/master/Lecture03_Images/iris_setosa.jpg\" width=\"300\" style=\"display:block; margin:auto\"/>\n",
"\n",
"##### Iris Versicolor\n",
"\n",
"<img src=\"https://raw.githubusercontent.com/astro-informatics/course_mlbd_images/master/Lecture03_Images/iris_versicolor.jpg\" width=\"300\" style=\"display:block; margin:auto\"/>\n",
"\n",
"##### Iris Virginica\n",
"\n",
"<img src=\"https://raw.githubusercontent.com/astro-informatics/course_mlbd_images/master/Lecture03_Images/iris_virginica.jpg\" width=\"300\" style=\"display:block; margin:auto\"/>\n",
"\n",
"[[Image source](https://github.com/jakevdp/sklearn_tutorial)]\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Features matrix\n",
"\n",
"Recall that data are presented to a learning algorithm as \"*features*\".\n",
"\n",
"Each row corresponds to an observed (*sampled*) flower, and each column to one of its *features*."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-10T00:13:25.330360Z",
"iopub.status.busy": "2024-01-10T00:13:25.329898Z",
"iopub.status.idle": "2024-01-10T00:13:25.341334Z",
"shell.execute_reply": "2024-01-10T00:13:25.340725Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sepal_length</th>\n",
" <th>sepal_width</th>\n",
" <th>petal_length</th>\n",
" <th>petal_width</th>\n",
" <th>species</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>5.1</td>\n",
" <td>3.5</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>4.9</td>\n",
" <td>3.0</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>4.7</td>\n",
" <td>3.2</td>\n",
" <td>1.3</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4.6</td>\n",
" <td>3.1</td>\n",
" <td>1.5</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5.0</td>\n",
" <td>3.6</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"   sepal_length  sepal_width  petal_length  petal_width species\n",
"0           5.1          3.5           1.4          0.2  setosa\n",
"1           4.9          3.0           1.4          0.2  setosa\n",
"2           4.7          3.2           1.3          0.2  setosa\n",
"3           4.6          3.1           1.5          0.2  setosa\n",
"4           5.0          3.6           1.4          0.2  setosa"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"iris.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"In this example we extract a feature matrix, removing species (which we want to predict)."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-10T00:13:25.344521Z",
"iopub.status.busy": "2024-01-10T00:13:25.343963Z",
"iopub.status.idle": "2024-01-10T00:13:25.356078Z",
"shell.execute_reply": "2024-01-10T00:13:25.355443Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sepal_length</th>\n",
" <th>sepal_width</th>\n",
" <th>petal_length</th>\n",
" <th>petal_width</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>5.1</td>\n",
" <td>3.5</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>4.9</td>\n",
" <td>3.0</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>4.7</td>\n",
" <td>3.2</td>\n",
" <td>1.3</td>\n",
" <td>0.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4.6</td>\n",
" <td>3.1</td>\n",
" <td>1.5</td>\n",
" <td>0.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5.0</td>\n",
" <td>3.6</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"   sepal_length  sepal_width  petal_length  petal_width\n",
"0           5.1          3.5           1.4          0.2\n",
"1           4.9          3.0           1.4          0.2\n",
"2           4.7          3.2           1.3          0.2\n",
"3           4.6          3.1           1.5          0.2\n",
"4           5.0          3.6           1.4          0.2"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X_iris = iris.drop('species', axis='columns')\n",
"X_iris.head()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-10T00:13:25.358964Z",
"iopub.status.busy": "2024-01-10T00:13:25.358716Z",
"iopub.status.idle": "2024-01-10T00:13:25.365488Z",
"shell.execute_reply": "2024-01-10T00:13:25.364851Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"pandas.core.frame.DataFrame"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(X_iris)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Target array\n",
"\n",
"Consider a 1D *target array* containing the labels or targets that we want to predict.\n",
"\n",
"These may be numerical values or discrete classes/labels."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"In this example we want to predict the flower species from other measurements."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-10T00:13:25.368660Z",
"iopub.status.busy": "2024-01-10T00:13:25.368188Z",
"iopub.status.idle": "2024-01-10T00:13:25.374927Z",
"shell.execute_reply": "2024-01-10T00:13:25.374241Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"0    setosa\n",
"1    setosa\n",
"2    setosa\n",
"3    setosa\n",
"4    setosa\n",
"Name: species, dtype: object"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_iris = iris['species']\n",
"y_iris.head()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-10T00:13:25.377983Z",
"iopub.status.busy": "2024-01-10T00:13:25.377595Z",
"iopub.status.idle": "2024-01-10T00:13:25.381761Z",
"shell.execute_reply": "2024-01-10T00:13:25.381237Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"pandas.core.series.Series"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(y_iris)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Features matrix and target vector\n",
"\n",
"<img src=\"https://raw.githubusercontent.com/astro-informatics/course_mlbd_images/master/Lecture03_Images/data-layout.png\" alt=\"data-layout\" width=\"500\" style=\"display:block; margin:auto\"/>\n",
"\n",
"[[Image source](https://github.com/jakevdp/sklearn_tutorial)]"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-10T00:13:25.384689Z",
"iopub.status.busy": "2024-01-10T00:13:25.384254Z",
"iopub.status.idle": "2024-01-10T00:13:25.390554Z",
"shell.execute_reply": "2024-01-10T00:13:25.390008Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"(150, 4)"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X_iris.shape"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-10T00:13:25.393600Z",
"iopub.status.busy": "2024-01-10T00:13:25.393051Z",
"iopub.status.idle": "2024-01-10T00:13:25.399664Z",
"shell.execute_reply": "2024-01-10T00:13:25.399032Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"(150,)"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_iris.shape"
]
},
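{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The layout above can be sketched on a tiny hand-made table. The data and the names (`toy`, `X_toy`, `y_toy`) are made up for illustration; the point is only that the features matrix has shape `(n_samples, n_features)` and the target vector has one entry per sample.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Made-up table: two numeric features and one label column.\n",
"toy = pd.DataFrame({\n",
"    'length': [1.0, 2.0, 3.0],\n",
"    'width': [0.5, 0.7, 0.9],\n",
"    'label': ['a', 'b', 'a'],\n",
"})\n",
"\n",
"X_toy = toy.drop('label', axis='columns')   # features matrix (2D)\n",
"y_toy = toy['label']                        # target vector (1D)\n",
"\n",
"n_samples, n_features = X_toy.shape\n",
"\n",
"# Rows of X and entries of y must line up one-to-one.\n",
"assert y_toy.shape == (n_samples,)\n",
"\n",
"print(n_samples, n_features)\n",
"```\n",
"\n",
"Scikit-Learn estimators generally accept either Pandas objects or plain NumPy arrays laid out this way."
]
},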
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Visualizing the data"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-10T00:13:25.402904Z",
"iopub.status.busy": "2024-01-10T00:13:25.402329Z",
"iopub.status.idle": "2024-01-10T00:13:29.947973Z",
"shell.execute_reply": "2024-01-10T00:13:29.947247Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 730x600 with 20 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%matplotlib inline\n",
"import seaborn as sns; sns.set()\n",
"sns.pairplot(iris, hue='species', height=1.5);"
]
},
{
"cell_type": "markdown",
"metadata": {
"tags": [
"inclass_exercise"
]
},
"source": [
"\n",
"How well do you expect classification to perform with these features and why?"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
},
"tags": [
"solution",
"inclass_exercise"
]
},
"source": [
"Fairly well since the different classes are reasonably well separated in feature space."
]
},
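{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"One way to check this intuition numerically is to cross-validate a simple classifier on the four features. The sketch below is an assumption-laden illustration rather than part of the lecture's workflow: `GaussianNB` is chosen only because it needs no tuning, and the data are taken from Scikit-Learn's bundled copy of Iris instead of the `seaborn` table.\n",
"\n",
"```python\n",
"from sklearn.datasets import load_iris\n",
"from sklearn.model_selection import cross_val_score\n",
"from sklearn.naive_bayes import GaussianNB\n",
"\n",
"X_check, y_check = load_iris(return_X_y=True)\n",
"\n",
"# Five-fold cross-validation: fit on four folds, score accuracy on the fifth,\n",
"# and repeat so that every fold is held out once.\n",
"scores = cross_val_score(GaussianNB(), X_check, y_check, cv=5)\n",
"\n",
"print(scores)\n",
"print(scores.mean())\n",
"```\n",
"\n",
"A mean accuracy well above the one-in-three chance level would support the answer above."
]
},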
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Scikit-Learn's Estimator API"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Scikit-Learn API design principles\n",
"\n",
"- Consistency: All objects share a common interface.\n",
"- Inspection: All specified parameter values are exposed as public attributes.\n",
"- Limited object hierarchy: Only algorithms are represented by Python classes; data-sets and parameters are represented in standard formats.\n",
"- Composition: Many machine learning tasks can be expressed as sequences of more fundamental algorithms.\n",
"- Sensible defaults: The library defines an appropriate default value for parameters the user does not specify."
]
},
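{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Two of these principles, inspection and composition, can be sketched in a few lines. The data and variable names (`reg`, `pipe`, `X_demo`, `y_demo`) are made up for illustration, and the choice of `StandardScaler` as the first pipeline step is an assumption, not something prescribed by the lecture.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.linear_model import LinearRegression\n",
"from sklearn.pipeline import make_pipeline\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"# Inspection: constructor arguments are stored as public attributes.\n",
"reg = LinearRegression(fit_intercept=False)\n",
"print(reg.fit_intercept)\n",
"print(reg.get_params())\n",
"\n",
"# Composition: a pipeline chains transformers and a final estimator,\n",
"# and itself behaves like a single estimator.\n",
"pipe = make_pipeline(StandardScaler(), LinearRegression())\n",
"\n",
"X_demo = np.arange(10, dtype=float).reshape(-1, 1)   # made-up features\n",
"y_demo = 3.0 * X_demo.ravel() + 1.0                  # exact linear target\n",
"pipe.fit(X_demo, y_demo)\n",
"\n",
"# Quantities learned during fit carry a trailing underscore.\n",
"print(pipe[-1].coef_, pipe[-1].intercept_)\n",
"```\n",
"\n",
"Because the pipeline exposes `fit` and `predict` like any other estimator, it can be swapped in wherever a single model is expected."
]
},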
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Impact of design principles\n",
"\n",
"- Makes Scikit-Learn easy to use, once the basic principles are understood.\n",
"- Every machine learning algorithm in Scikit-Learn is implemented via the Estimator API.\n",
"- Provides a consistent interface for a wide range of machine learning applications."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Typical Scikit-Learn Estimator API steps\n",
"\n",
"1. Choose a class of model (import the appropriate estimator class).\n",
"2. Choose model hyperparameters (instantiate the class with the desired values).\n",
"3. Arrange data into a features matrix and target vector.\n",
"4. Fit the model to data (by calling the `fit` method of the model instance).\n",
"5. Apply the model to new data:\n",
"   - Supervised learning: often predict targets for unknown data using the `predict` method.\n",
"   - Unsupervised learning: often transform or infer properties of the data using the `transform` or `predict` method."
]
},
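{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The steps above can be sketched end to end for both the supervised and the unsupervised case. This is a minimal illustration under stated assumptions: the estimators (`KNeighborsClassifier`, `PCA`), their hyperparameter values and all variable names are chosen here for brevity, and predictions are made on training rows only to keep the example short.\n",
"\n",
"```python\n",
"from sklearn.datasets import load_iris\n",
"from sklearn.decomposition import PCA\n",
"from sklearn.neighbors import KNeighborsClassifier\n",
"\n",
"# Step 3 comes first here so the data are ready: features matrix and target vector.\n",
"X_steps, y_steps = load_iris(return_X_y=True)\n",
"\n",
"# Supervised: steps 1-2 (choose and instantiate), step 4 (fit), step 5 (predict).\n",
"clf = KNeighborsClassifier(n_neighbors=5)\n",
"clf.fit(X_steps, y_steps)\n",
"labels = clf.predict(X_steps[:3])\n",
"\n",
"# Unsupervised: same pattern, but fit takes no target and we transform instead.\n",
"pca = PCA(n_components=2)\n",
"pca.fit(X_steps)\n",
"X_2d = pca.transform(X_steps)\n",
"\n",
"print(labels, X_2d.shape)\n",
"```\n",
"\n",
"Swapping in a different model usually means changing only the import and the line that instantiates the class."
]
},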
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Linear regression as machine learning"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-10T00:13:29.951745Z",
"iopub.status.busy": "2024-01-10T00:13:29.951355Z",
"iopub.status.idle": "2024-01-10T00:13:30.270629Z",
"shell.execute_reply": "2024-01-10T00:13:30.269923Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"\n",
"n_samples = 50\n",
"rng = np.random.RandomState(42)\n",
"x = 10 * rng.rand(n_samples)\n",
"y = 2 * x - 1 + rng.randn(n_samples)\n",
"plt.scatter(x, y);"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### 1. Choose a class of model\n",
"\n",
"Every class of model is represented by a Python class."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-10T00:13:30.274187Z",
"iopub.status.busy": "2024-01-10T00:13:30.273534Z",
"iopub.status.idle": "2024-01-10T00:13:30.331758Z",
"shell.execute_reply": "2024-01-10T00:13:30.331059Z"
}
},
"outputs": [],
"source": [
"from sklearn.linear_model import LinearRegression"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### 2. Choose model hyperparameters\n",
"\n",
"Create an instance of the model with the chosen hyperparameters (e.g. whether to fit the y-intercept, regularization strength)."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-10T00:13:30.335535Z",
"iopub.status.busy": "2024-01-10T00:13:30.335036Z",
"iopub.status.idle": "2024-01-10T00:13:30.343050Z",
"shell.execute_reply": "2024-01-10T00:13:30.342450Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<style>#sk-container-id-1 {color: black;}#sk-container-id-1 pre{padding: 0;}#sk-container-id-1 div.sk-toggleable {background-color: white;}#sk-container-id-1 label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.3em;box-sizing: border-box;text-align: center;}#sk-container-id-1 label.sk-toggleable__label-arrow:before {content: \"▸\";float: left;margin-right: 0.25em;color: #696969;}#sk-container-id-1 label.sk-toggleable__label-arrow:hover:before {color: black;}#sk-container-id-1 div.sk-estimator:hover label.sk-toggleable__label-arrow:before {color: black;}#sk-container-id-1 div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}#sk-container-id-1 div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}#sk-container-id-1 input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}#sk-container-id-1 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {content: \"▾\";}#sk-container-id-1 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-1 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-1 input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}#sk-container-id-1 div.sk-estimator {font-family: monospace;background-color: #f0f8ff;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;margin-bottom: 0.5em;}#sk-container-id-1 div.sk-estimator:hover {background-color: #d4ebff;}#sk-container-id-1 div.sk-parallel-item::after {content: \"\";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}#sk-container-id-1 div.sk-label:hover label.sk-toggleable__label 
{background-color: #d4ebff;}#sk-container-id-1 div.sk-serial::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: 0;}#sk-container-id-1 div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;padding-right: 0.2em;padding-left: 0.2em;position: relative;}#sk-container-id-1 div.sk-item {position: relative;z-index: 1;}#sk-container-id-1 div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;position: relative;}#sk-container-id-1 div.sk-item::before, #sk-container-id-1 div.sk-parallel-item::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: -1;}#sk-container-id-1 div.sk-parallel-item {display: flex;flex-direction: column;z-index: 1;position: relative;background-color: white;}#sk-container-id-1 div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}#sk-container-id-1 div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}#sk-container-id-1 div.sk-parallel-item:only-child::after {width: 0;}#sk-container-id-1 div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0 0.4em 0.5em 0.4em;box-sizing: border-box;padding-bottom: 0.4em;background-color: white;}#sk-container-id-1 div.sk-label label {font-family: monospace;font-weight: bold;display: inline-block;line-height: 1.2em;}#sk-container-id-1 div.sk-label-container {text-align: center;}#sk-container-id-1 div.sk-container {/* jupyter's `normalize.less` sets `[hidden] { display: none; }` but bootstrap.min.css set `[hidden] { display: none !important; }` so we also need the `!important` here to be able to override the default hidden behavior on the sphinx rendered scikit-learn.org. 
See: https://github.com/scikit-learn/scikit-learn/issues/21755 */display: inline-block !important;position: relative;}#sk-container-id-1 div.sk-text-repr-fallback {display: none;}</style><div id=\"sk-container-id-1\" class=\"sk-top-container\"><div class=\"sk-text-r
],
"text/plain": [
"LinearRegression()"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model = LinearRegression(fit_intercept=True)\n",
"model"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### 3. Arrange data into a features matrix and target vector"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-10T00:13:30.347095Z",
"iopub.status.busy": "2024-01-10T00:13:30.345723Z",
"iopub.status.idle": "2024-01-10T00:13:30.352782Z",
"shell.execute_reply": "2024-01-10T00:13:30.352163Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"(50, 1)"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X = x.reshape(n_samples,1)\n",
"X.shape"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-10T00:13:30.355875Z",
"iopub.status.busy": "2024-01-10T00:13:30.355272Z",
"iopub.status.idle": "2024-01-10T00:13:30.361812Z",
"shell.execute_reply": "2024-01-10T00:13:30.361206Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"(50,)"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y.shape "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### 4. Fit the model to data\n"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-10T00:13:30.364747Z",
"iopub.status.busy": "2024-01-10T00:13:30.364379Z",
"iopub.status.idle": "2024-01-10T00:13:30.371856Z",
"shell.execute_reply": "2024-01-10T00:13:30.371202Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"LinearRegression()"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model.fit(X, y)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Model parameters learned during `fit()` are stored in attributes with *trailing underscores*."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-10T00:13:30.374971Z",
"iopub.status.busy": "2024-01-10T00:13:30.374528Z",
"iopub.status.idle": "2024-01-10T00:13:30.378642Z",
"shell.execute_reply": "2024-01-10T00:13:30.378102Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"-0.9033107255311146"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model.intercept_"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-10T00:13:30.381546Z",
"iopub.status.busy": "2024-01-10T00:13:30.380989Z",
"iopub.status.idle": "2024-01-10T00:13:30.387913Z",
"shell.execute_reply": "2024-01-10T00:13:30.387091Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([1.9776566])"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model.coef_"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The fitted intercept and slope are close to the values used to generate the data (-1 and 2, respectively)."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### 5. Predict targets for unknown data"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-10T00:13:30.391306Z",
"iopub.status.busy": "2024-01-10T00:13:30.390667Z",
"iopub.status.idle": "2024-01-10T00:13:30.396006Z",
"shell.execute_reply": "2024-01-10T00:13:30.395397Z"
}
},
"outputs": [],
"source": [
"n_fit = 50\n",
"xfit = np.linspace(-1, 11, n_fit)\n",
"Xfit = xfit.reshape(n_fit,1)\n",
"yfit = model.predict(Xfit)"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-10T00:13:30.398979Z",
"iopub.status.busy": "2024-01-10T00:13:30.398541Z",
"iopub.status.idle": "2024-01-10T00:13:30.618415Z",
"shell.execute_reply": "2024-01-10T00:13:30.617751Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.scatter(x, y)\n",
"plt.plot(xfit, yfit, 'r');"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Supervised learning: classification\n",
"\n",
"Consider the Iris dataset and predict species."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Split data into training and test sets (hint: [`train_test_split`](http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) is a convenient scikit-learn function for this task)."
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-10T00:13:30.621937Z",
"iopub.status.busy": "2024-01-10T00:13:30.621464Z",
"iopub.status.idle": "2024-01-10T00:13:30.627694Z",
"shell.execute_reply": "2024-01-10T00:13:30.627089Z"
}
},
"outputs": [],
"source": [
"from sklearn.model_selection import train_test_split\n",
"\n",
"X_train, X_test, y_train, y_test = train_test_split(X_iris, y_iris, test_size=0.5, random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-10T00:13:30.630352Z",
"iopub.status.busy": "2024-01-10T00:13:30.630134Z",
"iopub.status.idle": "2024-01-10T00:13:30.641703Z",
"shell.execute_reply": "2024-01-10T00:13:30.641083Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sepal_length</th>\n",
" <th>sepal_width</th>\n",
" <th>petal_length</th>\n",
" <th>petal_width</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>74</th>\n",
" <td>6.4</td>\n",
" <td>2.9</td>\n",
" <td>4.3</td>\n",
" <td>1.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>116</th>\n",
" <td>6.5</td>\n",
" <td>3.0</td>\n",
" <td>5.5</td>\n",
" <td>1.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>93</th>\n",
" <td>5.0</td>\n",
" <td>2.3</td>\n",
" <td>3.3</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>100</th>\n",
" <td>6.3</td>\n",
" <td>3.3</td>\n",
" <td>6.0</td>\n",
" <td>2.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>89</th>\n",
" <td>5.5</td>\n",
" <td>2.5</td>\n",
" <td>4.0</td>\n",
" <td>1.3</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" sepal_length sepal_width petal_length petal_width\n",
"74 6.4 2.9 4.3 1.3\n",
"116 6.5 3.0 5.5 1.8\n",
"93 5.0 2.3 3.3 1.0\n",
"100 6.3 3.3 6.0 2.5\n",
"89 5.5 2.5 4.0 1.3"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X_train.head() "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"\n",
"Use a Gaussian Naive Bayes (`GaussianNB`) model to predict Iris species. Then evaluate performance on test data.\n",
"\n",
"(Hint: choose, instantiate, fit and predict.) \n",
"\n",
"See Scikit-Learn documentation on [`GaussianNB`](http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html).\n",
"\n",
"Evaluate performance using simple [`accuracy_score`](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html#sklearn.metrics.accuracy_score).\n",
"\n",
"(Do not set any priors.)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-10T00:13:30.644869Z",
"iopub.status.busy": "2024-01-10T00:13:30.644398Z",
"iopub.status.idle": "2024-01-10T00:13:30.654902Z",
"shell.execute_reply": "2024-01-10T00:13:30.654270Z"
},
"tags": [
"solution"
]
},
"outputs": [],
"source": [
"from sklearn.naive_bayes import GaussianNB # 1. choose model class\n",
"model = GaussianNB() # 2. instantiate model\n",
"model.fit(X_train, y_train) # 3. fit model to data\n",
"y_model = model.predict(X_test) # 4. predict on new data"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Evaluate performance on test data."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-10T00:13:30.658096Z",
"iopub.status.busy": "2024-01-10T00:13:30.657663Z",
"iopub.status.idle": "2024-01-10T00:13:30.665168Z",
"shell.execute_reply": "2024-01-10T00:13:30.664579Z"
},
"tags": [
"solution"
]
},
"outputs": [
{
"data": {
"text/plain": [
"0.96"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn.metrics import accuracy_score\n",
"accuracy_score(y_test, y_model)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Unsupervised learning: dimensionality reduction\n",
"\n",
"Reduce dimensionality of Iris data for visualisation or to discover structure.\n",
"\n",
"Recall the original Iris data has four features."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-10T00:13:30.667996Z",
"iopub.status.busy": "2024-01-10T00:13:30.667639Z",
"iopub.status.idle": "2024-01-10T00:13:30.676274Z",
"shell.execute_reply": "2024-01-10T00:13:30.675749Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sepal_length</th>\n",
" <th>sepal_width</th>\n",
" <th>petal_length</th>\n",
" <th>petal_width</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>5.1</td>\n",
" <td>3.5</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>4.9</td>\n",
" <td>3.0</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>4.7</td>\n",
" <td>3.2</td>\n",
" <td>1.3</td>\n",
" <td>0.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4.6</td>\n",
" <td>3.1</td>\n",
" <td>1.5</td>\n",
" <td>0.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5.0</td>\n",
" <td>3.6</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" sepal_length sepal_width petal_length petal_width\n",
"0 5.1 3.5 1.4 0.2\n",
"1 4.9 3.0 1.4 0.2\n",
"2 4.7 3.2 1.3 0.2\n",
"3 4.6 3.1 1.5 0.2\n",
"4 5.0 3.6 1.4 0.2"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X_iris.head()"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-10T00:13:30.679207Z",
"iopub.status.busy": "2024-01-10T00:13:30.678646Z",
"iopub.status.idle": "2024-01-10T00:13:30.685276Z",
"shell.execute_reply": "2024-01-10T00:13:30.684661Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"(150, 4)"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X_iris.shape"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Compute principal component analysis (`PCA`) with 2 components and apply the transform. Plot the data in PCA space.\n",
"\n",
"(Hint: choose, instantiate, fit and transform.)\n",
"\n",
"See Scikit-Learn documentation on [`PCA`](http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html).\n",
"\n",
"See Seaborn documentation on [`lmplot`](https://seaborn.pydata.org/generated/seaborn.lmplot.html)."
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-10T00:13:30.688494Z",
"iopub.status.busy": "2024-01-10T00:13:30.687909Z",
"iopub.status.idle": "2024-01-10T00:13:30.703068Z",
"shell.execute_reply": "2024-01-10T00:13:30.702344Z"
},
"tags": [
"solution"
]
},
"outputs": [],
"source": [
"from sklearn.decomposition import PCA # 1. Choose the model class\n",
"model = PCA(n_components=2) # 2. Instantiate the model with hyperparameters\n",
"model.fit(X_iris) # 3. Fit to data. Notice y is not specified!\n",
"X_2D = model.transform(X_iris) # 4. Transform the data to two dimensions "
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-10T00:13:30.706164Z",
"iopub.status.busy": "2024-01-10T00:13:30.705697Z",
"iopub.status.idle": "2024-01-10T00:13:30.720075Z",
"shell.execute_reply": "2024-01-10T00:13:30.719309Z"
},
"tags": [
"solution"
]
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sepal_length</th>\n",
" <th>sepal_width</th>\n",
" <th>petal_length</th>\n",
" <th>petal_width</th>\n",
" <th>species</th>\n",
" <th>PCA1</th>\n",
" <th>PCA2</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>5.1</td>\n",
" <td>3.5</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" <td>-2.684126</td>\n",
" <td>0.319397</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>4.9</td>\n",
" <td>3.0</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" <td>-2.714142</td>\n",
" <td>-0.177001</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>4.7</td>\n",
" <td>3.2</td>\n",
" <td>1.3</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" <td>-2.888991</td>\n",
" <td>-0.144949</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4.6</td>\n",
" <td>3.1</td>\n",
" <td>1.5</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" <td>-2.745343</td>\n",
" <td>-0.318299</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5.0</td>\n",
" <td>3.6</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" <td>-2.728717</td>\n",
" <td>0.326755</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" sepal_length sepal_width petal_length petal_width species PCA1 \\\n",
"0 5.1 3.5 1.4 0.2 setosa -2.684126 \n",
"1 4.9 3.0 1.4 0.2 setosa -2.714142 \n",
"2 4.7 3.2 1.3 0.2 setosa -2.888991 \n",
"3 4.6 3.1 1.5 0.2 setosa -2.745343 \n",
"4 5.0 3.6 1.4 0.2 setosa -2.728717 \n",
"\n",
" PCA2 \n",
"0 0.319397 \n",
"1 -0.177001 \n",
"2 -0.144949 \n",
"3 -0.318299 \n",
"4 0.326755 "
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"iris['PCA1'] = X_2D[:, 0]\n",
"iris['PCA2'] = X_2D[:, 1]\n",
"iris.head() "
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-10T00:13:30.723401Z",
"iopub.status.busy": "2024-01-10T00:13:30.723006Z",
"iopub.status.idle": "2024-01-10T00:13:31.391220Z",
"shell.execute_reply": "2024-01-10T00:13:31.390526Z"
},
"tags": [
"solution"
]
},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 630x500 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.lmplot(data=iris, x=\"PCA1\", y=\"PCA2\", hue='species', fit_reg=False);"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"How well do you expect classification to perform using PCA components as features and why?"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
},
"tags": [
"solution"
]
},
"source": [
"Very well, since the different classes are well separated in PCA feature space."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Unsupervised learning: clustering\n",
"\n",
"Attempt to find \"groups\" in Iris data without given labels or training data.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"\n",
"Cluster the Iris data into 3 components using a Gaussian Mixture Model (GMM). Plot the 3 components separately in PCA space.\n",
"\n",
"(Hint: choose, instantiate, fit and predict.)\n",
"\n",
"See Scikit-Learn documentation on [`GaussianMixture`](http://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html)."
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-10T00:13:31.394705Z",
"iopub.status.busy": "2024-01-10T00:13:31.394206Z",
"iopub.status.idle": "2024-01-10T00:13:31.493375Z",
"shell.execute_reply": "2024-01-10T00:13:31.492329Z"
},
"tags": [
"solution"
]
},
"outputs": [],
"source": [
"from sklearn.mixture import GaussianMixture # 1. Choose the model class\n",
"model = GaussianMixture(n_components=3) # 2. Instantiate the model with hyperparameters\n",
"model.fit(X_iris) # 3. Fit to data. Notice y is not specified!\n",
"y_gmm = model.predict(X_iris) # 4. Determine cluster labels"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-10T00:13:31.497402Z",
"iopub.status.busy": "2024-01-10T00:13:31.496762Z",
"iopub.status.idle": "2024-01-10T00:13:33.256040Z",
"shell.execute_reply": "2024-01-10T00:13:33.255265Z"
},
"tags": [
"solution"
]
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAABlAAAAHkCAYAAABBiGI5AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8WgzjOAAAACXBIWXMAAA9hAAAPYQGoP6dpAADVGElEQVR4nOzdfXxcdZn//9c5Z2YymZlMk7RpmiZNQm9MCwUKuNCKUFqhBYKL3MiNylZBrC674OquC+t+V91dV1fXqty4CKJWUNAFEaFIC1oB4Vd25R4pkVKSNGkb0g5pMjOZzJyb3x9DQ9OkbdJOMknm/Xw8eJScc+ac65ppc5K55nNdhud5HiIiIiIiIiIiIiIiItLPzHcAIiIiIiIiIiIiIiIi440KKCIiIiIiIiIiIiIiIvtRAUVERERERERERERERGQ/KqCIiIiIiIiIiIiIiIjsRwUUERERERERERERERGR/aiAIiIiIiIiIiIiIiIish8VUERERERERERERERERPajAoqIiIiIiIiIiIiIiMh+VEARERERERERERERERHZjwooInJEnnnmGRoaGnjmmWfyHYqIiMiEpHupiIjIkdG9VERERosKKCIyrj333HPcdNNNdHd35zuUw5ZOp/nmN7/J+9//fo477jg+/OEP89RTT+U7LBERKRAT/V6aSCS48cYbueqqqzj55JNpaGjgl7/8Zb7DEhGRAjLR76UvvfQS//qv/0pjYyOLFi3ijDPO4LrrruPNN9/Md2giIuOeCigiMq49//zz3HzzzRP2B1WA66+/nh//+Md88IMf5Itf/CKWZfGpT32KP/7xj/kOTURECsBEv5e+/fbb3HLLLWzdupWGhoZ8hyMiIgVoot9Lf/CDH7BhwwaWLFnCF7/4RS655BL++Mc/cuGFF/LnP/853+GJiIxrvnwHICKSD729vRQXF4/6dV566SXWrVvHF77wBa666ioAPvShD3HeeefxX//1X9xzzz2jHoOIiMhoGKt76fTp0/nDH/5ARUUFL7/8MhdffPGoX1NERGQsjNW99OMf/zj/9V//RSAQ6N927rnn8sEPfpDbbruN//qv/xr1GEREJiqtQBGRg+ro6OCf/umfeP/738/ChQtZvnw5X/rSl0in0wd8zPLly7n++usHbb/iiiu44oorBmy78847aWxs5Pjjj+cv/uIvuPDCC3nwwQcBuOmmm/jGN74BwAc+8AEaGhpoaGigra2t//EPPPAAF154Iccddxwnn3wyf/d3f8eOHTsGXfe8887jlVde4aMf/SjHH388a9asOeznZCQeeeQRLMvi0ksv7d9WVFTExRdfzPPPPz8oVhERmXx0Lz0ygUCAioqKMbmWiIiMT7qXHpkTTzxxQPEEoL6+nnnz5rF169YxiUFEZKLSChQROaCOjg4uvvhienp6uOSSS5g9ezYdHR2sX7+eVCo16AewkfrFL37Bv//7v7Ny5Ur+6q/+ir6+PpqamnjxxRf54Ac/yFlnnUVzczMPPfQQN9xwA2VlZQCUl5cD8N///d9897vf5ZxzzuHiiy8mFotx11138dGPfpRf/epXRKPR/mt1dXVx9dVX09jYyF/+5V8yderUA8aVTqeJx+PDymFvLAeyefNm6uvriUQiA7Yfd9xx/furqqqGdS0REZl4dC89tEPdS0VEpLDpXnpoh3Mv9TyPXbt2MW/evBE/VkSkkKiAIiIHtGbNGnbt2sUvfvELjj322P7t1113HZ7nHfH5f//73zNv3jxuvPHGIffPnz+fo48+moceeogzzzyTmpqa/n3t7e3cdNNNfPazn+XTn/50//YVK1ZwwQUX8LOf/WzA9s7OTr7yla9w2WWXHTKuvT8YD0dTU9NB93d2dg75qdm92956661hXUdERCYm3UsP7VD3UhERKWy6lx7a4dxLf/3rX9PR0cG111474seKiBQSFVBEZEiu6/LYY4+xbNmyAT+k7mUYxhFfIxqNsn
PnTl566aX+FRnD9eijj+K6Lueccw6xWKx/+7Rp06irq+OZZ54Z8INqIBDgwgsvHNa53//+9/OjH/1oRPEcyIE+EVVUVNS/X0REJifdS3NzLxURkcKle+no3EvfeOMN/vVf/5UTTjiBCy64YFSuISIyWaiAIiJDisVixOPxUV3Oe/XVV/P000/z4Q9/mLq6Ok499VTOO+88TjrppEM+trm5Gc/zWLFixZD7fb6B394qKyuHvbR7+vTpTJ8+fVjHHkowGByyL29fX1//fhERmZx0L83NvVRERAqX7qW5v5d2dnayevVqSkpK+O53v4tlWTm/hojIZKICioiMGcdxBvxwNmfOHB555BF+//vf8+STT7JhwwZ+9rOfcc011xxyGbHruhiGwe233z7kD3yhUGjA1yMpVKRSKXp6eoZ17KGG2lZUVNDR0TFoe2dnJ4DeXBIRkREpxHupiIhILhXyvbSnp4err76anp4efvrTn1JZWTnseERECpUKKCIypPLyciKRCK+//vqIHztlyhS6u7sHbd++fTuzZs0asC0UCnHuuedy7rnnkk6n+du//VtuvfVWVq9eTVFR0QGXZNfW1uJ5HjU1NRx11FEjjvFgHn744Zz1mp0/fz7PPPMM8Xh8wCD5F198EYAFCxYcfqAiIjKu6V6qGSgiInJkdC/N3b20r6+PT3/60zQ3N/OjH/2IuXPnHmmIIiIFQQUUERmSaZqceeaZ/PrXv+bll18e1G/W87wD/hA5a9Ysnn32WdLpdP/y5I0bN7Jjx44BP6i+/fbblJWV9X8dCASYM2cOTzzxBJlMhqKiIoqLiwEGffJmxYoVrFmzhptvvpn/+q//GhCL53l0dXUNOPdI5LLX7Nlnn80Pf/hDfv7zn3PVVVcBkE6n+eUvf8nxxx9PVVVVTq4jIiLjj+6lmoEiIiJHRvfS3NxLHcfhs5/9LC+88ALf+973OOGEE3JyXhGRQqACiogc0Oc+9zmeeuoprrjiCi655BLmzJlDZ2cnjzzyCD/72c+IRqNDPu7DH/4w69ev55Of/CTnnHMOra2tPPjgg9TW1g447qqrrmLatGmceOKJTJ06la1bt3LXXXexdOnS/tUaxxxzDADf/va3Offcc/H7/Sxbtoza2lo++9nP8q1vfYv29nbOPPNMwuEwbW1tPPbYY1xyySX9BYuRymWv2eOPP56zzz6bNWvWsHv3burq6rj//vtpb2/nq1/9ak6uISIi45fupblx11130d3dzVtvvQVk3wDbuXMnAFdccQUlJSU5u5aIiIwvupceua9//ev87ne/Y9myZXR1dfHAAw8M2H/++efn5DoiIpORCigickCVlZX84he/4Lvf/S4PPvgg8XicyspKTj/99IP2bj3ttNO4/vrr+dGPfsR//Md/sHDhQm699Vb+8z//c8Bxl156KQ8++CA/+tGPSCaTzJgxgyuuuIK//uu/7j/muOOO47rrruOee+7hySefxHVdfvvb3xIKhfjUpz5FfX09P/7xj7nlllsAmDFjBqeeeirLly8fnSflMHzjG9/gO9/5Dr/+9a/Zs2cPDQ0N3HrrrfzFX/xFvkMTEZFRpntpbvzwhz+kvb29/+sNGzawYcMGAP7yL/9SBRQRkUlM99Ij99prrwHZDyBs3Lhx0H4VUEREDszwPM/LdxAiIiIiIiIiIiIiIiLjiZnvAERERERERERERERERMYbFVBERERERERERERERET2M+FnoLS0tHDHHXfw4osv8vrrrzN79mweeuihQz5u+fLlA/oo7/XSSy9RVFQ0GqGKiIiIiIiIiIiIiMgEMeELKK+//jqPP/44xx9/PK7rMpKRLitXruTKK68csC0QCOQ6RBERERERERERERERmWAmfAFl+fLlnHnmmQBcf/31vPLKK8N+7LRp01i0aNEoRSYiIiIiIiIiIiIiIhPVhJ+BYpoTPgURERERERERERERERlnCrr68OCDD7Jw4UJOOOEErr76apqamv
IdkoiIiIiIiIiIiIiIjAMTvoXX4Vq+fDnHHXccM2fOZNu2bdx666185CMf4Ve/+hWzZs067PM6jkt3d28OIx0ZwzC
"text/plain": [
"<Figure size 1630x500 with 3 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"iris['cluster'] = y_gmm\n",
"sns.lmplot(data=iris, x=\"PCA1\", y=\"PCA2\", hue='species',\n",
" col='cluster', fit_reg=False);"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
},
"tags": [
"solution"
]
},
"source": [
"The GMM has done a reasonably good job of separating the different classes. Setosa is perfectly separated in one cluster, while there remains some mixing between versicolor and viginica."
]
},
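{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The visual impression can be quantified by cross-tabulating the true species against the cluster labels. The cell below is a minimal, self-contained sketch: it refits a GMM with a fixed `random_state` (an assumption made here for reproducibility; the fit above does not set one), so the cluster numbering and counts may differ from the plot. The names `iris_raw`, `gmm_demo` and `clusters_demo` are illustrative. The adjusted Rand index is 1 for perfect agreement and close to 0 for a random labelling."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from sklearn.datasets import load_iris\n",
"from sklearn.metrics import adjusted_rand_score\n",
"from sklearn.mixture import GaussianMixture\n",
"\n",
"# Reload Iris so this cell does not depend on earlier variables\n",
"iris_raw = load_iris()\n",
"species_names = iris_raw.target_names[iris_raw.target]  # integer codes -> species names\n",
"\n",
"# Fixed seed is an assumption for reproducibility, not part of the fit above\n",
"gmm_demo = GaussianMixture(n_components=3, random_state=0)\n",
"clusters_demo = gmm_demo.fit_predict(iris_raw.data)\n",
"\n",
"# Rows: true species; columns: GMM cluster labels (numbering is arbitrary)\n",
"print(pd.crosstab(species_names, clusters_demo,\n",
"                  rownames=['species'], colnames=['cluster']))\n",
"print('Adjusted Rand index:', adjusted_rand_score(species_names, clusters_demo))"
]
},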
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"exercise_pointer"
]
},
"source": [
"**Exercises:** *You can now complete Exercise 1 in the exercises associated with this lecture.*"
]
}
],
"metadata": {
"celltoolbar": "Slideshow",
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.18"
}
},
"nbformat": 4,
"nbformat_minor": 4
}