add week 2

2025-02-22 19:16:55 +00:00
commit d81f43e9a6
parent 0334922598
6 changed files with 3686 additions and 0 deletions
--- a/week2/exercises/Lecture04_PerformanceAnalysis_Exercises_no_solutions.ipynb
+++ b/week2/exercises/Lecture04_PerformanceAnalysis_Exercises_no_solutions.ipynb
@@ -0,0 +1 @@
+{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# Exercises for Lecture 4 (Performance analysis)"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["import datetime\n", "now = datetime.datetime.now()\n", "print(\"Last executed: \" + now.strftime(\"%Y-%m-%d %H:%M:%S\"))"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["# Common imports\n", "import os\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "np.random.seed(42) # To make this notebook's output stable across runs"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["# Fetch MNIST dataset\n", "from sklearn.datasets import fetch_openml\n", "mnist = fetch_openml('mnist_784')\n", "#mnist = fetch_openml('mnist_784', parser=\"pandas\")"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["y_train = mnist.target[:60000].to_numpy(dtype=int)\n", "y_test = mnist.target[-10000:].to_numpy(dtype=int)\n", "y_train.shape, y_test.shape"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["X_train = mnist.data[:60000].to_numpy()\n", "X_test = mnist.data[-10000:].to_numpy()\n", "X_train.shape, X_test.shape"]}, {"cell_type": "markdown", "metadata": {"slideshow": {"slide_type": "subslide"}}, "source": ["## Exercise 1: Compute the number of examples of each digit."]}, {"cell_type": "markdown", "metadata": {"slideshow": {"slide_type": "subslide"}}, "source": ["## Exercise 2: Construct the train and test target vectors for a digit-8 classifier."]}, {"cell_type": "markdown", "metadata": {"slideshow": {"slide_type": "subslide"}}, "source": ["## Exercise 3: Use Scikit-Learn to perform 3-fold cross-validation using [`cross_val_score`](http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html)."]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Exercise 4: Compute the confusion matrix."]}, {"cell_type": "markdown", "metadata": {"slideshow": {"slide_type": "subslide"}}, "source": ["## Exercise 5: Compute the precision and recall for the confusion matrix `conf_matrix` computed above.\n", "\n", "Compute by hand and then using Scikit-Learn [precision_score](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_score.html) and [recall_score](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.recall_score.html#sklearn.metrics.recall_score)."]}, {"cell_type": "markdown", "metadata": {"slideshow": {"slide_type": "subslide"}}, "source": ["## Exercise 6: Compute the $F_1$ score for the confusion matrix `conf_matrix` computed above.\n", "\n", "Compute by hand and then using Scikit-Learn [f1_score](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html)."]}, {"cell_type": "markdown", "metadata": {"slideshow": {"slide_type": "subslide"}}, "source": ["## Exercise 7: Compute the false positive rate for the confusion matrix `conf_matrix` computed above."]}, {"cell_type": "markdown", "metadata": {"slideshow": {"slide_type": "subslide"}}, "source": ["## Exercise 8: Where is the ideal point in the ROC curve domain?"]}, {"cell_type": "markdown", "metadata": {"slideshow": {"slide_type": "subslide"}, "tags": ["exercise"]}, "source": ["## Exercise 9: What is the AUC for an ideal classifier and for a random classifier?"]}, {"cell_type": "markdown", "metadata": {"tags": ["exercise"]}, "source": ["Consider the confusion matrix for multiclass classification."]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["from sklearn.model_selection import cross_val_predict\n", "from sklearn.metrics import confusion_matrix\n", "# sgd_clf is assumed to be the classifier defined in the earlier exercises\n", "y_train_pred = cross_val_predict(sgd_clf, X_train, y_train, cv=3)\n", "conf_mx = confusion_matrix(y_train, y_train_pred)\n", "conf_mx"]}, {"cell_type": "markdown", "metadata": {"slideshow": {"slide_type": "subslide"}, "tags": ["exercise"]}, "source": ["## Exercise 10: Convert confusion matrix to probabilities and plot."]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["row_sums = conf_mx.sum(axis=1, keepdims=True)\n", "norm_conf_mx = conf_mx / row_sums"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["import seaborn as sns\n", "plt.figure(figsize=(6,6))\n", "sns.heatmap(norm_conf_mx, square=True, annot=True, cbar=False, fmt='.2f')\n", "plt.xlabel('predicted value')\n", "plt.ylabel('true value');"]}], "metadata": {"celltoolbar": "Tags", "kernelspec": {"display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.15"}}, "nbformat": 4, "nbformat_minor": 4}
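Note on Exercises 5 to 7 in the notebook above: the by-hand quantities can be sketched as below. This is a minimal illustration and not the course solution. The counts in `conf_matrix` are hypothetical (they are not MNIST results), the helper name `binary_metrics` is made up for this note, and the matrix layout is assumed to follow the Scikit-Learn convention of true classes in rows and predicted classes in columns, ordered negative then positive.

```python
import numpy as np

# Hypothetical counts for illustration only (not computed from MNIST).
# Assumed layout: rows = true class, columns = predicted class,
# ordered (negative, positive), i.e. [[TN, FP], [FN, TP]].
conf_matrix = np.array([[50, 10],
                        [5, 35]])


def binary_metrics(cm):
    """Return (precision, recall, F1, false positive rate) for a 2x2 matrix."""
    tn, fp, fn, tp = cm.ravel()
    precision = tp / (tp + fp)   # fraction of predicted positives that are correct
    recall = tp / (tp + fn)      # fraction of true positives that are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    fpr = fp / (fp + tn)         # fraction of true negatives flagged as positive
    return precision, recall, f1, fpr


precision, recall, f1, fpr = binary_metrics(conf_matrix)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f} FPR={fpr:.3f}")
```

The Scikit-Learn functions linked in the exercises take label vectors rather than a confusion matrix, so comparing against them requires the true and predicted labels from which the matrix was built.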
--- a/week2/exercises/Lecture05_TrainingI_Exercises_no_solutions.ipynb
+++ b/week2/exercises/Lecture05_TrainingI_Exercises_no_solutions.ipynb
@@ -0,0 +1 @@
+{"cells": [{"cell_type": "markdown", "metadata": {"slideshow": {"slide_type": "slide"}}, "source": ["# Exercises for Lecture 5 (Training I)"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["import datetime\n", "now = datetime.datetime.now()\n", "print(\"Last executed: \" + now.strftime(\"%Y-%m-%d %H:%M:%S\"))"]}, {"cell_type": "markdown", "metadata": {"slideshow": {"slide_type": "subslide"}}, "source": ["## Exercise 1: Solving the normal equations"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["# Common imports\n", "import os\n", "import numpy as np\n", "np.random.seed(42) # To make this notebook's output stable across runs\n", "\n", "# To plot pretty figures\n", "%matplotlib inline\n", "import matplotlib\n", "import matplotlib.pyplot as plt\n", "plt.rcParams['axes.labelsize'] = 14\n", "plt.rcParams['xtick.labelsize'] = 12\n", "plt.rcParams['ytick.labelsize'] = 12"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["import numpy as np\n", "m = 100\n", "X = 2 * np.random.rand(m, 1)\n", "y = 4 + 3 * X + np.random.randn(m, 1)\n", "plt.figure(figsize=(9,6))\n", "plt.plot(X, y, \"b.\")\n", "plt.xlabel(\"$x_1$\", fontsize=18)\n", "plt.ylabel(\"$y$\", rotation=0, fontsize=18)\n", "plt.axis([0, 2, 0, 15]);"]}, {"cell_type": "markdown", "metadata": {"slideshow": {"slide_type": "subslide"}}, "source": ["### Solve the normal equations."]}, {"cell_type": "markdown", "metadata": {"slideshow": {"slide_type": "subslide"}}, "source": ["### Make predictions using the fitted model for $x_1 = 0$ and $x_1 = 2$. Plot these."]}, {"cell_type": "markdown", "metadata": {"slideshow": {"slide_type": "subslide"}}, "source": ["## Exercise 2: Implement gradient descent\n", "\n", "Implement gradient descent to estimate the parameters of the linear regression model considered before.\n", "\n", "Use 1000 steps and a learning rate of $\\alpha=0.1$, starting from $\\theta^{(0)}=[1, 1]^{\\rm T}$."]}], "metadata": {"celltoolbar": "Tags", "kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5"}}, "nbformat": 4, "nbformat_minor": 4}
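Note on the Lecture 5 exercises above: one possible approach to both exercises is sketched below, under stated assumptions, and is not the course solution. It assumes a mean squared error cost (so the gradient carries a factor of 2/m, which the lecture may normalise differently), uses `np.random.default_rng` instead of the notebook's `np.random.seed`, and the names `X_b`, `theta_ne`, and `theta_gd` are chosen here for illustration.

```python
import numpy as np

# Same synthetic data model as the notebook, but with a local generator
# (an assumption; the notebook seeds the global NumPy state instead).
rng = np.random.default_rng(42)
m = 100
X = 2 * rng.random((m, 1))
y = 4 + 3 * X + rng.standard_normal((m, 1))

# Prepend a column of ones so that theta[0] acts as the intercept.
X_b = np.c_[np.ones((m, 1)), X]

# Exercise 1: normal equations, (X^T X) theta = X^T y.
# Solving the linear system avoids forming an explicit inverse.
theta_ne = np.linalg.solve(X_b.T @ X_b, X_b.T @ y)

# Predictions at x_1 = 0 and x_1 = 2.
X_new_b = np.array([[1.0, 0.0],
                    [1.0, 2.0]])
y_pred = X_new_b @ theta_ne

# Exercise 2: batch gradient descent with the settings stated in the exercise.
alpha = 0.1
n_steps = 1000
theta_gd = np.array([[1.0], [1.0]])
for _ in range(n_steps):
    # Gradient of the MSE cost (1/m) * ||X_b theta - y||^2.
    gradients = (2 / m) * X_b.T @ (X_b @ theta_gd - y)
    theta_gd = theta_gd - alpha * gradients

print("normal equations:", theta_ne.ravel())
print("gradient descent:", theta_gd.ravel())
```

With these settings the two estimates should agree closely, since the cost is convex and the step size is small enough for the iteration to converge on this data.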
--- a/week2/exercises/Lecture06_TrainingII_Exercises_no_solutions.ipynb
+++ b/week2/exercises/Lecture06_TrainingII_Exercises_no_solutions.ipynb
@@ -0,0 +1 @@
+{"cells": [{"cell_type": "markdown", "metadata": {"slideshow": {"slide_type": "slide"}}, "source": ["# Exercises for Lecture 6 (Training II)"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["import datetime\n", "now = datetime.datetime.now()\n", "print(\"Last executed: \" + now.strftime(\"%Y-%m-%d %H:%M:%S\"))"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["# Common imports\n", "import os\n", "import numpy as np\n", "np.random.seed(42) # To make this notebook's output stable across runs\n", "\n", "# To plot pretty figures\n", "%matplotlib inline\n", "import matplotlib\n", "import matplotlib.pyplot as plt\n", "plt.rcParams['axes.labelsize'] = 14\n", "plt.rcParams['xtick.labelsize'] = 12\n", "plt.rcParams['ytick.labelsize'] = 12"]}, {"cell_type": "markdown", "metadata": {"slideshow": {"slide_type": "subslide"}}, "source": ["## Set up training data"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["m = 100\n", "X = 2 * np.random.rand(m, 1)\n", "y = 4 + 3 * X + np.random.randn(m, 1)\n", "plt.figure(figsize=(9,6))\n", "plt.plot(X, y, \"b.\")\n", "plt.xlabel(\"$x_1$\", fontsize=18)\n", "plt.ylabel(\"$y$\", rotation=0, fontsize=18)\n", "plt.axis([0, 2, 0, 15]);"]}, {"cell_type": "markdown", "metadata": {"slideshow": {"slide_type": "subslide"}}, "source": ["## Exercise 1: Solve using Scikit-Learn (without learning schedule)\n", "\n", "Solve the above problem using Scikit-Learn, considering a learning rate of 0.1.  Display the intercept and slope of the fitted line."]}, {"cell_type": "markdown", "metadata": {"slideshow": {"slide_type": "subslide"}}, "source": ["## Exercise 2: Implement a mini-batch gradient descent algorithm to solve the previous problem.\n", "\n", "Hints:\n", "  - You may want to start with a stochastic GD implementation and adapt it.\n", "  - The numpy function [`np.random.permutation`](https://numpy.org/doc/stable/reference/random/generated/numpy.random.permutation.html) may be useful."]}], "metadata": {"celltoolbar": "Tags", "kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5"}}, "nbformat": 4, "nbformat_minor": 4}
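Note on Exercise 2 of the Lecture 6 notebook above: a mini-batch loop along the lines of the hints might look like the sketch below. It is one possible shape for the algorithm and not the course solution. The function name `minibatch_gd`, the batch size of 20, the 50 epochs, and the constant learning rate of 0.1 are all assumptions made here, and `rng.permutation` from a NumPy `Generator` stands in for the `np.random.permutation` function named in the hint.

```python
import numpy as np

# Same synthetic data model as the notebook, with a local generator (assumption).
rng = np.random.default_rng(42)
m = 100
X = 2 * rng.random((m, 1))
y = 4 + 3 * X + rng.standard_normal((m, 1))
X_b = np.c_[np.ones((m, 1)), X]  # column of ones for the intercept


def minibatch_gd(X_b, y, alpha=0.1, n_epochs=50, batch_size=20, rng=rng):
    """Mini-batch gradient descent for linear regression with an MSE cost."""
    n_samples, n_params = X_b.shape
    theta = rng.standard_normal((n_params, 1))  # random initialisation
    for _ in range(n_epochs):
        # Reshuffle once per epoch so every sample is used exactly once.
        shuffled = rng.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            batch = shuffled[start:start + batch_size]
            X_i, y_i = X_b[batch], y[batch]
            # MSE gradient estimated on the current mini-batch only.
            gradients = (2 / len(batch)) * X_i.T @ (X_i @ theta - y_i)
            theta = theta - alpha * gradients
    return theta


theta_mb = minibatch_gd(X_b, y)
print("intercept, slope:", theta_mb.ravel())
```

Because the learning rate is held constant, the estimate keeps fluctuating around the least-squares solution instead of settling exactly on it; a decaying learning schedule is the usual way to reduce that residual noise.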
--- a/week2/slides/Lecture04_PerformanceAnalysis.ipynb
+++ b/week2/slides/Lecture04_PerformanceAnalysis.ipynb
--- a/week2/slides/Lecture05_TrainingI.ipynb
+++ b/week2/slides/Lecture05_TrainingI.ipynb
--- a/week2/slides/Lecture06_TrainingII.ipynb
+++ b/week2/slides/Lecture06_TrainingII.ipynb