|
{
|
||
|
"cells": [
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "f756c5c5",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"# Lecture 19: Unsupervised learning"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "05e057af",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"\n",
|
||
|
"[Run in colab](https://colab.research.google.com/drive/1N2lOE8YOXNuekxMtflQCB3eM3I6Bx7-w)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 1,
|
||
|
"id": "bd0c5278",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"execution": {
|
||
|
"iopub.execute_input": "2025-03-07T05:32:18.896747Z",
|
||
|
"iopub.status.busy": "2025-03-07T05:32:18.896515Z",
|
||
|
"iopub.status.idle": "2025-03-07T05:32:18.902967Z",
|
||
|
"shell.execute_reply": "2025-03-07T05:32:18.902404Z"
|
||
|
},
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"name": "stdout",
|
||
|
"output_type": "stream",
|
||
|
"text": [
|
||
|
"Last executed: 2025-03-07 05:32:18\n"
|
||
|
]
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"import datetime\n",
|
||
|
"now = datetime.datetime.now()\n",
|
||
|
"print(\"Last executed: \" + now.strftime(\"%Y-%m-%d %H:%M:%S\"))"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "fa389c67-03a7-46cc-b849-7963cd86ebc9",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"## Unsupervised learning overview"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "d792ecf7-e0dd-4a17-a021-7cec1e0d5f83",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": "subslide"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"### Labelled data is scarce\n",
|
||
|
"\n",
|
||
|
"The majority of data is unlabelled: we have input features $\\mathbf{X}$, but no labels $\\mathbf{y}$."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "01ba1b5a-00a3-443c-aad2-3b01d0a90b3e",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"Imagine that you want to detect defective items on a manufacturing line. You can collect images of the items on the manufacturing line easily and automatically, but if you want to train a classifier to predict whether an item is defective or not, you will need to label every image.\n",
|
||
|
"\n",
|
||
|
"This labelling generally requires human experts to go through the images by hand, which is time-consuming, tedious, and error-prone. As a result, often only a small subset of the available images is labelled.\n",
|
||
|
"\n",
|
||
|
"With unsupervised learning, you do not need labelled data!"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "9b67f95a-003a-4d7e-9138-bff388491356",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": "subslide"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"### Types of unsupervised learning\n",
|
||
|
"\n",
|
||
|
"We will cover three types of unsupervised learning.\n",
|
||
|
"\n",
|
||
|
"1. *Clustering*: Group similar data instances into clusters.\n",
|
||
|
"2. *Density estimation*: Estimate the probability distribution of data (for use as a generative model to generate new data instances or to estimate probability densities).\n",
|
||
|
"3. *Anomaly detection*: Detect anomalous data."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "f1aedd59-5c76-4866-ae55-0745009652fe",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"## Clustering with K-means"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "d879db6b",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"### Clustering overview\n",
|
||
|
"\n",
|
||
|
"*Clustering* is the task of identifying similar instances and assigning them to *clusters*, or groups of similar instances.\n",
|
||
|
"\n",
|
||
|
"Consider the Iris dataset, in which each instance belongs to one of three classes (species):"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 2,
|
||
|
"id": "43b2f3d8",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"execution": {
|
||
|
"iopub.execute_input": "2025-03-07T05:32:18.905064Z",
|
||
|
"iopub.status.busy": "2025-03-07T05:32:18.904894Z",
|
||
|
"iopub.status.idle": "2025-03-07T05:32:19.276111Z",
|
||
|
"shell.execute_reply": "2025-03-07T05:32:19.275469Z"
|
||
|
},
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"import numpy as np\n",
|
||
|
"%matplotlib inline\n",
|
||
|
"import matplotlib\n",
|
||
|
"import matplotlib.pyplot as plt"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 3,
|
||
|
"id": "a4916a09",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"execution": {
|
||
|
"iopub.execute_input": "2025-03-07T05:32:19.278360Z",
|
||
|
"iopub.status.busy": "2025-03-07T05:32:19.278122Z",
|
||
|
"iopub.status.idle": "2025-03-07T05:32:20.063169Z",
|
||
|
"shell.execute_reply": "2025-03-07T05:32:20.062447Z"
|
||
|
},
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"array(['setosa', 'versicolor', 'virginica'], dtype='<U10')"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 3,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"from sklearn.datasets import load_iris\n",
|
||
|
"\n",
|
||
|
"data = load_iris()\n",
|
||
|
"X = data.data\n",
|
||
|
"y = data.target\n",
|
||
|
"data.target_names"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "b7e3dedd",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": "subslide"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"Let's plot the petal width against the petal length, both with and without different markers for the classes:"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 4,
|
||
|
"id": "010507ab",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"execution": {
|
||
|
"iopub.execute_input": "2025-03-07T05:32:20.065475Z",
|
||
|
"iopub.status.busy": "2025-03-07T05:32:20.065147Z",
|
||
|
"iopub.status.idle": "2025-03-07T05:32:20.220141Z",
|
||
|
"shell.execute_reply": "2025-03-07T05:32:20.219437Z"
|
||
|
},
|
||
|
"scrolled": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"<Figure size 900x350 with 2 Axes>"
|
||
|
]
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"output_type": "display_data"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"plt.figure(figsize=(9, 3.5))\n",
|
||
|
"\n",
|
||
|
"plt.subplot(121)\n",
|
||
|
"plt.plot(X[y==0, 2], X[y==0, 3], \"yo\", label=\"Iris setosa\")\n",
|
||
|
"plt.plot(X[y==1, 2], X[y==1, 3], \"bs\", label=\"Iris versicolor\")\n",
|
||
|
"plt.plot(X[y==2, 2], X[y==2, 3], \"g^\", label=\"Iris virginica\")\n",
|
||
|
"plt.xlabel(\"Petal length\", fontsize=14)\n",
|
||
|
"plt.ylabel(\"Petal width\", fontsize=14)\n",
|
||
|
"plt.legend(fontsize=12)\n",
|
||
|
"\n",
|
||
|
"plt.subplot(122)\n",
|
||
|
"plt.scatter(X[:, 2], X[:, 3], c=\"k\", marker=\".\")\n",
|
||
|
"plt.xlabel(\"Petal length\", fontsize=14)\n",
|
||
|
"plt.tick_params(labelleft=False)\n",
|
||
|
"\n",
|
||
|
"plt.show()"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "c34d799b",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"When the instances are plotted without different markers to differentiate the classes, as in the figure on the right, it is not obvious that the upper-right cluster is composed of two distinct sub-clusters."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "ade9d090",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": "subslide"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"There is no universal definition of what a cluster is. It depends on the context: different algorithms will capture different kinds of clusters.\n",
|
||
|
"\n",
|
||
|
"Some algorithms look for instances centered around a particular point, called a *centroid*. Others look for continuous regions of densely packed instances."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "dddd23bc-a156-4ae1-bd94-5da2129779d7",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": "subslide"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"### K-means clustering overview"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "3c73d2a5-ec56-4ea6-a496-10f66084984f",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"K-means partitions the data into a fixed number of disjoint subsets (clusters).\n",
|
||
|
"\n",
|
||
|
"First, let's consider how K-means represents clusters, then we'll consider the algorithm to find clusters.\n",
|
||
|
"\n",
|
||
|
"- Each cluster is defined by a centroid.\n",
|
||
|
"- Each data instance is allocated to exactly one cluster (equivalently, associated with one centroid)."
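,
"\n",
"\n",
"As a minimal sketch of this representation, assuming Euclidean distance and writing $\\boldsymbol{\\mu}_k$ for the centroid of cluster $k$ (notation introduced here for illustration only), an instance $\\mathbf{x}$ is allocated to the cluster whose centroid is nearest:\n",
"\n",
"$$c(\\mathbf{x}) = \\arg\\min_{k} \\lVert \\mathbf{x} - \\boldsymbol{\\mu}_k \\rVert_2$$"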
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "258dfb21",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": "subslide"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"Let's generate some blobs to use as a dataset, and run the K-means algorithm for 1, 2, and 3 iterations to see how the centroids move around:"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 5,
|
||
|
"id": "52682b37-a58a-4647-9b52-2ec50a4ed106",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"execution": {
|
||
|
"iopub.execute_input": "2025-03-07T05:32:20.222313Z",
|
||
|
"iopub.status.busy": "2025-03-07T05:32:20.222127Z",
|
||
|
"iopub.status.idle": "2025-03-07T05:32:20.308963Z",
|
||
|
"shell.execute_reply": "2025-03-07T05:32:20.308289Z"
|
||
|
},
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"from sklearn.cluster import KMeans\n",
|
||
|
"from sklearn.datasets import make_blobs\n",
|
||
|
"blob_centers = np.array([[ 0.2, 2.3], [-1.5 , 2.3], [-2.8, 1.8],\n",
|
||
|
" [-2.8, 2.8], [-2.8, 1.3]])\n",
|
||
|
"blob_std = np.array([0.4, 0.3, 0.1, 0.1, 0.1])\n",
|
||
|
"X, y = make_blobs(n_samples=2000, centers=blob_centers, cluster_std=blob_std,\n",
|
||
|
" random_state=7)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "0d70af36-80ec-460d-b2ef-6f47b656dde7",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"Now let's plot them:"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 6,
|
||
|
"id": "d6ea16db-a3a4-4c10-bc45-c4e312ce118f",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"execution": {
|
||
|
"iopub.execute_input": "2025-03-07T05:32:20.311254Z",
|
||
|
"iopub.status.busy": "2025-03-07T05:32:20.311003Z",
|
||
|
"iopub.status.idle": "2025-03-07T05:32:20.452710Z",
|
||
|
"shell.execute_reply": "2025-03-07T05:32:20.451956Z"
|
||
|
},
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"<Figure size 800x400 with 1 Axes>"
|
||
|
]
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"output_type": "display_data"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"def plot_clusters(X, y=None):\n",
|
||
|
" plt.scatter(X[:, 0], X[:, 1], c=y, s=1)\n",
|
||
|
" plt.xlabel(\"$x_1$\")\n",
|
||
|
" plt.ylabel(\"$x_2$\", rotation=0)\n",
|
||
|
"\n",
|
||
|
"plt.figure(figsize=(8, 4))\n",
|
||
|
"plot_clusters(X)\n",
|
||
|
"plt.gca().set_axisbelow(True)\n",
|
||
|
"plt.grid()"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "d8cb1420-daa7-4e6c-8439-780eb9ee4b20",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": "subslide"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"#### Cluster with Scikit-Learn K-means"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "4b3bb454-6d2a-4d6c-823f-a6481ace87a8",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"We must specify the number of clusters $k$ that the algorithm should find."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 7,
|
||
|
"id": "9b1a11fe-7599-469f-ba66-8aec5bd5e464",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"execution": {
|
||
|
"iopub.execute_input": "2025-03-07T05:32:20.455045Z",
|
||
|
"iopub.status.busy": "2025-03-07T05:32:20.454858Z",
|
||
|
"iopub.status.idle": "2025-03-07T05:32:20.516904Z",
|
||
|
"shell.execute_reply": "2025-03-07T05:32:20.516260Z"
|
||
|
},
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"k = 5\n",
|
||
|
"kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)\n",
|
||
|
"y_pred = kmeans.fit_predict(X)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "263eeb26-1ef1-408b-828e-027b12b5e65a",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"Each instance is assigned to one of the five clusters:"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 8,
|
||
|
"id": "473e5159-d2e4-460f-9369-8a4ddd660d64",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"execution": {
|
||
|
"iopub.execute_input": "2025-03-07T05:32:20.520073Z",
|
||
|
"iopub.status.busy": "2025-03-07T05:32:20.519823Z",
|
||
|
"iopub.status.idle": "2025-03-07T05:32:20.526424Z",
|
||
|
"shell.execute_reply": "2025-03-07T05:32:20.525803Z"
|
||
|
},
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"array([0, 0, 4, ..., 3, 1, 0], dtype=int32)"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 8,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"y_pred"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "f4ea4e83-42f4-438f-b05d-32d86e099b28",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": "subslide"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"A *centroid* (i.e., a cluster center) is estimated for each cluster:"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 9,
|
||
|
"id": "f166dd90-cb2a-4344-83d9-24428a788a16",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"execution": {
|
||
|
"iopub.execute_input": "2025-03-07T05:32:20.528872Z",
|
||
|
"iopub.status.busy": "2025-03-07T05:32:20.528668Z",
|
||
|
"iopub.status.idle": "2025-03-07T05:32:20.535050Z",
|
||
|
"shell.execute_reply": "2025-03-07T05:32:20.534458Z"
|
||
|
},
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"array([[-2.80214068, 1.55162671],\n",
|
||
|
" [ 0.08703534, 2.58438091],\n",
|
||
|
" [-1.46869323, 2.28214236],\n",
|
||
|
" [-2.79290307, 2.79641063],\n",
|
||
|
" [ 0.31332823, 1.96822352]])"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 9,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"kmeans.cluster_centers_"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 10,
|
||
|
"id": "403c4770-d2c2-489c-a399-ebf9d88b8f63",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"execution": {
|
||
|
"iopub.execute_input": "2025-03-07T05:32:20.538443Z",
|
||
|
"iopub.status.busy": "2025-03-07T05:32:20.537552Z",
|
||
|
"iopub.status.idle": "2025-03-07T05:32:20.543515Z",
|
||
|
"shell.execute_reply": "2025-03-07T05:32:20.542928Z"
|
||
|
},
|
||
|
"scrolled": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"def plot_centroids(centroids, weights=None, circle_color='w', cross_color='k'):\n",
|
||
|
" if weights is not None:\n",
|
||
|
" centroids = centroids[weights > weights.max() / 10]\n",
|
||
|
" plt.scatter(centroids[:, 0], centroids[:, 1],\n",
|
||
|
" marker='o', s=35, linewidths=8,\n",
|
||
|
" color=circle_color, zorder=10, alpha=0.9)\n",
|
||
|
" plt.scatter(centroids[:, 0], centroids[:, 1],\n",
|
||
|
" marker='x', s=2, linewidths=12,\n",
|
||
|
" color=cross_color, zorder=11, alpha=1)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 11,
|
||
|
"id": "cc0d3c9d-01ba-4bf4-a619-b6aaa58ce4c1",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"execution": {
|
||
|
"iopub.execute_input": "2025-03-07T05:32:20.546972Z",
|
||
|
"iopub.status.busy": "2025-03-07T05:32:20.546031Z",
|
||
|
"iopub.status.idle": "2025-03-07T05:32:20.698656Z",
|
||
|
"shell.execute_reply": "2025-03-07T05:32:20.697966Z"
|
||
|
},
|
||
|
"scrolled": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"<Figure size 800x400 with 1 Axes>"
|
||
|
]
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"output_type": "display_data"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"plt.figure(figsize=(8, 4))\n",
|
||
|
"plot_clusters(X)\n",
|
||
|
"plot_centroids(kmeans.cluster_centers_)\n",
|
||
|
"plt.gca().set_axisbelow(True)\n",
|
||
|
"plt.grid()"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "85d52600-bd88-42a3-ab87-534bd5aaf799",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": "subslide"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"#### Predictions\n",
|
||
|
"\n",
|
||
|
"We can also make predictions for new instances, assigning each one to the cluster with the nearest centroid:"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 12,
|
||
|
"id": "4aaa517b-6e61-4420-a67a-eaa6f5b5e4a3",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"execution": {
|
||
|
"iopub.execute_input": "2025-03-07T05:32:20.700640Z",
|
||
|
"iopub.status.busy": "2025-03-07T05:32:20.700446Z",
|
||
|
"iopub.status.idle": "2025-03-07T05:32:20.703441Z",
|
||
|
"shell.execute_reply": "2025-03-07T05:32:20.702851Z"
|
||
|
},
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"import numpy as np"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 13,
|
||
|
"id": "7dd1bbaa-2397-4bd2-b498-1015bbb12d51",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"execution": {
|
||
|
"iopub.execute_input": "2025-03-07T05:32:20.705280Z",
|
||
|
"iopub.status.busy": "2025-03-07T05:32:20.705104Z",
|
||
|
"iopub.status.idle": "2025-03-07T05:32:20.710360Z",
|
||
|
"shell.execute_reply": "2025-03-07T05:32:20.709768Z"
|
||
|
},
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"array([4, 4, 3, 3], dtype=int32)"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 13,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"X_new = np.array([[0, 2], [3, 2], [-3, 3], [-3, 2.5]])\n",
|
||
|
"kmeans.predict(X_new)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "475155e2-673a-4621-85de-c75110cba1eb",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": "subslide"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"#### Decision boundaries"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "e13f0010-97f7-4f01-9587-849b86d200ff",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"Let's plot the model's decision boundaries. This gives us a _Voronoi diagram_:"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 14,
|
||
|
"id": "9dd5caf8-5690-4205-bec6-273d580d4bdf",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"execution": {
|
||
|
"iopub.execute_input": "2025-03-07T05:32:20.712259Z",
|
||
|
"iopub.status.busy": "2025-03-07T05:32:20.712086Z",
|
||
|
"iopub.status.idle": "2025-03-07T05:32:20.996132Z",
|
||
|
"shell.execute_reply": "2025-03-07T05:32:20.995470Z"
|
||
|
},
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"<Figure size 800x400 with 1 Axes>"
|
||
|
]
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"output_type": "display_data"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"def plot_data(X):\n",
"    plt.plot(X[:, 0], X[:, 1], 'k.', markersize=2)\n",
"\n",
"def plot_decision_boundaries(clusterer, X, resolution=1000, show_centroids=True,\n",
"                             show_xlabels=True, show_ylabels=True):\n",
"    mins = X.min(axis=0) - 0.1\n",
"    maxs = X.max(axis=0) + 0.1\n",
"    xx, yy = np.meshgrid(np.linspace(mins[0], maxs[0], resolution),\n",
"                         np.linspace(mins[1], maxs[1], resolution))\n",
"    Z = clusterer.predict(np.c_[xx.ravel(), yy.ravel()])\n",
"    Z = Z.reshape(xx.shape)\n",
"\n",
"    plt.contourf(Z, extent=(mins[0], maxs[0], mins[1], maxs[1]),\n",
"                 cmap=\"Pastel2\")\n",
"    plt.contour(Z, extent=(mins[0], maxs[0], mins[1], maxs[1]),\n",
"                linewidths=1, colors='k')\n",
"    plot_data(X)\n",
"    if show_centroids:\n",
"        plot_centroids(clusterer.cluster_centers_)\n",
"\n",
"    if show_xlabels:\n",
"        plt.xlabel(\"$x_1$\")\n",
"    else:\n",
"        plt.tick_params(labelbottom=False)\n",
"    if show_ylabels:\n",
"        plt.ylabel(\"$x_2$\", rotation=0)\n",
"    else:\n",
"        plt.tick_params(labelleft=False)\n",
"\n",
"plt.figure(figsize=(8, 4))\n",
"plot_decision_boundaries(kmeans, X)\n",
"plt.show()"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "3c8361ca-5628-425d-a00b-1d5aee46f941",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"Typically, each data instance is assigned to the cluster with the closest centroid (*hard clustering*)."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "356ba5e6-5c23-44a9-9449-17997312fbf0",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": "subslide"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"#### Hard versus soft clustering"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "d65afee3-559f-4da1-8c9a-99460188b91a",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"*Hard clustering* corresponds to selecting the closest cluster for each instance.\n",
|
||
|
"\n",
|
||
|
"Alternatively, with *soft clustering* compute a score for each instance reflecting how strongly it is associated with each cluster. Here the score is the (Euclidean) distance between the data instance and each cluster centroid, so a smaller value indicates a closer association."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "df56c34f-97b0-4816-aa1f-1c87a694eceb",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"Can be computed by the `transform()` method:"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 15,
|
||
|
"id": "0925b7bf-5559-4566-a5e3-b82d6d166ce8",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"execution": {
|
||
|
"iopub.execute_input": "2025-03-07T05:32:20.998235Z",
|
||
|
"iopub.status.busy": "2025-03-07T05:32:20.998053Z",
|
||
|
"iopub.status.idle": "2025-03-07T05:32:21.002930Z",
|
||
|
"shell.execute_reply": "2025-03-07T05:32:21.002348Z"
|
||
|
},
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"array([[2.84, 0.59, 1.5 , 2.9 , 0.31],\n",
|
||
|
" [5.82, 2.97, 4.48, 5.85, 2.69],\n",
|
||
|
" [1.46, 3.11, 1.69, 0.29, 3.47],\n",
|
||
|
" [0.97, 3.09, 1.55, 0.36, 3.36]])"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 15,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"kmeans.transform(X_new).round(2)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "a98903e2-4bda-40a9-bf74-b316c08cf67b",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"Can verify that this is indeed the Euclidean distance between each instance and each centroid:"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 16,
|
||
|
"id": "ee7eef95-6760-4b40-9819-fe32350dd2a3",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"execution": {
|
||
|
"iopub.execute_input": "2025-03-07T05:32:21.004670Z",
|
||
|
"iopub.status.busy": "2025-03-07T05:32:21.004503Z",
|
||
|
"iopub.status.idle": "2025-03-07T05:32:21.008883Z",
|
||
|
"shell.execute_reply": "2025-03-07T05:32:21.008283Z"
|
||
|
},
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"array([[2.84, 0.59, 1.5 , 2.9 , 0.31],\n",
|
||
|
" [5.82, 2.97, 4.48, 5.85, 2.69],\n",
|
||
|
" [1.46, 3.11, 1.69, 0.29, 3.47],\n",
|
||
|
" [0.97, 3.09, 1.55, 0.36, 3.36]])"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 16,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"np.linalg.norm(np.tile(X_new, (1, k)).reshape(-1, k, 2)\n",
|
||
|
" - kmeans.cluster_centers_, axis=2).round(2)"
|
||
|
]
|
||
|
},
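{
"cell_type": "markdown",
"id": "sketch-broadcast-distances-md",
"metadata": {},
"source": [
"A minimal sketch, assuming toy data generated with `make_blobs` (the names `X_demo`, `km_demo` and `pts` are illustrative only): the same distances can also be computed with NumPy broadcasting rather than `np.tile`, and taking the `argmin` over clusters should recover the hard assignment returned by `predict()`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-broadcast-distances-code",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.cluster import KMeans\n",
"from sklearn.datasets import make_blobs\n",
"\n",
"# Illustrative toy data (an assumption, not the lecture's dataset)\n",
"X_demo, _ = make_blobs(n_samples=300, centers=3, random_state=0)\n",
"km_demo = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_demo)\n",
"\n",
"pts = np.array([[0.0, 2.0], [3.0, 2.0]])\n",
"\n",
"# Broadcasting: (n_points, 1, 2) - (k, 2) -> (n_points, k, 2)\n",
"dists = np.linalg.norm(pts[:, None, :] - km_demo.cluster_centers_, axis=2)\n",
"\n",
"# Soft clustering: transform() should return the same distances\n",
"print(np.allclose(dists, km_demo.transform(pts)))\n",
"\n",
"# Hard clustering: the closest centroid should match predict()\n",
"print(np.array_equal(dists.argmin(axis=1), km_demo.predict(pts)))"
]
},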
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "d041388d-b3f4-45eb-870d-0dcd48199e4c",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": "subslide"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"### K-means algorithm"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "aee9b2d6-cfda-4a47-bac9-2170d42dd111",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"If know centroids..."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "eeff7cbf-ff59-4e55-b5d1-231d88e2e6dc",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"can cluster data instances by finding closest centroid."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "7c6b3a87-8436-48a0-8bd6-920a3032c198",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"If know clusters... "
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "ba85317f-d84b-4601-95cb-bee4366a5816",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"can compute centroid of data instances in cluster."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "679f6ba5-1f5a-40b4-8843-4435a7fc507c",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"K-means finds centroids and clusters iteratively following this approach."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "d851e64a-7063-464b-8d4d-b54359f90cdf",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": "subslide"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"#### Algorithm\n",
|
||
|
"\n",
|
||
|
"Must set number of clusters to consider, $k$.\n",
|
||
|
"\n",
|
||
|
"Then run iterative algorithm to continually update estimates of centroids and cluster allocations.\n",
|
||
|
"\n",
|
||
|
"1. Randomly initialise centroids.\n",
|
||
|
"2. Label instances by closest centroid.\n",
|
||
|
"3. Calculate new centroid of each cluster by taking mean of instances in the cluster.\n",
|
||
|
"4. Repeat steps 2 and 3 until the centroids stop changing (or a maximum number of iterations is reached)."
|
||
|
]
|
||
|
},
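{
"cell_type": "markdown",
"id": "sketch-kmeans-from-scratch-md",
"metadata": {},
"source": [
"The steps above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not scikit-learn's implementation: `kmeans_sketch` and `X_toy` are hypothetical names, centroids are initialised by sampling `k` distinct instances, and an empty cluster simply keeps its previous centroid."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-kmeans-from-scratch-code",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def kmeans_sketch(X, k, n_iter=10, seed=0):\n",
"    # Illustrative K-means following steps 1 to 4 (not scikit-learn's code)\n",
"    rng = np.random.default_rng(seed)\n",
"    # 1. Randomly initialise centroids by picking k distinct instances\n",
"    centroids = X[rng.choice(len(X), size=k, replace=False)]\n",
"    for _ in range(n_iter):\n",
"        # 2. Label each instance by its closest centroid\n",
"        dists = np.linalg.norm(X[:, None, :] - centroids, axis=2)\n",
"        labels = dists.argmin(axis=1)\n",
"        # 3. Move each centroid to the mean of the instances in its cluster\n",
"        #    (assumption: an empty cluster keeps its previous centroid)\n",
"        new_centroids = np.array([\n",
"            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]\n",
"            for j in range(k)\n",
"        ])\n",
"        # 4. Repeat, stopping early once the centroids no longer move\n",
"        if np.allclose(new_centroids, centroids):\n",
"            break\n",
"        centroids = new_centroids\n",
"    # Final labels, consistent with the returned centroids\n",
"    labels = np.linalg.norm(X[:, None, :] - centroids, axis=2).argmin(axis=1)\n",
"    return centroids, labels\n",
"\n",
"# Usage on two well-separated toy blobs (illustrative data)\n",
"rng = np.random.default_rng(1)\n",
"X_toy = np.vstack([rng.normal(-3, 0.5, (50, 2)), rng.normal(3, 0.5, (50, 2))])\n",
"centroids, labels = kmeans_sketch(X_toy, k=2)\n",
"print(centroids.round(2))"
]
},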
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "b2ab11e4-6d47-4bab-a4cb-de4c0f2f5358",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": "subslide"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"#### Algorithm visually\n",
|
||
|
"\n",
|
||
|
"Let's run the K-means algorithm for 1, 2 and 3 iterations, to see how the centroids move."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 17,
|
||
|
"id": "f313bf6e-4fd2-425e-8592-a2050f07590e",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"execution": {
|
||
|
"iopub.execute_input": "2025-03-07T05:32:21.010924Z",
|
||
|
"iopub.status.busy": "2025-03-07T05:32:21.010760Z",
|
||
|
"iopub.status.idle": "2025-03-07T05:32:22.385407Z",
|
||
|
"shell.execute_reply": "2025-03-07T05:32:22.384850Z"
|
||
|
},
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"<Figure size 1000x800 with 6 Axes>"
|
||
|
]
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"output_type": "display_data"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"starting_random_state = 6\n",
|
||
|
"kmeans_iter1 = KMeans(n_clusters=5, init=\"random\", n_init=1, max_iter=1,\n",
|
||
|
" random_state=starting_random_state)\n",
|
||
|
"kmeans_iter2 = KMeans(n_clusters=5, init=\"random\", n_init=1, max_iter=2,\n",
|
||
|
" random_state=starting_random_state)\n",
|
||
|
"kmeans_iter3 = KMeans(n_clusters=5, init=\"random\", n_init=1, max_iter=3,\n",
|
||
|
" random_state=starting_random_state)\n",
|
||
|
"kmeans_iter1.fit(X)\n",
|
||
|
"kmeans_iter2.fit(X)\n",
|
||
|
"kmeans_iter3.fit(X)\n",
|
||
|
"\n",
|
||
|
"plt.figure(figsize=(10, 8))\n",
|
||
|
"\n",
|
||
|
"plt.subplot(321)\n",
|
||
|
"plot_data(X)\n",
|
||
|
"plot_centroids(kmeans_iter1.cluster_centers_, circle_color='r', cross_color='w')\n",
|
||
|
"plt.ylabel(\"$x_2$\", rotation=0)\n",
|
||
|
"plt.tick_params(labelbottom=False)\n",
|
||
|
"plt.title(\"Update the centroids (initially randomly)\")\n",
|
||
|
"\n",
|
||
|
"plt.subplot(322)\n",
|
||
|
"plot_decision_boundaries(kmeans_iter1, X, show_xlabels=False,\n",
|
||
|
" show_ylabels=False)\n",
|
||
|
"plt.title(\"Label the instances\")\n",
|
||
|
"\n",
|
||
|
"plt.subplot(323)\n",
|
||
|
"plot_decision_boundaries(kmeans_iter1, X, show_centroids=False,\n",
|
||
|
" show_xlabels=False)\n",
|
||
|
"plot_centroids(kmeans_iter2.cluster_centers_)\n",
|
||
|
"\n",
|
||
|
"plt.subplot(324)\n",
|
||
|
"plot_decision_boundaries(kmeans_iter2, X, show_xlabels=False,\n",
|
||
|
" show_ylabels=False)\n",
|
||
|
"\n",
|
||
|
"plt.subplot(325)\n",
|
||
|
"plot_decision_boundaries(kmeans_iter2, X, show_centroids=False)\n",
|
||
|
"plot_centroids(kmeans_iter3.cluster_centers_)\n",
|
||
|
"\n",
|
||
|
"plt.subplot(326)\n",
|
||
|
"plot_decision_boundaries(kmeans_iter3, X, show_ylabels=False)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "22b6d0c4-1524-4f57-917d-a2e5bb182684",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": "subslide"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"#### Convergence\n",
|
||
|
"\n",
|
||
|
"Convergence is guaranteed since the mean squared distance between instances and their closest centroid can never increase at any step, and there are only finitely many ways to assign instances to clusters.\n",
|
||
|
"\n",
|
||
|
"Usually only a relatively small number of iterations are required."
|
||
|
]
|
||
|
},
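{
"cell_type": "markdown",
"id": "sketch-inertia-per-iteration-md",
"metadata": {},
"source": [
"A hedged way to see this numerically: fit K-means from the same random initialisation while allowing one more iteration each time, and record `inertia_` (the sum of squared distances to the closest centroid, which differs from the mean only by a constant factor). The toy data `X_demo` is an assumption for illustration; the recorded values should never increase."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-inertia-per-iteration-code",
"metadata": {},
"outputs": [],
"source": [
"from sklearn.cluster import KMeans\n",
"from sklearn.datasets import make_blobs\n",
"\n",
"# Illustrative toy data (an assumption, not the lecture's dataset)\n",
"X_demo, _ = make_blobs(n_samples=500, centers=5, random_state=0)\n",
"\n",
"inertias = []\n",
"for n in range(1, 8):\n",
"    # Same random_state, so every run starts from the same initial centroids\n",
"    km = KMeans(n_clusters=5, init=\"random\", n_init=1, max_iter=n,\n",
"                random_state=0)\n",
"    inertias.append(km.fit(X_demo).inertia_)\n",
"\n",
"# Inertia after 1, 2, ..., 7 iterations\n",
"print([round(v, 1) for v in inertias])"
]
},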
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "f97d2836-fa6d-44b6-9381-a79ff0c71d1a",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": "subslide"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"#### Variability\n",
|
||
|
"\n",
|
||
|
"While the algorithm is guaranteed to converge, it might not converge to the desired solution.\n",
|
||
|
"\n",
|
||
|
"Final clusters are highly dependent on initial random centroid selection."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 18,
|
||
|
"id": "9f229355-f37d-4759-8880-93da14aade52",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"execution": {
|
||
|
"iopub.execute_input": "2025-03-07T05:32:22.387747Z",
|
||
|
"iopub.status.busy": "2025-03-07T05:32:22.387288Z",
|
||
|
"iopub.status.idle": "2025-03-07T05:32:23.117782Z",
|
||
|
"shell.execute_reply": "2025-03-07T05:32:23.117092Z"
|
||
|
},
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"<Figure size 1000x320 with 2 Axes>"
|
||
|
]
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"output_type": "display_data"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"def plot_clusterer_comparison(clusterer1, clusterer2, X, title1=None,\n",
"                              title2=None):\n",
"    clusterer1.fit(X)\n",
"    clusterer2.fit(X)\n",
"\n",
"    plt.figure(figsize=(10, 3.2))\n",
"\n",
"    plt.subplot(121)\n",
"    plot_decision_boundaries(clusterer1, X)\n",
"    if title1:\n",
"        plt.title(title1)\n",
"\n",
"    plt.subplot(122)\n",
"    plot_decision_boundaries(clusterer2, X, show_ylabels=False)\n",
"    if title2:\n",
"        plt.title(title2)\n",
"\n",
"kmeans_rnd_init1 = KMeans(n_clusters=5, init=\"random\", n_init=1, random_state=2)\n",
"kmeans_rnd_init2 = KMeans(n_clusters=5, init=\"random\", n_init=1, random_state=9)\n",
"\n",
"plot_clusterer_comparison(kmeans_rnd_init1, kmeans_rnd_init2, X,\n",
"                          \"Solution 1\",\n",
"                          \"Solution 2 (with a different random init)\")"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "161b0c65-e0ae-474a-aedb-0f3a4d4c2c96",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": "subslide"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"#### Manual initialisation\n",
|
||
|
"\n",
|
||
|
"If have a rough idea of good clustering, can set manual initialisation."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 19,
|
||
|
"id": "111f75d2-01d6-4216-9b4c-95f433895eeb",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"execution": {
|
||
|
"iopub.execute_input": "2025-03-07T05:32:23.119729Z",
|
||
|
"iopub.status.busy": "2025-03-07T05:32:23.119556Z",
|
||
|
"iopub.status.idle": "2025-03-07T05:32:23.128790Z",
|
||
|
"shell.execute_reply": "2025-03-07T05:32:23.128193Z"
|
||
|
},
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/html": [
|
||
|
"<style>#sk-container-id-1 {\n",
|
||
|
" /* Definition of color scheme common for light and dark mode */\n",
|
||
|
" --sklearn-color-text: #000;\n",
|
||
|
" --sklearn-color-text-muted: #666;\n",
|
||
|
" --sklearn-color-line: gray;\n",
|
||
|
" /* Definition of color scheme for unfitted estimators */\n",
|
||
|
" --sklearn-color-unfitted-level-0: #fff5e6;\n",
|
||
|
" --sklearn-color-unfitted-level-1: #f6e4d2;\n",
|
||
|
" --sklearn-color-unfitted-level-2: #ffe0b3;\n",
|
||
|
" --sklearn-color-unfitted-level-3: chocolate;\n",
|
||
|
" /* Definition of color scheme for fitted estimators */\n",
|
||
|
" --sklearn-color-fitted-level-0: #f0f8ff;\n",
|
||
|
" --sklearn-color-fitted-level-1: #d4ebff;\n",
|
||
|
" --sklearn-color-fitted-level-2: #b3dbfd;\n",
|
||
|
" --sklearn-color-fitted-level-3: cornflowerblue;\n",
|
||
|
"\n",
|
||
|
" /* Specific color for light theme */\n",
|
||
|
" --sklearn-color-text-on-default-background: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, black)));\n",
|
||
|
" --sklearn-color-background: var(--sg-background-color, var(--theme-background, var(--jp-layout-color0, white)));\n",
|
||
|
" --sklearn-color-border-box: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, black)));\n",
|
||
|
" --sklearn-color-icon: #696969;\n",
|
||
|
"\n",
|
||
|
" @media (prefers-color-scheme: dark) {\n",
|
||
|
" /* Redefinition of color scheme for dark theme */\n",
|
||
|
" --sklearn-color-text-on-default-background: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, white)));\n",
|
||
|
" --sklearn-color-background: var(--sg-background-color, var(--theme-background, var(--jp-layout-color0, #111)));\n",
|
||
|
" --sklearn-color-border-box: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, white)));\n",
|
||
|
" --sklearn-color-icon: #878787;\n",
|
||
|
" }\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-1 {\n",
|
||
|
" color: var(--sklearn-color-text);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-1 pre {\n",
|
||
|
" padding: 0;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-1 input.sk-hidden--visually {\n",
|
||
|
" border: 0;\n",
|
||
|
" clip: rect(1px 1px 1px 1px);\n",
|
||
|
" clip: rect(1px, 1px, 1px, 1px);\n",
|
||
|
" height: 1px;\n",
|
||
|
" margin: -1px;\n",
|
||
|
" overflow: hidden;\n",
|
||
|
" padding: 0;\n",
|
||
|
" position: absolute;\n",
|
||
|
" width: 1px;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-1 div.sk-dashed-wrapped {\n",
|
||
|
" border: 1px dashed var(--sklearn-color-line);\n",
|
||
|
" margin: 0 0.4em 0.5em 0.4em;\n",
|
||
|
" box-sizing: border-box;\n",
|
||
|
" padding-bottom: 0.4em;\n",
|
||
|
" background-color: var(--sklearn-color-background);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-1 div.sk-container {\n",
|
||
|
" /* jupyter's `normalize.less` sets `[hidden] { display: none; }`\n",
|
||
|
" but bootstrap.min.css set `[hidden] { display: none !important; }`\n",
|
||
|
" so we also need the `!important` here to be able to override the\n",
|
||
|
" default hidden behavior on the sphinx rendered scikit-learn.org.\n",
|
||
|
" See: https://github.com/scikit-learn/scikit-learn/issues/21755 */\n",
|
||
|
" display: inline-block !important;\n",
|
||
|
" position: relative;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-1 div.sk-text-repr-fallback {\n",
|
||
|
" display: none;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"div.sk-parallel-item,\n",
|
||
|
"div.sk-serial,\n",
|
||
|
"div.sk-item {\n",
|
||
|
" /* draw centered vertical line to link estimators */\n",
|
||
|
" background-image: linear-gradient(var(--sklearn-color-text-on-default-background), var(--sklearn-color-text-on-default-background));\n",
|
||
|
" background-size: 2px 100%;\n",
|
||
|
" background-repeat: no-repeat;\n",
|
||
|
" background-position: center center;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* Parallel-specific style estimator block */\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-1 div.sk-parallel-item::after {\n",
|
||
|
" content: \"\";\n",
|
||
|
" width: 100%;\n",
|
||
|
" border-bottom: 2px solid var(--sklearn-color-text-on-default-background);\n",
|
||
|
" flex-grow: 1;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-1 div.sk-parallel {\n",
|
||
|
" display: flex;\n",
|
||
|
" align-items: stretch;\n",
|
||
|
" justify-content: center;\n",
|
||
|
" background-color: var(--sklearn-color-background);\n",
|
||
|
" position: relative;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-1 div.sk-parallel-item {\n",
|
||
|
" display: flex;\n",
|
||
|
" flex-direction: column;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-1 div.sk-parallel-item:first-child::after {\n",
|
||
|
" align-self: flex-end;\n",
|
||
|
" width: 50%;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-1 div.sk-parallel-item:last-child::after {\n",
|
||
|
" align-self: flex-start;\n",
|
||
|
" width: 50%;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-1 div.sk-parallel-item:only-child::after {\n",
|
||
|
" width: 0;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* Serial-specific style estimator block */\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-1 div.sk-serial {\n",
|
||
|
" display: flex;\n",
|
||
|
" flex-direction: column;\n",
|
||
|
" align-items: center;\n",
|
||
|
" background-color: var(--sklearn-color-background);\n",
|
||
|
" padding-right: 1em;\n",
|
||
|
" padding-left: 1em;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"\n",
|
||
|
"/* Toggleable style: style used for estimator/Pipeline/ColumnTransformer box that is\n",
|
||
|
"clickable and can be expanded/collapsed.\n",
|
||
|
"- Pipeline and ColumnTransformer use this feature and define the default style\n",
|
||
|
"- Estimators will overwrite some part of the style using the `sk-estimator` class\n",
|
||
|
"*/\n",
|
||
|
"\n",
|
||
|
"/* Pipeline and ColumnTransformer style (default) */\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-1 div.sk-toggleable {\n",
|
||
|
" /* Default theme specific background. It is overwritten whether we have a\n",
|
||
|
" specific estimator or a Pipeline/ColumnTransformer */\n",
|
||
|
" background-color: var(--sklearn-color-background);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* Toggleable label */\n",
|
||
|
"#sk-container-id-1 label.sk-toggleable__label {\n",
|
||
|
" cursor: pointer;\n",
|
||
|
" display: flex;\n",
|
||
|
" width: 100%;\n",
|
||
|
" margin-bottom: 0;\n",
|
||
|
" padding: 0.5em;\n",
|
||
|
" box-sizing: border-box;\n",
|
||
|
" text-align: center;\n",
|
||
|
" align-items: start;\n",
|
||
|
" justify-content: space-between;\n",
|
||
|
" gap: 0.5em;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-1 label.sk-toggleable__label .caption {\n",
|
||
|
" font-size: 0.6rem;\n",
|
||
|
" font-weight: lighter;\n",
|
||
|
" color: var(--sklearn-color-text-muted);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-1 label.sk-toggleable__label-arrow:before {\n",
|
||
|
" /* Arrow on the left of the label */\n",
|
||
|
" content: \"▸\";\n",
|
||
|
" float: left;\n",
|
||
|
" margin-right: 0.25em;\n",
|
||
|
" color: var(--sklearn-color-icon);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-1 label.sk-toggleable__label-arrow:hover:before {\n",
|
||
|
" color: var(--sklearn-color-text);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* Toggleable content - dropdown */\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-1 div.sk-toggleable__content {\n",
|
||
|
" max-height: 0;\n",
|
||
|
" max-width: 0;\n",
|
||
|
" overflow: hidden;\n",
|
||
|
" text-align: left;\n",
|
||
|
" /* unfitted */\n",
|
||
|
" background-color: var(--sklearn-color-unfitted-level-0);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-1 div.sk-toggleable__content.fitted {\n",
|
||
|
" /* fitted */\n",
|
||
|
" background-color: var(--sklearn-color-fitted-level-0);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-1 div.sk-toggleable__content pre {\n",
|
||
|
" margin: 0.2em;\n",
|
||
|
" border-radius: 0.25em;\n",
|
||
|
" color: var(--sklearn-color-text);\n",
|
||
|
" /* unfitted */\n",
|
||
|
" background-color: var(--sklearn-color-unfitted-level-0);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-1 div.sk-toggleable__content.fitted pre {\n",
|
||
|
" /* unfitted */\n",
|
||
|
" background-color: var(--sklearn-color-fitted-level-0);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-1 input.sk-toggleable__control:checked~div.sk-toggleable__content {\n",
|
||
|
" /* Expand drop-down */\n",
|
||
|
" max-height: 200px;\n",
|
||
|
" max-width: 100%;\n",
|
||
|
" overflow: auto;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-1 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {\n",
|
||
|
" content: \"▾\";\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* Pipeline/ColumnTransformer-specific style */\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-1 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
|
||
|
" color: var(--sklearn-color-text);\n",
|
||
|
" background-color: var(--sklearn-color-unfitted-level-2);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-1 div.sk-label.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
|
||
|
" background-color: var(--sklearn-color-fitted-level-2);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* Estimator-specific style */\n",
|
||
|
"\n",
|
||
|
"/* Colorize estimator box */\n",
|
||
|
"#sk-container-id-1 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
|
||
|
" /* unfitted */\n",
|
||
|
" background-color: var(--sklearn-color-unfitted-level-2);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-1 div.sk-estimator.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
|
||
|
" /* fitted */\n",
|
||
|
" background-color: var(--sklearn-color-fitted-level-2);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-1 div.sk-label label.sk-toggleable__label,\n",
|
||
|
"#sk-container-id-1 div.sk-label label {\n",
|
||
|
" /* The background is the default theme color */\n",
|
||
|
" color: var(--sklearn-color-text-on-default-background);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* On hover, darken the color of the background */\n",
|
||
|
"#sk-container-id-1 div.sk-label:hover label.sk-toggleable__label {\n",
|
||
|
" color: var(--sklearn-color-text);\n",
|
||
|
" background-color: var(--sklearn-color-unfitted-level-2);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* Label box, darken color on hover, fitted */\n",
|
||
|
"#sk-container-id-1 div.sk-label.fitted:hover label.sk-toggleable__label.fitted {\n",
|
||
|
" color: var(--sklearn-color-text);\n",
|
||
|
" background-color: var(--sklearn-color-fitted-level-2);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* Estimator label */\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-1 div.sk-label label {\n",
|
||
|
" font-family: monospace;\n",
|
||
|
" font-weight: bold;\n",
|
||
|
" display: inline-block;\n",
|
||
|
" line-height: 1.2em;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-1 div.sk-label-container {\n",
|
||
|
" text-align: center;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* Estimator-specific */\n",
|
||
|
"#sk-container-id-1 div.sk-estimator {\n",
|
||
|
" font-family: monospace;\n",
|
||
|
" border: 1px dotted var(--sklearn-color-border-box);\n",
|
||
|
" border-radius: 0.25em;\n",
|
||
|
" box-sizing: border-box;\n",
|
||
|
" margin-bottom: 0.5em;\n",
|
||
|
" /* unfitted */\n",
|
||
|
" background-color: var(--sklearn-color-unfitted-level-0);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-1 div.sk-estimator.fitted {\n",
|
||
|
" /* fitted */\n",
|
||
|
" background-color: var(--sklearn-color-fitted-level-0);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* on hover */\n",
|
||
|
"#sk-container-id-1 div.sk-estimator:hover {\n",
|
||
|
" /* unfitted */\n",
|
||
|
" background-color: var(--sklearn-color-unfitted-level-2);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-1 div.sk-estimator.fitted:hover {\n",
|
||
|
" /* fitted */\n",
|
||
|
" background-color: var(--sklearn-color-fitted-level-2);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* Specification for estimator info (e.g. \"i\" and \"?\") */\n",
|
||
|
"\n",
|
||
|
"/* Common style for \"i\" and \"?\" */\n",
|
||
|
"\n",
|
||
|
".sk-estimator-doc-link,\n",
|
||
|
"a:link.sk-estimator-doc-link,\n",
|
||
|
"a:visited.sk-estimator-doc-link {\n",
|
||
|
" float: right;\n",
|
||
|
" font-size: smaller;\n",
|
||
|
" line-height: 1em;\n",
|
||
|
" font-family: monospace;\n",
|
||
|
" background-color: var(--sklearn-color-background);\n",
|
||
|
" border-radius: 1em;\n",
|
||
|
" height: 1em;\n",
|
||
|
" width: 1em;\n",
|
||
|
" text-decoration: none !important;\n",
|
||
|
" margin-left: 0.5em;\n",
|
||
|
" text-align: center;\n",
|
||
|
" /* unfitted */\n",
|
||
|
" border: var(--sklearn-color-unfitted-level-1) 1pt solid;\n",
|
||
|
" color: var(--sklearn-color-unfitted-level-1);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
".sk-estimator-doc-link.fitted,\n",
|
||
|
"a:link.sk-estimator-doc-link.fitted,\n",
|
||
|
"a:visited.sk-estimator-doc-link.fitted {\n",
|
||
|
" /* fitted */\n",
|
||
|
" border: var(--sklearn-color-fitted-level-1) 1pt solid;\n",
|
||
|
" color: var(--sklearn-color-fitted-level-1);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* On hover */\n",
|
||
|
"div.sk-estimator:hover .sk-estimator-doc-link:hover,\n",
|
||
|
".sk-estimator-doc-link:hover,\n",
|
||
|
"div.sk-label-container:hover .sk-estimator-doc-link:hover,\n",
|
||
|
".sk-estimator-doc-link:hover {\n",
|
||
|
" /* unfitted */\n",
|
||
|
" background-color: var(--sklearn-color-unfitted-level-3);\n",
|
||
|
" color: var(--sklearn-color-background);\n",
|
||
|
" text-decoration: none;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"div.sk-estimator.fitted:hover .sk-estimator-doc-link.fitted:hover,\n",
|
||
|
".sk-estimator-doc-link.fitted:hover,\n",
|
||
|
"div.sk-label-container:hover .sk-estimator-doc-link.fitted:hover,\n",
|
||
|
".sk-estimator-doc-link.fitted:hover {\n",
|
||
|
" /* fitted */\n",
|
||
|
" background-color: var(--sklearn-color-fitted-level-3);\n",
|
||
|
" color: var(--sklearn-color-background);\n",
|
||
|
" text-decoration: none;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* Span, style for the box shown on hovering the info icon */\n",
|
||
|
".sk-estimator-doc-link span {\n",
|
||
|
" display: none;\n",
|
||
|
" z-index: 9999;\n",
|
||
|
" position: relative;\n",
|
||
|
" font-weight: normal;\n",
|
||
|
" right: .2ex;\n",
|
||
|
" padding: .5ex;\n",
|
||
|
" margin: .5ex;\n",
|
||
|
" width: min-content;\n",
|
||
|
" min-width: 20ex;\n",
|
||
|
" max-width: 50ex;\n",
|
||
|
" color: var(--sklearn-color-text);\n",
|
||
|
" box-shadow: 2pt 2pt 4pt #999;\n",
|
||
|
" /* unfitted */\n",
|
||
|
" background: var(--sklearn-color-unfitted-level-0);\n",
|
||
|
" border: .5pt solid var(--sklearn-color-unfitted-level-3);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
".sk-estimator-doc-link.fitted span {\n",
|
||
|
" /* fitted */\n",
|
||
|
" background: var(--sklearn-color-fitted-level-0);\n",
|
||
|
" border: var(--sklearn-color-fitted-level-3);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
".sk-estimator-doc-link:hover span {\n",
|
||
|
" display: block;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* \"?\"-specific style due to the `<a>` HTML tag */\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-1 a.estimator_doc_link {\n",
|
||
|
" float: right;\n",
|
||
|
" font-size: 1rem;\n",
|
||
|
" line-height: 1em;\n",
|
||
|
" font-family: monospace;\n",
|
||
|
" background-color: var(--sklearn-color-background);\n",
|
||
|
" border-radius: 1rem;\n",
|
||
|
" height: 1rem;\n",
|
||
|
" width: 1rem;\n",
|
||
|
" text-decoration: none;\n",
|
||
|
" /* unfitted */\n",
|
||
|
" color: var(--sklearn-color-unfitted-level-1);\n",
|
||
|
" border: var(--sklearn-color-unfitted-level-1) 1pt solid;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-1 a.estimator_doc_link.fitted {\n",
|
||
|
" /* fitted */\n",
|
||
|
" border: var(--sklearn-color-fitted-level-1) 1pt solid;\n",
|
||
|
" color: var(--sklearn-color-fitted-level-1);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* On hover */\n",
|
||
|
"#sk-container-id-1 a.estimator_doc_link:hover {\n",
|
||
|
" /* unfitted */\n",
|
||
|
" background-color: var(--sklearn-color-unfitted-level-3);\n",
|
||
|
" color: var(--sklearn-color-background);\n",
|
||
|
" text-decoration: none;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-1 a.estimator_doc_link.fitted:hover {\n",
|
||
|
" /* fitted */\n",
|
||
|
" background-color: var(--sklearn-color-fitted-level-3);\n",
|
||
|
"}\n",
|
||
|
"</style><div id=\"sk-container-id-1\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>KMeans(init=array([[-3, 3],\n",
|
||
|
" [-3, 2],\n",
|
||
|
" [-3, 1],\n",
|
||
|
" [-1, 2],\n",
|
||
|
" [ 0, 2]]),\n",
|
||
|
" n_clusters=5, n_init=1, random_state=42)</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-1\" type=\"checkbox\" checked><label for=\"sk-estimator-id-1\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow\"><div><div>KMeans</div></div><div><a class=\"sk-estimator-doc-link fitted\" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.6/modules/generated/sklearn.cluster.KMeans.html\">?<span>Documentation for KMeans</span></a><span class=\"sk-estimator-doc-link fitted\">i<span>Fitted</span></span></div></label><div class=\"sk-toggleable__content fitted\"><pre>KMeans(init=array([[-3, 3],\n",
|
||
|
" [-3, 2],\n",
|
||
|
" [-3, 1],\n",
|
||
|
" [-1, 2],\n",
|
||
|
" [ 0, 2]]),\n",
|
||
|
" n_clusters=5, n_init=1, random_state=42)</pre></div> </div></div></div></div>"
|
||
|
],
|
||
|
"text/plain": [
|
||
|
"KMeans(init=array([[-3, 3],\n",
|
||
|
" [-3, 2],\n",
|
||
|
" [-3, 1],\n",
|
||
|
" [-1, 2],\n",
|
||
|
" [ 0, 2]]),\n",
|
||
|
" n_clusters=5, n_init=1, random_state=42)"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 19,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"good_init = np.array([[-3, 3], [-3, 2], [-3, 1], [-1, 2], [0, 2]])\n",
|
||
|
"kmeans = KMeans(n_clusters=5, init=good_init, n_init=1, random_state=42)\n",
|
||
|
"kmeans.fit(X)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 20,
|
||
|
"id": "1102abdb-5e5e-4200-b147-4b989002ef71",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"execution": {
|
||
|
"iopub.execute_input": "2025-03-07T05:32:23.130470Z",
|
||
|
"iopub.status.busy": "2025-03-07T05:32:23.130290Z",
|
||
|
"iopub.status.idle": "2025-03-07T05:32:23.415955Z",
|
||
|
"shell.execute_reply": "2025-03-07T05:32:23.415286Z"
|
||
|
},
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAq0AAAFzCAYAAAAUgDBpAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjAsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvlHJYcgAAAAlwSFlzAAAPYQAAD2EBqD+naQAAkCZJREFUeJztnXl4E9X3xt9Jui/pAoUilEX5Kii2giK0IoJQq4KKWFBUsLj8WFoVkK0tgihtKaC4tIALWkBApSAqilAERGSRRUAQ3NgqUCh0CeneJr8/4gxJmmWSTGYmyfk8Tx/SzMy9ZybVvn3vOecyOp1OB4IgCIIgCIKQMQqpAyAIgiAIgiAIW5BoJQiCIAiCIGQPiVaCIAiCIAhC9pBoJQiCIAiCIGQPiVaCIAiCIAhC9pBoJQiCIAiCIGQPiVaCIAiCIAhC9vhIHYCr0Gq1OH/+PEJDQ8EwjNThEARBEARBECbodDpcvXoV1113HRQK616qx4rW8+fPIyYmRuowCIIgCIIgCBsUFxejXbt2Vs/xWNEaGhoKAPhk73cICgmWOBqCcA2P33IPAKBj+5Y4vOttaYMhvJY//jqHO/tPg5+fHzZv3ix1OARBuBFVVVUYNGgQp9us4bGilU0JCAoJRlBoiMTREITwXCo+z73eu20eVKogCaMhvJkjx84AAIYNG4aQEPr/LUEQ9sMnlZMKsQjCTZk27AUAgFKpQKuocGmDIQgA//vf/6QOgSAID4ZEK0G4IVXqq7h8vgQA8Fr6cImjIQiCIAjXQ6KVINyQbwo+516nv/KYhJEQBEEQhDiQaCUIN2T1gsUAgBs6tYZSqZQ4GoIgCIJwPSRaCcLN+HXnHmj/e/3LtnmSxkIQBEEQYkGilSDcjPzp2QCAAH9fREbabhFCEARBEJ4AiVaCcCPqa+tw8ew5AMC4F+6XOBqCIAiCEA8SrQThRnyS8w73+q3s0RJGQhAEQRDiQqKVINyIb5etAQDceXtniSMhCIIgCHEh0UoQbsIvW3ZAp9WXYH343niJoyEIgtBTWFiIwYMHo7CwUOpQCA+HRCtBuAmLZ8wFAAQG+CG2W0dpgyEIgviPgoIClJSUoKCgQOpQCA+HRCtBuAmXz18EAMyfM0riSAiCIK6RkpKC6OhopKSkSB0K4eH4SB0AQRC2eeP5Sdzr1P97UMJICIIgjElOTkZycrLUYRBeADmtBOEG7CvaAQDom3CzxJEQBEG4P5SH656QaCUImXPy9z+h0+kAAIvfHiNxNARBEO4P5eG6JyRaCULmZAz/PwCAv58Pbu4SI3E0BEEQ7g/l4bonJFo9lI0rCvFs/GBsXEFLH+5MXU0tqtRXAQCfL3tF4mgIgiA8g+TkZGzYsIFycd0MEq0eyppFBSg9dwFrFhVIHQrhBHnTs7jXDyT2kDASgiAI61CeKOFqSLR6KMPGpyCqbRsMG58idSiEg+h0Omz/8jsA+h2w/Px8JY6IIMyz/aejAIB9+/ZJHAkhJZQnSrgaannloTwwMhkPjKRlD3fm7J//cK83rn1VwkgIwjobiw4CAHbu3ClxJISUpKSkoKCggPJECZdBTitByJSpjz0HAPD380VkZKjE0RCEZdjUlT59+kgciWfiLsvulCdKuBoSrQQhQ8ouXUa1WgMA+DBvnMTREIR1+t3dDQDQs2dPiSPxTGjZnSD0kGglCBny6fxF3OunH79HwkgIgpAaas9ESIXcXH4SrQQhQ4o+/woA0COuExiGkTgagiCkhJbdCamQm8tPopUgZMbP323lXu/dNs/o2JKlm9DxljFYsnST2GERBEEQXobcXH4SrQQhMxZlZgMAVKGB8PFRGh2b+9Y6nCkuxdy31kkRGkHYRG7LiQRBOI7cXH4SrV4I7ZYlX9RXyqG+Ug4AeHXqsGbHp08aig4xUZg+aajYoREEL+S2nEgQhOdAotUNEFpk0m5Z8mXJjFwAAANg8stDmh0f+1wSTh97H2OfSx
I3MILgidyWEwmC8BxItLoBQotM2i1Lvvz0bREA4MEk2rKVcE/ktpxIEITnQKJVZphzVYUWmQ+MTMbHuzfQjlky4/uV1/JUl7xNvVkJgiAIwhASrTLDnKsqpsikfFfpKJj7HgAgPCwY7dq2kDgaguDPmeJSqUMgCMILEEW0Ll68GLGxsVCpVFCpVIiPj8fGjRutXrNmzRp06dIFAQEBuPXWW/Hdd9+JEarkSL10T/mu0lFVqQYAvLfgOYkjIQj7WPDO1wCAzp07SxwJQRCejCiitV27dpg7dy4OHDiA/fv3495778UjjzyCY8eOmT1/165dGDFiBJ577jn8+uuvGDJkCIYMGYKjR4+KEa7oGLqbUi/dSy2avZWpjz3LvX768X7SBUIQDlBXX4927dqhS5cuUodCEIQHI4pofeihh/Dggw/if//7H2688UZkZWUhJCQEe/bsMXv+O++8g/vvvx9TpkxB165d8cYbb6BHjx7Iy8sTI1zRkZO7KbVo9kZ0Oh2O7zsMAHh86F0SR+P+0AYM0uDj4yN1CISTUI9dQu6IntPa1NSEzz77DFVVVYiPjzd7zu7duzFw4ECj95KSkrB7926L49bV1UGtVht9uQvkbno3+7bs5F7Pe2OkhJF4BrQBAyFX5C4KqccuIXdEE62//fYbQkJC4O/vj7Fjx+LLL7/EzTffbPbckpIStG7d2ui91q1bo6SkxOL4OTk5CAsL475iYmIEjZ8vjhQyWXI3bY1FRVOewYJxUwEAoSGBaB/TSuJo3B/agIGQK3IXhYY9duUusAnvRDTRetNNN+HQoUPYu3cvxo0bh2eeeQa///67YOOnp6ejsrKS+youLhZsbHsQcql/+fx8lJ67gOXz8wE0F6l853JW3JI4dh0atRo19fUAgC1fz5I4Gs+ANmAg5IrcN14w7LErd4HNBxLenodootXPzw+dO3fG7bffjpycHMTFxeGdd94xe250dDQuXrxo9N7FixcRHR1tcXx/f3+uOwH7JQXOLPXbEoesSF0yMxcbVxRyc3W9PZbXdY4KaTnl3HoaWc+/wr2+owdVXhOEJyPFxguOCje5C2w+eILwJoyRrE+rVqtFXV2d2WPx8fH44YcfjN4rKiqymAMrJ/gWMpkTqKw4XD4/H8/GD0aPvvGIatsGo6akAtALYoVSAW2TFmsWFXBzHT9whBOVljYnCAlXoaaqyqFUA8q5dQ1NjY04uucgAOChB3pCoaC2yZag4irCGbzZcXNUuHnCzmaeILwJY0T5LZmeno4dO3bg9OnT+O2335Ceno7t27fjqaeeAgCMGjUK6enp3Pkvv/wyvv/+e7z55ps4ceIEXnvtNezfvx9paWlihCsK5txLVhwCQOm5Czh+4AgngDeuKMSaRQXoMyjRSEBuXFGImqoqhISrMGx8isXNCQKDg6GpUFt0S625qdRRwDX8vu8Q93rV0pelC8QNoOIqwhk80XHjK8S9Wbh5gvAmjBFFtF66dAmjRo3CTTfdhAEDBmDfvn3YtGkTEhMTAQBnz57FhQsXuPMTEhKwatUqfPDBB4iLi0NhYSHWr1+Pbt26iRGuKJhzL1lxOGpKarNjrKg0FLKAPu9VU6Hmrrfkina9PRYKpQJdb4/lHQ/hWl4b9SIAICQkACEhQS6bxxNcSiquIpzBE4UbXyFOwo3wJBidTqeTOghXoFarERYWhs+P/Yig0BCpw3Ea1mkdNj7FyPEcEdsfmgo1/AMDoIqM4ETnmkUF6Hp7LI4fOGLkwIaEqxAYHNxsHEJcLpwuxv/1HQIA2LkpC3fFd212zpKlmzD3rXWYPmmoU0VFHW8ZgzPFpegQE4XTx953eByCsIQy/DG0b9/BK5ffpaKwsBAFBQVISUkhQUq4NRqNBv369UNlZaXNeiRKonNzWFfW19+PW95nBerOb4u4wq2ut8ciJFyFqsqrVFQlAz6YuQAAwDAMEnqb30VIqCVxcikJwvMgB5XwRki0SoS9PVjN5Zwauq
+GKQVsKsD1t3ThCreOHzgCQL/7EsMwlAYgMfu36zcUGHDPrWAYxuw5QolNagFFEPLHm4vFCIIvJFolwlYbKdPj5nJ
|
||
|
"text/plain": [
|
||
|
"<Figure size 800x400 with 1 Axes>"
|
||
|
]
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"output_type": "display_data"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"plt.figure(figsize=(8, 4))\n",
|
||
|
"plot_decision_boundaries(kmeans, X)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "8b289ad4-aacb-4ee7-a5ee-b6f77a5a60ae",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": "subslide"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"#### Performance measure\n",
|
||
|
"\n",
|
||
|
"In order to assess whether a set of clusters is good, need to have some measure of performance.\n",
|
||
|
"\n",
|
||
|
"Goodness can be estimated by *sum of squared distances between instances and their clustered centroid* (low good).\n",
|
||
|
"\n",
|
||
|
"Score given by negative of this qualtity (so high good)."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 21,
|
||
|
"id": "b18c5ff7-0a7c-44f9-be65-ccbed864eca7",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"execution": {
|
||
|
"iopub.execute_input": "2025-03-07T05:32:23.417941Z",
|
||
|
"iopub.status.busy": "2025-03-07T05:32:23.417765Z",
|
||
|
"iopub.status.idle": "2025-03-07T05:32:23.422662Z",
|
||
|
"shell.execute_reply": "2025-03-07T05:32:23.422070Z"
|
||
|
},
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"-211.59853725816828"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 21,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"kmeans.score(X)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "fbdd8b7f-86bd-4a66-bf74-7986609f996c",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": "subslide"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"#### Multiple initialisations\n",
|
||
|
"\n",
|
||
|
"Now have measure of performance, can run multiple times with different centroid initialisations and then select clustering with highest score.\n",
|
||
|
"\n",
|
||
|
"Controlled by `n_init` variable in Scikit-Learn."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 22,
|
||
|
"id": "59c62d7b-1a33-43a9-a348-72190b6894fc",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"execution": {
|
||
|
"iopub.execute_input": "2025-03-07T05:32:23.424453Z",
|
||
|
"iopub.status.busy": "2025-03-07T05:32:23.424262Z",
|
||
|
"iopub.status.idle": "2025-03-07T05:32:23.437608Z",
|
||
|
"shell.execute_reply": "2025-03-07T05:32:23.437018Z"
|
||
|
},
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/html": [
|
||
|
"<style>#sk-container-id-2 {\n",
|
||
|
" /* Definition of color scheme common for light and dark mode */\n",
|
||
|
" --sklearn-color-text: #000;\n",
|
||
|
" --sklearn-color-text-muted: #666;\n",
|
||
|
" --sklearn-color-line: gray;\n",
|
||
|
" /* Definition of color scheme for unfitted estimators */\n",
|
||
|
" --sklearn-color-unfitted-level-0: #fff5e6;\n",
|
||
|
" --sklearn-color-unfitted-level-1: #f6e4d2;\n",
|
||
|
" --sklearn-color-unfitted-level-2: #ffe0b3;\n",
|
||
|
" --sklearn-color-unfitted-level-3: chocolate;\n",
|
||
|
" /* Definition of color scheme for fitted estimators */\n",
|
||
|
" --sklearn-color-fitted-level-0: #f0f8ff;\n",
|
||
|
" --sklearn-color-fitted-level-1: #d4ebff;\n",
|
||
|
" --sklearn-color-fitted-level-2: #b3dbfd;\n",
|
||
|
" --sklearn-color-fitted-level-3: cornflowerblue;\n",
|
||
|
"\n",
|
||
|
" /* Specific color for light theme */\n",
|
||
|
" --sklearn-color-text-on-default-background: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, black)));\n",
|
||
|
" --sklearn-color-background: var(--sg-background-color, var(--theme-background, var(--jp-layout-color0, white)));\n",
|
||
|
" --sklearn-color-border-box: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, black)));\n",
|
||
|
" --sklearn-color-icon: #696969;\n",
|
||
|
"\n",
|
||
|
" @media (prefers-color-scheme: dark) {\n",
|
||
|
" /* Redefinition of color scheme for dark theme */\n",
|
||
|
" --sklearn-color-text-on-default-background: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, white)));\n",
|
||
|
" --sklearn-color-background: var(--sg-background-color, var(--theme-background, var(--jp-layout-color0, #111)));\n",
|
||
|
" --sklearn-color-border-box: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, white)));\n",
|
||
|
" --sklearn-color-icon: #878787;\n",
|
||
|
" }\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-2 {\n",
|
||
|
" color: var(--sklearn-color-text);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-2 pre {\n",
|
||
|
" padding: 0;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-2 input.sk-hidden--visually {\n",
|
||
|
" border: 0;\n",
|
||
|
" clip: rect(1px 1px 1px 1px);\n",
|
||
|
" clip: rect(1px, 1px, 1px, 1px);\n",
|
||
|
" height: 1px;\n",
|
||
|
" margin: -1px;\n",
|
||
|
" overflow: hidden;\n",
|
||
|
" padding: 0;\n",
|
||
|
" position: absolute;\n",
|
||
|
" width: 1px;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-2 div.sk-dashed-wrapped {\n",
|
||
|
" border: 1px dashed var(--sklearn-color-line);\n",
|
||
|
" margin: 0 0.4em 0.5em 0.4em;\n",
|
||
|
" box-sizing: border-box;\n",
|
||
|
" padding-bottom: 0.4em;\n",
|
||
|
" background-color: var(--sklearn-color-background);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-2 div.sk-container {\n",
|
||
|
" /* jupyter's `normalize.less` sets `[hidden] { display: none; }`\n",
|
||
|
" but bootstrap.min.css set `[hidden] { display: none !important; }`\n",
|
||
|
" so we also need the `!important` here to be able to override the\n",
|
||
|
" default hidden behavior on the sphinx rendered scikit-learn.org.\n",
|
||
|
" See: https://github.com/scikit-learn/scikit-learn/issues/21755 */\n",
|
||
|
" display: inline-block !important;\n",
|
||
|
" position: relative;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-2 div.sk-text-repr-fallback {\n",
|
||
|
" display: none;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"div.sk-parallel-item,\n",
|
||
|
"div.sk-serial,\n",
|
||
|
"div.sk-item {\n",
|
||
|
" /* draw centered vertical line to link estimators */\n",
|
||
|
" background-image: linear-gradient(var(--sklearn-color-text-on-default-background), var(--sklearn-color-text-on-default-background));\n",
|
||
|
" background-size: 2px 100%;\n",
|
||
|
" background-repeat: no-repeat;\n",
|
||
|
" background-position: center center;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* Parallel-specific style estimator block */\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-2 div.sk-parallel-item::after {\n",
|
||
|
" content: \"\";\n",
|
||
|
" width: 100%;\n",
|
||
|
" border-bottom: 2px solid var(--sklearn-color-text-on-default-background);\n",
|
||
|
" flex-grow: 1;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-2 div.sk-parallel {\n",
|
||
|
" display: flex;\n",
|
||
|
" align-items: stretch;\n",
|
||
|
" justify-content: center;\n",
|
||
|
" background-color: var(--sklearn-color-background);\n",
|
||
|
" position: relative;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-2 div.sk-parallel-item {\n",
|
||
|
" display: flex;\n",
|
||
|
" flex-direction: column;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-2 div.sk-parallel-item:first-child::after {\n",
|
||
|
" align-self: flex-end;\n",
|
||
|
" width: 50%;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-2 div.sk-parallel-item:last-child::after {\n",
|
||
|
" align-self: flex-start;\n",
|
||
|
" width: 50%;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-2 div.sk-parallel-item:only-child::after {\n",
|
||
|
" width: 0;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* Serial-specific style estimator block */\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-2 div.sk-serial {\n",
|
||
|
" display: flex;\n",
|
||
|
" flex-direction: column;\n",
|
||
|
" align-items: center;\n",
|
||
|
" background-color: var(--sklearn-color-background);\n",
|
||
|
" padding-right: 1em;\n",
|
||
|
" padding-left: 1em;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"\n",
|
||
|
"/* Toggleable style: style used for estimator/Pipeline/ColumnTransformer box that is\n",
|
||
|
"clickable and can be expanded/collapsed.\n",
|
||
|
"- Pipeline and ColumnTransformer use this feature and define the default style\n",
|
||
|
"- Estimators will overwrite some part of the style using the `sk-estimator` class\n",
|
||
|
"*/\n",
|
||
|
"\n",
|
||
|
"/* Pipeline and ColumnTransformer style (default) */\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-2 div.sk-toggleable {\n",
|
||
|
" /* Default theme specific background. It is overwritten whether we have a\n",
|
||
|
" specific estimator or a Pipeline/ColumnTransformer */\n",
|
||
|
" background-color: var(--sklearn-color-background);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* Toggleable label */\n",
|
||
|
"#sk-container-id-2 label.sk-toggleable__label {\n",
|
||
|
" cursor: pointer;\n",
|
||
|
" display: flex;\n",
|
||
|
" width: 100%;\n",
|
||
|
" margin-bottom: 0;\n",
|
||
|
" padding: 0.5em;\n",
|
||
|
" box-sizing: border-box;\n",
|
||
|
" text-align: center;\n",
|
||
|
" align-items: start;\n",
|
||
|
" justify-content: space-between;\n",
|
||
|
" gap: 0.5em;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-2 label.sk-toggleable__label .caption {\n",
|
||
|
" font-size: 0.6rem;\n",
|
||
|
" font-weight: lighter;\n",
|
||
|
" color: var(--sklearn-color-text-muted);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-2 label.sk-toggleable__label-arrow:before {\n",
|
||
|
" /* Arrow on the left of the label */\n",
|
||
|
" content: \"▸\";\n",
|
||
|
" float: left;\n",
|
||
|
" margin-right: 0.25em;\n",
|
||
|
" color: var(--sklearn-color-icon);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-2 label.sk-toggleable__label-arrow:hover:before {\n",
|
||
|
" color: var(--sklearn-color-text);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* Toggleable content - dropdown */\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-2 div.sk-toggleable__content {\n",
|
||
|
" max-height: 0;\n",
|
||
|
" max-width: 0;\n",
|
||
|
" overflow: hidden;\n",
|
||
|
" text-align: left;\n",
|
||
|
" /* unfitted */\n",
|
||
|
" background-color: var(--sklearn-color-unfitted-level-0);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-2 div.sk-toggleable__content.fitted {\n",
|
||
|
" /* fitted */\n",
|
||
|
" background-color: var(--sklearn-color-fitted-level-0);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-2 div.sk-toggleable__content pre {\n",
|
||
|
" margin: 0.2em;\n",
|
||
|
" border-radius: 0.25em;\n",
|
||
|
" color: var(--sklearn-color-text);\n",
|
||
|
" /* unfitted */\n",
|
||
|
" background-color: var(--sklearn-color-unfitted-level-0);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-2 div.sk-toggleable__content.fitted pre {\n",
|
||
|
" /* unfitted */\n",
|
||
|
" background-color: var(--sklearn-color-fitted-level-0);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-2 input.sk-toggleable__control:checked~div.sk-toggleable__content {\n",
|
||
|
" /* Expand drop-down */\n",
|
||
|
" max-height: 200px;\n",
|
||
|
" max-width: 100%;\n",
|
||
|
" overflow: auto;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-2 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {\n",
|
||
|
" content: \"▾\";\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* Pipeline/ColumnTransformer-specific style */\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-2 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
|
||
|
" color: var(--sklearn-color-text);\n",
|
||
|
" background-color: var(--sklearn-color-unfitted-level-2);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-2 div.sk-label.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
|
||
|
" background-color: var(--sklearn-color-fitted-level-2);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* Estimator-specific style */\n",
|
||
|
"\n",
|
||
|
"/* Colorize estimator box */\n",
|
||
|
"#sk-container-id-2 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
|
||
|
" /* unfitted */\n",
|
||
|
" background-color: var(--sklearn-color-unfitted-level-2);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-2 div.sk-estimator.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
|
||
|
" /* fitted */\n",
|
||
|
" background-color: var(--sklearn-color-fitted-level-2);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-2 div.sk-label label.sk-toggleable__label,\n",
|
||
|
"#sk-container-id-2 div.sk-label label {\n",
|
||
|
" /* The background is the default theme color */\n",
|
||
|
" color: var(--sklearn-color-text-on-default-background);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* On hover, darken the color of the background */\n",
|
||
|
"#sk-container-id-2 div.sk-label:hover label.sk-toggleable__label {\n",
|
||
|
" color: var(--sklearn-color-text);\n",
|
||
|
" background-color: var(--sklearn-color-unfitted-level-2);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* Label box, darken color on hover, fitted */\n",
|
||
|
"#sk-container-id-2 div.sk-label.fitted:hover label.sk-toggleable__label.fitted {\n",
|
||
|
" color: var(--sklearn-color-text);\n",
|
||
|
" background-color: var(--sklearn-color-fitted-level-2);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* Estimator label */\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-2 div.sk-label label {\n",
|
||
|
" font-family: monospace;\n",
|
||
|
" font-weight: bold;\n",
|
||
|
" display: inline-block;\n",
|
||
|
" line-height: 1.2em;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-2 div.sk-label-container {\n",
|
||
|
" text-align: center;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* Estimator-specific */\n",
|
||
|
"#sk-container-id-2 div.sk-estimator {\n",
|
||
|
" font-family: monospace;\n",
|
||
|
" border: 1px dotted var(--sklearn-color-border-box);\n",
|
||
|
" border-radius: 0.25em;\n",
|
||
|
" box-sizing: border-box;\n",
|
||
|
" margin-bottom: 0.5em;\n",
|
||
|
" /* unfitted */\n",
|
||
|
" background-color: var(--sklearn-color-unfitted-level-0);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-2 div.sk-estimator.fitted {\n",
|
||
|
" /* fitted */\n",
|
||
|
" background-color: var(--sklearn-color-fitted-level-0);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* on hover */\n",
|
||
|
"#sk-container-id-2 div.sk-estimator:hover {\n",
|
||
|
" /* unfitted */\n",
|
||
|
" background-color: var(--sklearn-color-unfitted-level-2);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-2 div.sk-estimator.fitted:hover {\n",
|
||
|
" /* fitted */\n",
|
||
|
" background-color: var(--sklearn-color-fitted-level-2);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* Specification for estimator info (e.g. \"i\" and \"?\") */\n",
|
||
|
"\n",
|
||
|
"/* Common style for \"i\" and \"?\" */\n",
|
||
|
"\n",
|
||
|
".sk-estimator-doc-link,\n",
|
||
|
"a:link.sk-estimator-doc-link,\n",
|
||
|
"a:visited.sk-estimator-doc-link {\n",
|
||
|
" float: right;\n",
|
||
|
" font-size: smaller;\n",
|
||
|
" line-height: 1em;\n",
|
||
|
" font-family: monospace;\n",
|
||
|
" background-color: var(--sklearn-color-background);\n",
|
||
|
" border-radius: 1em;\n",
|
||
|
" height: 1em;\n",
|
||
|
" width: 1em;\n",
|
||
|
" text-decoration: none !important;\n",
|
||
|
" margin-left: 0.5em;\n",
|
||
|
" text-align: center;\n",
|
||
|
" /* unfitted */\n",
|
||
|
" border: var(--sklearn-color-unfitted-level-1) 1pt solid;\n",
|
||
|
" color: var(--sklearn-color-unfitted-level-1);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
".sk-estimator-doc-link.fitted,\n",
|
||
|
"a:link.sk-estimator-doc-link.fitted,\n",
|
||
|
"a:visited.sk-estimator-doc-link.fitted {\n",
|
||
|
" /* fitted */\n",
|
||
|
" border: var(--sklearn-color-fitted-level-1) 1pt solid;\n",
|
||
|
" color: var(--sklearn-color-fitted-level-1);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* On hover */\n",
|
||
|
"div.sk-estimator:hover .sk-estimator-doc-link:hover,\n",
|
||
|
".sk-estimator-doc-link:hover,\n",
|
||
|
"div.sk-label-container:hover .sk-estimator-doc-link:hover,\n",
|
||
|
".sk-estimator-doc-link:hover {\n",
|
||
|
" /* unfitted */\n",
|
||
|
" background-color: var(--sklearn-color-unfitted-level-3);\n",
|
||
|
" color: var(--sklearn-color-background);\n",
|
||
|
" text-decoration: none;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"div.sk-estimator.fitted:hover .sk-estimator-doc-link.fitted:hover,\n",
|
||
|
".sk-estimator-doc-link.fitted:hover,\n",
|
||
|
"div.sk-label-container:hover .sk-estimator-doc-link.fitted:hover,\n",
|
||
|
".sk-estimator-doc-link.fitted:hover {\n",
|
||
|
" /* fitted */\n",
|
||
|
" background-color: var(--sklearn-color-fitted-level-3);\n",
|
||
|
" color: var(--sklearn-color-background);\n",
|
||
|
" text-decoration: none;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* Span, style for the box shown on hovering the info icon */\n",
|
||
|
".sk-estimator-doc-link span {\n",
|
||
|
" display: none;\n",
|
||
|
" z-index: 9999;\n",
|
||
|
" position: relative;\n",
|
||
|
" font-weight: normal;\n",
|
||
|
" right: .2ex;\n",
|
||
|
" padding: .5ex;\n",
|
||
|
" margin: .5ex;\n",
|
||
|
" width: min-content;\n",
|
||
|
" min-width: 20ex;\n",
|
||
|
" max-width: 50ex;\n",
|
||
|
" color: var(--sklearn-color-text);\n",
|
||
|
" box-shadow: 2pt 2pt 4pt #999;\n",
|
||
|
" /* unfitted */\n",
|
||
|
" background: var(--sklearn-color-unfitted-level-0);\n",
|
||
|
" border: .5pt solid var(--sklearn-color-unfitted-level-3);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
".sk-estimator-doc-link.fitted span {\n",
|
||
|
" /* fitted */\n",
|
||
|
" background: var(--sklearn-color-fitted-level-0);\n",
|
||
|
" border: var(--sklearn-color-fitted-level-3);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
".sk-estimator-doc-link:hover span {\n",
|
||
|
" display: block;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* \"?\"-specific style due to the `<a>` HTML tag */\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-2 a.estimator_doc_link {\n",
|
||
|
" float: right;\n",
|
||
|
" font-size: 1rem;\n",
|
||
|
" line-height: 1em;\n",
|
||
|
" font-family: monospace;\n",
|
||
|
" background-color: var(--sklearn-color-background);\n",
|
||
|
" border-radius: 1rem;\n",
|
||
|
" height: 1rem;\n",
|
||
|
" width: 1rem;\n",
|
||
|
" text-decoration: none;\n",
|
||
|
" /* unfitted */\n",
|
||
|
" color: var(--sklearn-color-unfitted-level-1);\n",
|
||
|
" border: var(--sklearn-color-unfitted-level-1) 1pt solid;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-2 a.estimator_doc_link.fitted {\n",
|
||
|
" /* fitted */\n",
|
||
|
" border: var(--sklearn-color-fitted-level-1) 1pt solid;\n",
|
||
|
" color: var(--sklearn-color-fitted-level-1);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* On hover */\n",
|
||
|
"#sk-container-id-2 a.estimator_doc_link:hover {\n",
|
||
|
" /* unfitted */\n",
|
||
|
" background-color: var(--sklearn-color-unfitted-level-3);\n",
|
||
|
" color: var(--sklearn-color-background);\n",
|
||
|
" text-decoration: none;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-2 a.estimator_doc_link.fitted:hover {\n",
|
||
|
" /* fitted */\n",
|
||
|
" background-color: var(--sklearn-color-fitted-level-3);\n",
|
||
|
"}\n",
|
||
|
"</style><div id=\"sk-container-id-2\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>KMeans(init='random', n_clusters=5, n_init=10, random_state=2)</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-2\" type=\"checkbox\" checked><label for=\"sk-estimator-id-2\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow\"><div><div>KMeans</div></div><div><a class=\"sk-estimator-doc-link fitted\" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.6/modules/generated/sklearn.cluster.KMeans.html\">?<span>Documentation for KMeans</span></a><span class=\"sk-estimator-doc-link fitted\">i<span>Fitted</span></span></div></label><div class=\"sk-toggleable__content fitted\"><pre>KMeans(init='random', n_clusters=5, n_init=10, random_state=2)</pre></div> </div></div></div></div>"
|
||
|
],
|
||
|
"text/plain": [
|
||
|
"KMeans(init='random', n_clusters=5, n_init=10, random_state=2)"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 22,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"kmeans_rnd_10_inits = KMeans(n_clusters=5, init=\"random\", n_init=10,\n",
|
||
|
" random_state=2)\n",
|
||
|
"kmeans_rnd_10_inits.fit(X)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "4524c4c7-8f7c-49d9-943d-412a24ab77b1",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"Recover good clusters."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 23,
|
||
|
"id": "eda66172-ee1c-439c-b602-c5a9de875e28",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"execution": {
|
||
|
"iopub.execute_input": "2025-03-07T05:32:23.439340Z",
|
||
|
"iopub.status.busy": "2025-03-07T05:32:23.439172Z",
|
||
|
"iopub.status.idle": "2025-03-07T05:32:23.711937Z",
|
||
|
"shell.execute_reply": "2025-03-07T05:32:23.711261Z"
|
||
|
},
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAq0AAAFzCAYAAAAUgDBpAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjAsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvlHJYcgAAAAlwSFlzAAAPYQAAD2EBqD+naQAAj99JREFUeJztnXl4E+X2x7+TdE3TdIGyl02vgksLcpFVBKGigoJSVK6KFVSgrVcBsdAiKFJKBUSlLC5gBQGRsrjwU0AFFzZlRwS9ylahhUKXkHRP8vujzJCkWSbJZGaSnM/z9DHNzLzvmUnUb7/vOedlTCaTCQRBEARBEAQhYxRSB0AQBEEQBEEQziDRShAEQRAEQcgeEq0EQRAEQRCE7CHRShAEQRAEQcgeEq0EQRAEQRCE7CHRShAEQRAEQcgeEq0EQRAEQRCE7AmSOgBvYTQaceHCBURGRoJhGKnDIQiCIAiCIKwwmUy4evUqWrVqBYXCsZfqt6L1woULiI+PlzoMgiAIgiAIwgmFhYVo06aNw3P8VrRGRkYCAApPvA9NpEriaAjCO7S48RlUVdchNDwMK/dvlTocIkDZt+0HvDVxBgDgnXfeQWJiosQREQThK+j1egwZMoTTbY7wW9HKpgRoIlXQaEi0Ev6HVqtHVXUdAGD22mVQRaoljogIVH7e8i2Ahv/u9unTR+JoCILwRfikclIhFkH4KMNH5XKvb+pyq4SREEQDISEhUodAEIQfQ6KVIHyQ+noDdvz0GwDgzkH9nCavEwRBEISvQ/+nIwgf5Oe9v3OvJy+aLWEkBEEQBCEOJFoJwgd5YEQ2ACBUFQ5VRITE0RAEQRCE9yHRShA+xt+nLqCqqhYAMGvVYomjIQiCIAhxINFKED7Gixkfca87/ztBwkgIgiAIQjxItBKEj7Fl6wEAQELv7rTbG0EQBBEwkGglCB/io0++415nf7pMwkgIgiAIQlxItBKED/HfV5YDAKKaxkocCUEQBEGIC4lWgvAR/vq7CDpdNQDguRmTJY6GIAiigYKCAgwdOhQFBQVSh0L4OSRaCcJHGPfStXQABrh7+H3SBkMQBHGN/Px8FBcXIz8/X+pQCD+HRCtB+Ajf/3AMADAw+SGJIyEIgrhOSkoKWrRogZSUFKlDIfycIKkDIAjCOa/N+ZR7PW7WFAkjIQiCsCQ5ORnJyclSh0EEAOS0EoQPMP/dzwEAsc3jEB6hkjgagiAI34bycH0TEq0EIXN0uiroK2sAAKlzMiWOhiAIwvehPFzfhEQrQcicnvdkcK97JPWTMBKCIAj/gPJwfRMSrX7KsuVb0f7WcVi2fKvUoRAeYDQacfzkPwCAYc/9R+JoCIIg/IPk5GR89dVXlIvrY5Bo9VPmvrURZwtLMPetjVKHQnjA4ve/5l6PTB0jYSQEQRCOoTxRwtuQaPVTpk56BO3i4zB10iNSh0J4wOSsfABAVNMYRDWJkTYYgrDDpX8uAAAMBoPEkRBSQnmihLch0eqnjB87GGeOv4fxYwdLHQrhJleuaFFX1yACcjeukDgagrDPhTOFAID6+nqJIyGkhPJECW9DopUgZMrdD7zKvW7dvq2EkRCEY1q1jwcABAVR629v4CvL7pQnSngbEq0EIUOqq2tx/ESDe/XA6JESR0MQjmnWphUAQKlUShyJf0LL7gTRAIlWgpAhn23cxb0e++okCSMhCEJqaNmdkAq5ufwkWglChoxNXwIAiIyNRkhoiMTREAQhJbTsTkiF3Fx+Eq0EITOOHT+D+vqGAqyFX66yOPb1qgKM6TUUX6+Sx1+9BEEQhP8iN5efRCtByIxnr7msjIJB8/hWFsfWL8lHyfkirF+SL0FkBOEcuS0nEgThPnJz+Um0BiC0W5Z8MRgM+OXAXwCAvg8mNTo+MjUFca1bYmRqisiREQQ/5LacSBCE/0Ci1QcQWmTSblny5e3FX3GvX1mU0+j4/U8lY8Wer3D/U/L4q5cgrJHbci
JBEP4DiVYfQGiRSbtlyZesWasBAC3atpE4EoJwD7ktJxIE4T+QaJUZtlxVoUUm7ZYlT/b8chI1tQ07Cr0wb7rE0RAEQRCEvCDRKjNsuapiikzKd5WO8S++BwBQBCmQ0Ku7xNEQBH8unDkrdQgEQQQAoojWpUuXIiEhARqNBhqNBr169cLXX3/t8Jr169ejU6dOCAsLw+23347/+7//EyNUyZF66Z7yXaXj6PGG//EPf+4piSMhCNf456+G72779u2lDYQgCL9GFNHapk0bzJ07FwcOHMD+/ftxzz33YNiwYTh+/LjN83fv3o1Ro0Zh7NixOHToEIYPH47hw4fjt99+EyNc0TF3N6VeupdaNAcq415cxr0e/UqahJEQhPusXr1a6hAIgvBjGJPJZJJi4tjYWMybNw9jx45tdOyxxx6DXq/HV19dr6Tu2bMnunTpgmXLljU63xZarRZRUVGo+OcTaDQqweL2Bu1vHYezhSVoFx+HM8ffkzocQgJCmzyK2rp6tPlXByz9jvpbesLXqwqwfkk+RqamUJcFkXiwbTcAwP79+yWOhPCEgoIC5OfnIyUlhQrpCNHQ6XTo378/KioqoNFoHJ4rek6rwWDAp59+Cr1ej169etk8Z8+ePRg0aJDFe4MHD8aePXvsjltTUwOtVmvx4yuQuxnYnD13CbV1DQVYqdnTJI7G96ENGAi5IveNF6jHLiF3RBOtx44dg1qtRmhoKMaPH49NmzbhlltusXlucXExmjdvbvFe8+bNUVxcbHf8nJwcREVFcT/x8fGCxs8XdwqZ7KUEOBuLiqb8g7vua+gUwCgVuL1nN4mj8X1oAwZCrshdFJr32JW7wCYCE9FE680334zDhw9j3759mDBhAp5++mn8/vvvgo0/bdo0VFRUcD+FhYWCje0KQhYyZc1ajbOFJVzvTmuRyncuT8UtiWPvUVdXj8J/LgMAnp0+UeJo/APagIGQK3LfeMG8x67cBTYfSHj7H6KJ1pCQENx4443o1q0bcnJykJiYiHfeecfmuS1atMDFixct3rt48SJatGhhd/zQ0FCuOwH7IwWeLPU7E4esSE1/+QMsW76Vm6t3j068rnNXSFNHAe8xdcYq7nXSqOHSBUIQhNeRYuMFd4Wb3AU2H/xBeBOWSNan1Wg0oqamxuaxXr164bvvvrN4b/v27XZzYOUE3+p/WwKVFYdZs1aj/a3jMHhgV7SLj0P2jCcANAhipVIBg8GIuW9t5Obave8kJyrtbU4QG6PGVV2VW6kGlHPrPd5ZtgUA0KpDPMJV8i4YlJKvVxVgTK+h+HoVOSaE6wSy4+aucPOHnc38QXgTlogiWqdNm4Yff/wRZ86cwbFjxzBt2jTs3LkTTzzRIMZGjx6NadOuF6C8+OKL+Oabb7BgwQKcPHkSr732Gvbv34/09HQxwhUFW+4lKw4B4GxhCXbvO8kJ4GXLt2LuWxvx6MN9LATksuVbcVVXhdgYNaZOesTu5gSR6nCUlunsuqWO3FSp23D5KxcvlcFgMAIAstfw64oRqFBxFeEJ/ui48RXigSzc/EF4E5aIIlovXbqE0aNH4+abb8bAgQPx66+/YuvWrUhKSgIAnDt3DkVFRdz5vXv3xpo1a/D+++8jMTERBQUF2Lx5M2677TYxwhUFW+4lKw6zZzzR6BgrKs2FLNCQ91papuOut+eK9u7RCUqlAr17dOIdD+Fdut01BQCgYBg0bW0/9cVT/MGlpOIqwhP8UbjxFeIk3Ah/QrI+rd7Gl/q08oF1WqdOesTC8WzSbjRKy3RQqUIR10TDic65b21E7x6dsHvfSQsHNjZGjUh1eKNxCHHRavWIatOw89XY6S9h+PONd8ESqt/omF5DUXK+CHGtW2LFnq+cX0AQLkJ9WsWHeqoS/oKs+7QSwsK6smGhwdzyPitQP9u0iyvc6t2jE2Jj1Cgv11NRlQx4d9n1bYmHjhll8xyhlsTJpSQI/4McVCIQIdEqEa72YLWVc2ruvp
qnFLCpAF0TOnKFW7v3nQQAGE0mKBiG0gAkZuacTwEAcW1aIigoyOY5QolNagFFEPInkIvFCIIvJFolwlkbKevjtnJ
|
||
|
"text/plain": [
|
||
|
"<Figure size 800x400 with 1 Axes>"
|
||
|
]
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"output_type": "display_data"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"plt.figure(figsize=(8, 4))\n",
|
||
|
"plot_decision_boundaries(kmeans_rnd_10_inits, X)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "c842d05a-f644-4eaa-ad76-020ca86928cf",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": "subslide"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"#### Extensions\n",
|
||
|
"\n",
|
||
|
"- Acceleration\n",
|
||
|
"- Mini-batch\n",
|
||
|
"- Selecting number of clusters"
|
||
|
]
|
||
|
},
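{
"cell_type": "markdown",
"id": "kmeans-extensions-sketch",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"source": [
"The last two extensions can be tried directly in Scikit-Learn. The following is a minimal sketch rather than part of the lecture code: it assumes a small synthetic dataset `X_demo` (a hypothetical name, chosen so that `X` is not overwritten), uses `MiniBatchKMeans` for the mini-batch variant, and uses the silhouette score as one common criterion (an assumption here, not the only option) for selecting the number of clusters.\n",
"\n",
"```python\n",
"from sklearn.cluster import KMeans, MiniBatchKMeans\n",
"from sklearn.datasets import make_blobs\n",
"from sklearn.metrics import silhouette_score\n",
"\n",
"# Hypothetical demo data: four well-separated blobs in 2-D\n",
"X_demo, _ = make_blobs(n_samples=600, centers=4, cluster_std=0.6, random_state=0)\n",
"\n",
"# Mini-batch K-means: each update uses a random batch instead of the full dataset\n",
"mbk = MiniBatchKMeans(n_clusters=4, batch_size=256, n_init=3, random_state=0)\n",
"mbk.fit(X_demo)\n",
"\n",
"# Selecting k: fit K-means for several k and compare mean silhouette scores\n",
"scores = {}\n",
"for k in range(2, 8):\n",
"    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_demo)\n",
"    scores[k] = silhouette_score(X_demo, labels)\n",
"\n",
"best_k = max(scores, key=scores.get)  # k with the highest silhouette score\n",
"print(best_k, scores)\n",
"```\n",
"\n",
"A higher silhouette score suggests tighter, better separated clusters, but it is only a heuristic and should be checked against a plot of the data."
]
},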
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "7a63242f-1e3c-44c8-b2a9-6894639579b0",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": "subslide"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"#### Limitations\n",
|
||
|
"\n",
|
||
|
"- Need to run several times with different initialisations\n",
|
||
|
"- Need to specify number of clusters\n",
|
||
|
"- Does not behave well when clusters have varying sizes, different densities, or non-spherical shapes."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "5f0a0e6d-0d88-488e-b524-0d489ca7cbce",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"## Density estimation with Gaussian mixture models"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "a75ccc62-75ea-4731-a770-d435582217be",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": "subslide"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"Aim of density estimation is to estimate a *probability distribution* from which data were drawn. \n",
|
||
|
"\n",
|
||
|
"Can then generate new data instances (generative AI) and also perform probability density estimation."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "5698c35f-e9dd-411e-a27c-7ad557863b14",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"A *Gaussian mixture model* (GMM) is a probabilistic model that assumes that the instances were generated from a mixture of a number of Gaussian distributions whose parameters are unknown."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "d99d4e4d-7c89-4b0b-82a3-01a89421fa7c",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": "subslide"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"### Gaussian mixture model description"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "95422c17-1259-4fab-9ec3-61c9128272d2",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"Instances generated from a Gaussian distribution form clusters that look like ellipsoids. Each cluster can have a different ellipoidal shape, size, density, and orientation."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "a15daff3-3eca-446e-b8a1-c21749f34d8e",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"Scikit-Learn contains several GMM variants. In the simplist, the `GaussianMixture` class, you must know in advance the number of Gaussian distributions $k$."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "724d96ff-26bb-42dc-879a-9841813029c1",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": "subslide"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"The dataset $\\mathbf{X}$ is assumed to have been generated through the following probabilistic process:\n",
|
||
|
"\n",
|
||
|
"- For each instance, a cluster is picked randomly from among $k$ clusters. The probability of choosing the $j^{\\text{th}}$ cluster is defined by the clusters weight, $\\phi^{(j)}$. The index of the cluster chosen for the $i^{\\text{th}}$ instance is noted $z^{(i)}$.\n",
|
||
|
"- If $z^{(i)}=j$, meaning the $i^{\\text{th}}$ instance has been assigned to the $j^{\\text{th}}$ cluster, the location $\\mathbf{x}^{(i)}$ of this instance is sampled randomly from the Gaussian distribution with mean $\\mathbf{\\mu}^{(j)}$ and covariance matrix ${\\sum}^{(j)}$. This is noted as:\n",
|
||
|
"\n",
|
||
|
"$$\\mathbf{x}^{(i)} \\sim \\mathcal{N} \\left( \\mathbf{\\mu}^{(j)}, {\\sum}^{(j)} \\right)$$"
|
||
|
]
|
||
|
},
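{
"cell_type": "markdown",
"id": "gmm-sampling-sketch",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"source": [
"The two-step generative process above can be sketched in NumPy. This is an illustration under stated assumptions, not the lecture's own code: the weights `phi`, means `mu` and covariances `Sigma` below are made-up values for $k=2$ clusters in two dimensions, and `X_sample` is a hypothetical name chosen so that `X` is not overwritten.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(0)\n",
"\n",
"# Assumed (illustrative) parameters for k = 2 clusters in 2-D\n",
"phi = np.array([0.7, 0.3])                    # cluster weights, must sum to 1\n",
"mu = np.array([[0.0, 0.0], [4.0, -4.0]])      # one mean vector per cluster\n",
"Sigma = np.array([[[1.0, 0.6], [0.6, 1.0]],\n",
"                  [[0.5, 0.0], [0.0, 2.0]]])  # one covariance matrix per cluster\n",
"\n",
"m = 500\n",
"# Step 1: pick a cluster index z^(i) for each instance with probabilities phi\n",
"z = rng.choice(len(phi), size=m, p=phi)\n",
"# Step 2: sample x^(i) from the Gaussian of the chosen cluster\n",
"X_sample = np.array([rng.multivariate_normal(mu[j], Sigma[j]) for j in z])\n",
"\n",
"print(X_sample.shape, np.bincount(z) / m)\n",
"```\n",
"\n",
"The printed cluster fractions should be close to `phi`, since each $z^{(i)}$ is drawn independently with those probabilities."
]
},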
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "28acc2bd-d0f4-48f1-95f9-25ac4b28fea7",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": "subslide"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"Let's generate some random blobs to use as a dataset again."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 24,
|
||
|
"id": "684e1980-71eb-4748-a70c-f7cffcd914d5",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"execution": {
|
||
|
"iopub.execute_input": "2025-03-07T05:32:23.714311Z",
|
||
|
"iopub.status.busy": "2025-03-07T05:32:23.714134Z",
|
||
|
"iopub.status.idle": "2025-03-07T05:32:23.719858Z",
|
||
|
"shell.execute_reply": "2025-03-07T05:32:23.719306Z"
|
||
|
},
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"X1, y1 = make_blobs(n_samples=1000, centers=((4, -4), (0, 0)), random_state=42)\n",
|
||
|
"X1 = X1.dot(np.array([[0.374, 0.95], [0.732, 0.598]]))\n",
|
||
|
"X2, y2 = make_blobs(n_samples=250, centers=1, random_state=42)\n",
|
||
|
"X2 = X2 + [6, -8]\n",
|
||
|
"X = np.r_[X1, X2]\n",
|
||
|
"y = np.r_[y1, y2]"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 25,
|
||
|
"id": "29b9362a-9ae4-49d9-93b9-7483a6d8dbbc",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"execution": {
|
||
|
"iopub.execute_input": "2025-03-07T05:32:23.721625Z",
|
||
|
"iopub.status.busy": "2025-03-07T05:32:23.721457Z",
|
||
|
"iopub.status.idle": "2025-03-07T05:32:23.823411Z",
|
||
|
"shell.execute_reply": "2025-03-07T05:32:23.822731Z"
|
||
|
},
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAqsAAAFzCAYAAAAZnkAuAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjAsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvlHJYcgAAAAlwSFlzAAAPYQAAD2EBqD+naQAAQFVJREFUeJzt3X+QVfV9//H33asgN8LKJfiDgIvYVhMrQVxxxCZCYaLRTtQZY7uljVoHxYE01jTNRSeSdoxsJsY6xfaupg2mrRZtjU36Q6s1gTghGkVo/IUJKrCCP6jUZbHD2tn9fP/we27OHs6Pz/n9Oec8HzM7yu6953zOj/s5r/M+n3NuTSmlBAAAADBQV94NAAAAALwQVgEAAGAswioAAACMRVgFAACAsQirAAAAMBZhFQAAAMYirAIAAMBYR+TdgDSMjY3J3r17ZfLkyVKr1fJuDgAAAByUUjI8PCwzZsyQri7v+mkpw+revXtl1qxZeTcDAAAAAQYHB2XmzJmefy9lWJ08ebKIfLDwU6ZMybk1AAAAcDpw4IDMmjWrk9u8lDKsWpf+p0yZQlgFAAAwWNCQTW6wAgAAgLEIqwAAADAWYRUAAADGIqwCAADAWIRVAAAAGIuwCgAAAGMRVgEAAGAswioAAACMZVxY/epXvyq1Wm3cz6mnnpp3swAAAJADI7/B6rTTTpP//M//7Pz7iCOMbCYAAABSZlxlVeSDcHr88cd3fj784Q/7vn5kZEQOHDgw7gdAdQ0MDMjs2bNlYGAg76YAAGIyMqz+4he/kBkzZsicOXNk2bJlsnv3bt/Xr127Vrq7uzs/s2bNyqilAEzU398vu3btkv7+/rybAgCIybiwevbZZ8s999wjjzzyiLTbbXnttdfkE5/4hAwPD3u+Z/Xq1TI0NNT5GRwczLDFAEzTarWkp6dHWq1W3k0BAMRUU0qpvBvh591335Wenh65/fbb5eqrr9Z6z4EDB6S7u1uGhoZkypQpKbcQAAAAYenmNeMqq07HHHOM/Nqv/Zrs2LEj76YAAAAgY8aH1YMHD8orr7wiJ5xwQt5NAQDAeNxgiLIxLqz+8R//sWzatEl27twpmzdvlksvvVTq9br09fXl3TQAAIzHDYYoG+PC6uuvvy59fX1yyimnyOWXXy7Tpk2TJ598UqZPn5530wAAMB43GKJsjL/BKgpusAIAADBbaW6wAgAAQHURVgEAAGAswioAAACMRVgFAACAsQirAAAAMBZhFQAAAMYirAIAAMBYhFUAAAAYi7AKAAAAYxFWAQAAYCzCKgAAAIxFWAUAAICxCKsASm9gYEBmz54tAwMDeTcFABBSTSml8m5E0g4cOCDd3d0yNDQkU6ZMybs5AHI2e/Zs2bVrl/T09MjOnTvzbg4AQPTzGpVVAKXXarWkp6dHWq1W3k0BAIREZRUAAACZo7IKAACAwiOsAgAAwFiEVQAAABiLsAoAAABjEVYBoKB4fiyAKuBpAABQUDw/FkCR8TQAACg5nh8LoAqorAIAACBzpams9vf3S61Wk+uvvz7vpgAAACBjRofVp59+Wu666y6ZO3du3k0BAABADowNqwcPHpRly5bJt771LZk6dWrezQGQIO5iBwDoMjasrly5Ui666CJZunRp4GtHRkbkwIED434AmKu/v1927dol/f39rn8nzAIALEaG1Q0bNsizzz4ra9eu1Xr92rVrpbu7u/Mza9aslFsIII6gu9iDwiwAoDqMC6uDg4PyhS98Qe6991456qijtN6zevVqGRoa6vwMDg6m3EoAcaxYsUJ27twpK1ascP17mEcyUYUFgHIz7tFV//zP/yyXXnqp1Ov1zu9GR0elVqtJV1eXjIyMjPubGx5dBVQHD8YHgGIq7KOrlixZIs8995xs27at89Pb2yvLli2Tbdu2BQZVAPrKUJXkwfgAUG7GVVbdLFq0SObNmyd33HGH1uuprAJ6qEoCAPJS2MoqgOzEqUpGqc
qWoZILAMhWIcLqxo0btauqALIR5Y597vIHAIRViLAKIB6vimac8BilKsv4UgBAWIUYsxoWY1aB8bzGpg4MDEh/f7+0Wi3Px0gBAJAGxqwC6PCqaAY97xTABxhvDeSHyioAAAF4cgaQPCqrAAAkhPHWQH6orAIAACBzVFYBZI5xfQCApFFZBZAYxvUBAHRRWQWQOcb1AQCSRlgFEInbJX8ehQUASBphFTCUCeM//drAV6cCALJAWAUMlWUYjPJ1rFzyB8Iz4SQUKBrCKmCoLMOgVyj1a0NWl/w5uKNMuCIBhMfTAADIwMCA9Pf3S6vVMm68KU8YQJmY/FkDssbTAICSyKKyaPKNUQw3QJmY/FkDTEVYBQyX52VDe1C2/v93f/d3M70sr3NwZ6gAAJQXYRUwXNKVxTDBzh6Urf9/4IEHUg/PYcMn4wABoLwIq4DholYWo9zh72QPytb/X3755alflg9qo3PZGCoAAOXFDVZACbjdhOR1Y1IRbvDwa+PAwICsWrVKRkdHuekKAAqMG6yACnGrLHpVG4twg4fVRhE5rDrc398vo6OjUq/XqaQCQAVQWQVgLLfqcBEqw4BJ+MzAVFRWAXgqyt3zbtXhIlSGAZNwAyKKjrAKVJBJBy+/4KwbTIsSvoE8cAMiio5hAEAFxb0smORlxSS+oYpvuQKA4mEYAABPK1askFarJf39/ZGqkUlWZpOo+lA5qgYq6EA1GVdZbbfb0m63O9WR0047TW6++Wb59Kc/rT0NKqtAsDDVSGcllRs2kAcq6EC5FLayOnPmTOnv75ctW7bIM888I7/5m78pF198sbzwwgt5Nw0olTDVSGcl1V6ZzfrrV1FdVNCBajKusuqm2WzKN77xDbn66qu1Xk9lFUiWWyXVqnLV63Ue0A8ACK2wlVW70dFR2bBhg7z33ntyzjnneL5uZGREDhw4MO4HyMvAwIBMmzZNpk2bZny1UXcMoNtd+a1WS5rNpkycOFGazSbVLnQwthRAkoysrD733HNyzjnnyKFDh+Too4+W++67Ty688ELP13/1q1+VP/3TPz3s91RWkQer4igiqVQbTboTnzGEcMN+AUBHoSurp5xyimzbtk2eeuopue666+SKK66QF1980fP1q1evlqGhoc7P4OBghq0FxrMqjmlVG5O6E39gYECGh4fHtTNsRcw5hlDn/V6vCTNvKndmY2xpefHZQy5UASxZskRdc8012q8fGhpSIqKGhoZSbBWQj3a7rZrNpmo2m6rdbkeeTk9PjxIR1dPT4/u7uNPUfU2YecdtJwB/7XZb9fT0HNbH8NlDknTzmpGVVaexsTEZGRnJuxlAbuzVjBUrVsjkyZNl//79saqrbtWvuBUxnfd7vSbMvKncQUcZq4BZLZPXFRw+e8hFRuFZW6vVUps2bVKvvfaa+tnPfqZarZaq1Wrq0Ucf1Z4GlVUUnbOq4axmeFU98mhbVu+FeUzfnmWsAma1TKZvW5SDbl4zLqz+wR/8gerp6VETJkxQ06dPV0uWLAkVVJUirKI4dC+1mXLgaLfbql6vR25bGcNDlZm+PU353CSpjMuE6ipsWE0CYRVF4XWwN/WAZLW3Xq97Vn39pLFcpq6rKmDdA4hDN68Z+eiquPhSABRF0b621GrvwoULZfPmzZ1xa3kuA49JApCFovXXRaCb1wirAEIzKSByAAGQBZP6vbIo9HNWAcSX5l3DJt0R7PbtWvBXxrvkgbSZ1O9VDWEVKCn7o2cIJ9Wgu52T+mKJpNsFmIwT4/wQVoEcRPkWp7DvsVcBkg4nWYcd6NHdLmEqREkETfYXALFkcLNX5ngaAPKkc4d0lG9xivPNT0Ft6uvrU/V6XfX19XlOI8z0kI80tksSj6difwHghkdXEVaRA7fnkHq9zu3g7XdQj/KeoHlb/1+r1TqPpAo7DZQb2xpAWgirhFXkwO05pGHoBIO44cFeKbP+v9FoBFZW7fM1/WHwKL4yhOQyLAOQJsIqYRU5SD
JIhn2N7rzdKqthv32KgzDSVoYTorSWIejzF/fvQFYIq4RVhJTXd96HDY+6X9GaNA5wyFIZ9re0liHosx7370BWCKu
|
||
|
"text/plain": [
|
||
|
"<Figure size 800x400 with 1 Axes>"
|
||
|
]
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"output_type": "display_data"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"plt.figure(figsize=(8, 4))\n",
|
||
|
"plt.plot(X[:, 0], X[:, 1], 'k.', markersize=2)\n",
|
||
|
"plt.xlabel(\"$x_1$\")\n",
|
||
|
"plt.ylabel(\"$x_2$\", rotation=0);"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "5ef1235d-28d2-4fdd-834a-52d948f3111d",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": "subslide"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"### Training\n",
|
||
|
"\n",
|
||
|
"Let's train a Gaussian mixture model. \n",
|
||
|
"\n",
|
||
|
"The model is trainined by the *expectation-maximisation* (EM) algorithm. We won't go into technical details but has similarites with the K-means algorithm. It initialises clusters randomly, then repeats an *expectation* step (cf. assign instances to clusters) and a *maximisation* step (cf. updating clusters)."
|
||
|
]
|
||
|
},
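{
"cell_type": "markdown",
"id": "em-gmm-sketch",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"source": [
"To make the two steps concrete, here is a minimal sketch of EM for a Gaussian mixture. It is not how Scikit-Learn implements `GaussianMixture`: the function name `em_gmm`, the toy data `X_toy`, the fixed number of iterations and the small `1e-6` regularisation term are all assumptions made for illustration, and densities are computed without the log-space tricks a robust implementation would use.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from scipy.stats import multivariate_normal\n",
"\n",
"def em_gmm(X, k, n_iter=50, seed=0):\n",
"    # Hypothetical helper: plain EM for a k-component Gaussian mixture\n",
"    rng = np.random.default_rng(seed)\n",
"    m, n = X.shape\n",
"    phi = np.full(k, 1.0 / k)                        # equal initial weights\n",
"    mu = X[rng.choice(m, size=k, replace=False)]     # random instances as initial means\n",
"    Sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(n) for _ in range(k)])\n",
"    for _ in range(n_iter):\n",
"        # Expectation step: soft assignment of each instance to each cluster\n",
"        dens = np.column_stack([\n",
"            phi[j] * multivariate_normal.pdf(X, mean=mu[j], cov=Sigma[j])\n",
"            for j in range(k)\n",
"        ])\n",
"        resp = dens / dens.sum(axis=1, keepdims=True)\n",
"        # Maximisation step: update weights, means and covariances\n",
"        Nk = resp.sum(axis=0)\n",
"        phi = Nk / m\n",
"        mu = (resp.T @ X) / Nk[:, None]\n",
"        for j in range(k):\n",
"            d = X - mu[j]\n",
"            Sigma[j] = (resp[:, j, None] * d).T @ d / Nk[j] + 1e-6 * np.eye(n)\n",
"    return phi, mu, Sigma\n",
"\n",
"# Toy data: two Gaussian blobs (assumed, for illustration only)\n",
"rng = np.random.default_rng(1)\n",
"X_toy = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(6, 1, (100, 2))])\n",
"phi, mu, Sigma = em_gmm(X_toy, k=2)\n",
"print(phi)\n",
"print(mu)\n",
"```\n",
"\n",
"Unlike K-means, the expectation step assigns each instance to every cluster with a probability (a soft assignment) rather than to a single cluster."
]
},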
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 26,
|
||
|
"id": "21208a04-d9ff-431d-ad74-843a079b2100",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"execution": {
|
||
|
"iopub.execute_input": "2025-03-07T05:32:23.825362Z",
|
||
|
"iopub.status.busy": "2025-03-07T05:32:23.825184Z",
|
||
|
"iopub.status.idle": "2025-03-07T05:32:23.830771Z",
|
||
|
"shell.execute_reply": "2025-03-07T05:32:23.830186Z"
|
||
|
},
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"from sklearn.mixture import GaussianMixture"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 27,
|
||
|
"id": "effa4aaf-6129-4226-b59a-a97b8d32a4ef",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"execution": {
|
||
|
"iopub.execute_input": "2025-03-07T05:32:23.832627Z",
|
||
|
"iopub.status.busy": "2025-03-07T05:32:23.832316Z",
|
||
|
"iopub.status.idle": "2025-03-07T05:32:23.902677Z",
|
||
|
"shell.execute_reply": "2025-03-07T05:32:23.902140Z"
|
||
|
},
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/html": [
|
||
|
"<style>#sk-container-id-3 {\n",
|
||
|
" /* Definition of color scheme common for light and dark mode */\n",
|
||
|
" --sklearn-color-text: #000;\n",
|
||
|
" --sklearn-color-text-muted: #666;\n",
|
||
|
" --sklearn-color-line: gray;\n",
|
||
|
" /* Definition of color scheme for unfitted estimators */\n",
|
||
|
" --sklearn-color-unfitted-level-0: #fff5e6;\n",
|
||
|
" --sklearn-color-unfitted-level-1: #f6e4d2;\n",
|
||
|
" --sklearn-color-unfitted-level-2: #ffe0b3;\n",
|
||
|
" --sklearn-color-unfitted-level-3: chocolate;\n",
|
||
|
" /* Definition of color scheme for fitted estimators */\n",
|
||
|
" --sklearn-color-fitted-level-0: #f0f8ff;\n",
|
||
|
" --sklearn-color-fitted-level-1: #d4ebff;\n",
|
||
|
" --sklearn-color-fitted-level-2: #b3dbfd;\n",
|
||
|
" --sklearn-color-fitted-level-3: cornflowerblue;\n",
|
||
|
"\n",
|
||
|
" /* Specific color for light theme */\n",
|
||
|
" --sklearn-color-text-on-default-background: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, black)));\n",
|
||
|
" --sklearn-color-background: var(--sg-background-color, var(--theme-background, var(--jp-layout-color0, white)));\n",
|
||
|
" --sklearn-color-border-box: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, black)));\n",
|
||
|
" --sklearn-color-icon: #696969;\n",
|
||
|
"\n",
|
||
|
" @media (prefers-color-scheme: dark) {\n",
|
||
|
" /* Redefinition of color scheme for dark theme */\n",
|
||
|
" --sklearn-color-text-on-default-background: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, white)));\n",
|
||
|
" --sklearn-color-background: var(--sg-background-color, var(--theme-background, var(--jp-layout-color0, #111)));\n",
|
||
|
" --sklearn-color-border-box: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, white)));\n",
|
||
|
" --sklearn-color-icon: #878787;\n",
|
||
|
" }\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-3 {\n",
|
||
|
" color: var(--sklearn-color-text);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-3 pre {\n",
|
||
|
" padding: 0;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-3 input.sk-hidden--visually {\n",
|
||
|
" border: 0;\n",
|
||
|
" clip: rect(1px 1px 1px 1px);\n",
|
||
|
" clip: rect(1px, 1px, 1px, 1px);\n",
|
||
|
" height: 1px;\n",
|
||
|
" margin: -1px;\n",
|
||
|
" overflow: hidden;\n",
|
||
|
" padding: 0;\n",
|
||
|
" position: absolute;\n",
|
||
|
" width: 1px;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-3 div.sk-dashed-wrapped {\n",
|
||
|
" border: 1px dashed var(--sklearn-color-line);\n",
|
||
|
" margin: 0 0.4em 0.5em 0.4em;\n",
|
||
|
" box-sizing: border-box;\n",
|
||
|
" padding-bottom: 0.4em;\n",
|
||
|
" background-color: var(--sklearn-color-background);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-3 div.sk-container {\n",
|
||
|
" /* jupyter's `normalize.less` sets `[hidden] { display: none; }`\n",
|
||
|
" but bootstrap.min.css set `[hidden] { display: none !important; }`\n",
|
||
|
" so we also need the `!important` here to be able to override the\n",
|
||
|
" default hidden behavior on the sphinx rendered scikit-learn.org.\n",
|
||
|
" See: https://github.com/scikit-learn/scikit-learn/issues/21755 */\n",
|
||
|
" display: inline-block !important;\n",
|
||
|
" position: relative;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-3 div.sk-text-repr-fallback {\n",
|
||
|
" display: none;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"div.sk-parallel-item,\n",
|
||
|
"div.sk-serial,\n",
|
||
|
"div.sk-item {\n",
|
||
|
" /* draw centered vertical line to link estimators */\n",
|
||
|
" background-image: linear-gradient(var(--sklearn-color-text-on-default-background), var(--sklearn-color-text-on-default-background));\n",
|
||
|
" background-size: 2px 100%;\n",
|
||
|
" background-repeat: no-repeat;\n",
|
||
|
" background-position: center center;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* Parallel-specific style estimator block */\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-3 div.sk-parallel-item::after {\n",
|
||
|
" content: \"\";\n",
|
||
|
" width: 100%;\n",
|
||
|
" border-bottom: 2px solid var(--sklearn-color-text-on-default-background);\n",
|
||
|
" flex-grow: 1;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-3 div.sk-parallel {\n",
|
||
|
" display: flex;\n",
|
||
|
" align-items: stretch;\n",
|
||
|
" justify-content: center;\n",
|
||
|
" background-color: var(--sklearn-color-background);\n",
|
||
|
" position: relative;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-3 div.sk-parallel-item {\n",
|
||
|
" display: flex;\n",
|
||
|
" flex-direction: column;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-3 div.sk-parallel-item:first-child::after {\n",
|
||
|
" align-self: flex-end;\n",
|
||
|
" width: 50%;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-3 div.sk-parallel-item:last-child::after {\n",
|
||
|
" align-self: flex-start;\n",
|
||
|
" width: 50%;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-3 div.sk-parallel-item:only-child::after {\n",
|
||
|
" width: 0;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* Serial-specific style estimator block */\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-3 div.sk-serial {\n",
|
||
|
" display: flex;\n",
|
||
|
" flex-direction: column;\n",
|
||
|
" align-items: center;\n",
|
||
|
" background-color: var(--sklearn-color-background);\n",
|
||
|
" padding-right: 1em;\n",
|
||
|
" padding-left: 1em;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"\n",
|
||
|
"/* Toggleable style: style used for estimator/Pipeline/ColumnTransformer box that is\n",
|
||
|
"clickable and can be expanded/collapsed.\n",
|
||
|
"- Pipeline and ColumnTransformer use this feature and define the default style\n",
|
||
|
"- Estimators will overwrite some part of the style using the `sk-estimator` class\n",
|
||
|
"*/\n",
|
||
|
"\n",
|
||
|
"/* Pipeline and ColumnTransformer style (default) */\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-3 div.sk-toggleable {\n",
|
||
|
" /* Default theme specific background. It is overwritten whether we have a\n",
|
||
|
" specific estimator or a Pipeline/ColumnTransformer */\n",
|
||
|
" background-color: var(--sklearn-color-background);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* Toggleable label */\n",
|
||
|
"#sk-container-id-3 label.sk-toggleable__label {\n",
|
||
|
" cursor: pointer;\n",
|
||
|
" display: flex;\n",
|
||
|
" width: 100%;\n",
|
||
|
" margin-bottom: 0;\n",
|
||
|
" padding: 0.5em;\n",
|
||
|
" box-sizing: border-box;\n",
|
||
|
" text-align: center;\n",
|
||
|
" align-items: start;\n",
|
||
|
" justify-content: space-between;\n",
|
||
|
" gap: 0.5em;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-3 label.sk-toggleable__label .caption {\n",
|
||
|
" font-size: 0.6rem;\n",
|
||
|
" font-weight: lighter;\n",
|
||
|
" color: var(--sklearn-color-text-muted);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-3 label.sk-toggleable__label-arrow:before {\n",
|
||
|
" /* Arrow on the left of the label */\n",
|
||
|
" content: \"▸\";\n",
|
||
|
" float: left;\n",
|
||
|
" margin-right: 0.25em;\n",
|
||
|
" color: var(--sklearn-color-icon);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-3 label.sk-toggleable__label-arrow:hover:before {\n",
|
||
|
" color: var(--sklearn-color-text);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* Toggleable content - dropdown */\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-3 div.sk-toggleable__content {\n",
|
||
|
" max-height: 0;\n",
|
||
|
" max-width: 0;\n",
|
||
|
" overflow: hidden;\n",
|
||
|
" text-align: left;\n",
|
||
|
" /* unfitted */\n",
|
||
|
" background-color: var(--sklearn-color-unfitted-level-0);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-3 div.sk-toggleable__content.fitted {\n",
|
||
|
" /* fitted */\n",
|
||
|
" background-color: var(--sklearn-color-fitted-level-0);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-3 div.sk-toggleable__content pre {\n",
|
||
|
" margin: 0.2em;\n",
|
||
|
" border-radius: 0.25em;\n",
|
||
|
" color: var(--sklearn-color-text);\n",
|
||
|
" /* unfitted */\n",
|
||
|
" background-color: var(--sklearn-color-unfitted-level-0);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-3 div.sk-toggleable__content.fitted pre {\n",
|
||
|
" /* unfitted */\n",
|
||
|
" background-color: var(--sklearn-color-fitted-level-0);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-3 input.sk-toggleable__control:checked~div.sk-toggleable__content {\n",
|
||
|
" /* Expand drop-down */\n",
|
||
|
" max-height: 200px;\n",
|
||
|
" max-width: 100%;\n",
|
||
|
" overflow: auto;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-3 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {\n",
|
||
|
" content: \"▾\";\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* Pipeline/ColumnTransformer-specific style */\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-3 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
|
||
|
" color: var(--sklearn-color-text);\n",
|
||
|
" background-color: var(--sklearn-color-unfitted-level-2);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-3 div.sk-label.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
|
||
|
" background-color: var(--sklearn-color-fitted-level-2);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* Estimator-specific style */\n",
|
||
|
"\n",
|
||
|
"/* Colorize estimator box */\n",
|
||
|
"#sk-container-id-3 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
|
||
|
" /* unfitted */\n",
|
||
|
" background-color: var(--sklearn-color-unfitted-level-2);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-3 div.sk-estimator.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
|
||
|
" /* fitted */\n",
|
||
|
" background-color: var(--sklearn-color-fitted-level-2);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-3 div.sk-label label.sk-toggleable__label,\n",
|
||
|
"#sk-container-id-3 div.sk-label label {\n",
|
||
|
" /* The background is the default theme color */\n",
|
||
|
" color: var(--sklearn-color-text-on-default-background);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* On hover, darken the color of the background */\n",
|
||
|
"#sk-container-id-3 div.sk-label:hover label.sk-toggleable__label {\n",
|
||
|
" color: var(--sklearn-color-text);\n",
|
||
|
" background-color: var(--sklearn-color-unfitted-level-2);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* Label box, darken color on hover, fitted */\n",
|
||
|
"#sk-container-id-3 div.sk-label.fitted:hover label.sk-toggleable__label.fitted {\n",
|
||
|
" color: var(--sklearn-color-text);\n",
|
||
|
" background-color: var(--sklearn-color-fitted-level-2);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* Estimator label */\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-3 div.sk-label label {\n",
|
||
|
" font-family: monospace;\n",
|
||
|
" font-weight: bold;\n",
|
||
|
" display: inline-block;\n",
|
||
|
" line-height: 1.2em;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-3 div.sk-label-container {\n",
|
||
|
" text-align: center;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* Estimator-specific */\n",
|
||
|
"#sk-container-id-3 div.sk-estimator {\n",
|
||
|
" font-family: monospace;\n",
|
||
|
" border: 1px dotted var(--sklearn-color-border-box);\n",
|
||
|
" border-radius: 0.25em;\n",
|
||
|
" box-sizing: border-box;\n",
|
||
|
" margin-bottom: 0.5em;\n",
|
||
|
" /* unfitted */\n",
|
||
|
" background-color: var(--sklearn-color-unfitted-level-0);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-3 div.sk-estimator.fitted {\n",
|
||
|
" /* fitted */\n",
|
||
|
" background-color: var(--sklearn-color-fitted-level-0);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* on hover */\n",
|
||
|
"#sk-container-id-3 div.sk-estimator:hover {\n",
|
||
|
" /* unfitted */\n",
|
||
|
" background-color: var(--sklearn-color-unfitted-level-2);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-3 div.sk-estimator.fitted:hover {\n",
|
||
|
" /* fitted */\n",
|
||
|
" background-color: var(--sklearn-color-fitted-level-2);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* Specification for estimator info (e.g. \"i\" and \"?\") */\n",
|
||
|
"\n",
|
||
|
"/* Common style for \"i\" and \"?\" */\n",
|
||
|
"\n",
|
||
|
".sk-estimator-doc-link,\n",
|
||
|
"a:link.sk-estimator-doc-link,\n",
|
||
|
"a:visited.sk-estimator-doc-link {\n",
|
||
|
" float: right;\n",
|
||
|
" font-size: smaller;\n",
|
||
|
" line-height: 1em;\n",
|
||
|
" font-family: monospace;\n",
|
||
|
" background-color: var(--sklearn-color-background);\n",
|
||
|
" border-radius: 1em;\n",
|
||
|
" height: 1em;\n",
|
||
|
" width: 1em;\n",
|
||
|
" text-decoration: none !important;\n",
|
||
|
" margin-left: 0.5em;\n",
|
||
|
" text-align: center;\n",
|
||
|
" /* unfitted */\n",
|
||
|
" border: var(--sklearn-color-unfitted-level-1) 1pt solid;\n",
|
||
|
" color: var(--sklearn-color-unfitted-level-1);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
".sk-estimator-doc-link.fitted,\n",
|
||
|
"a:link.sk-estimator-doc-link.fitted,\n",
|
||
|
"a:visited.sk-estimator-doc-link.fitted {\n",
|
||
|
" /* fitted */\n",
|
||
|
" border: var(--sklearn-color-fitted-level-1) 1pt solid;\n",
|
||
|
" color: var(--sklearn-color-fitted-level-1);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* On hover */\n",
|
||
|
"div.sk-estimator:hover .sk-estimator-doc-link:hover,\n",
|
||
|
".sk-estimator-doc-link:hover,\n",
|
||
|
"div.sk-label-container:hover .sk-estimator-doc-link:hover,\n",
|
||
|
".sk-estimator-doc-link:hover {\n",
|
||
|
" /* unfitted */\n",
|
||
|
" background-color: var(--sklearn-color-unfitted-level-3);\n",
|
||
|
" color: var(--sklearn-color-background);\n",
|
||
|
" text-decoration: none;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"div.sk-estimator.fitted:hover .sk-estimator-doc-link.fitted:hover,\n",
|
||
|
".sk-estimator-doc-link.fitted:hover,\n",
|
||
|
"div.sk-label-container:hover .sk-estimator-doc-link.fitted:hover,\n",
|
||
|
".sk-estimator-doc-link.fitted:hover {\n",
|
||
|
" /* fitted */\n",
|
||
|
" background-color: var(--sklearn-color-fitted-level-3);\n",
|
||
|
" color: var(--sklearn-color-background);\n",
|
||
|
" text-decoration: none;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* Span, style for the box shown on hovering the info icon */\n",
|
||
|
".sk-estimator-doc-link span {\n",
|
||
|
" display: none;\n",
|
||
|
" z-index: 9999;\n",
|
||
|
" position: relative;\n",
|
||
|
" font-weight: normal;\n",
|
||
|
" right: .2ex;\n",
|
||
|
" padding: .5ex;\n",
|
||
|
" margin: .5ex;\n",
|
||
|
" width: min-content;\n",
|
||
|
" min-width: 20ex;\n",
|
||
|
" max-width: 50ex;\n",
|
||
|
" color: var(--sklearn-color-text);\n",
|
||
|
" box-shadow: 2pt 2pt 4pt #999;\n",
|
||
|
" /* unfitted */\n",
|
||
|
" background: var(--sklearn-color-unfitted-level-0);\n",
|
||
|
" border: .5pt solid var(--sklearn-color-unfitted-level-3);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
".sk-estimator-doc-link.fitted span {\n",
|
||
|
" /* fitted */\n",
|
||
|
" background: var(--sklearn-color-fitted-level-0);\n",
|
||
|
" border: var(--sklearn-color-fitted-level-3);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
".sk-estimator-doc-link:hover span {\n",
|
||
|
" display: block;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* \"?\"-specific style due to the `<a>` HTML tag */\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-3 a.estimator_doc_link {\n",
|
||
|
" float: right;\n",
|
||
|
" font-size: 1rem;\n",
|
||
|
" line-height: 1em;\n",
|
||
|
" font-family: monospace;\n",
|
||
|
" background-color: var(--sklearn-color-background);\n",
|
||
|
" border-radius: 1rem;\n",
|
||
|
" height: 1rem;\n",
|
||
|
" width: 1rem;\n",
|
||
|
" text-decoration: none;\n",
|
||
|
" /* unfitted */\n",
|
||
|
" color: var(--sklearn-color-unfitted-level-1);\n",
|
||
|
" border: var(--sklearn-color-unfitted-level-1) 1pt solid;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-3 a.estimator_doc_link.fitted {\n",
|
||
|
" /* fitted */\n",
|
||
|
" border: var(--sklearn-color-fitted-level-1) 1pt solid;\n",
|
||
|
" color: var(--sklearn-color-fitted-level-1);\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"/* On hover */\n",
|
||
|
"#sk-container-id-3 a.estimator_doc_link:hover {\n",
|
||
|
" /* unfitted */\n",
|
||
|
" background-color: var(--sklearn-color-unfitted-level-3);\n",
|
||
|
" color: var(--sklearn-color-background);\n",
|
||
|
" text-decoration: none;\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"#sk-container-id-3 a.estimator_doc_link.fitted:hover {\n",
|
||
|
" /* fitted */\n",
|
||
|
" background-color: var(--sklearn-color-fitted-level-3);\n",
|
||
|
"}\n",
|
||
|
"</style><div id=\"sk-container-id-3\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>GaussianMixture(n_components=3, n_init=10, random_state=42)</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-3\" type=\"checkbox\" checked><label for=\"sk-estimator-id-3\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow\"><div><div>GaussianMixture</div></div><div><a class=\"sk-estimator-doc-link fitted\" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.6/modules/generated/sklearn.mixture.GaussianMixture.html\">?<span>Documentation for GaussianMixture</span></a><span class=\"sk-estimator-doc-link fitted\">i<span>Fitted</span></span></div></label><div class=\"sk-toggleable__content fitted\"><pre>GaussianMixture(n_components=3, n_init=10, random_state=42)</pre></div> </div></div></div></div>"
|
||
|
],
|
||
|
"text/plain": [
|
||
|
"GaussianMixture(n_components=3, n_init=10, random_state=42)"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 27,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"gm = GaussianMixture(n_components=3, n_init=10, random_state=42)\n",
|
||
|
"gm.fit(X)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "91965fbb-57e5-4693-ae1f-76fb4a5099dc",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": "subslide"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"Let's look at the estimated parameters of the GMM (mixture weights, means and covariances):"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 28,
|
||
|
"id": "b1397df2-f564-431d-8999-d0335f9b1bfc",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"execution": {
|
||
|
"iopub.execute_input": "2025-03-07T05:32:23.904988Z",
|
||
|
"iopub.status.busy": "2025-03-07T05:32:23.904805Z",
|
||
|
"iopub.status.idle": "2025-03-07T05:32:23.909338Z",
|
||
|
"shell.execute_reply": "2025-03-07T05:32:23.908881Z"
|
||
|
},
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"array([0.40005972, 0.20961444, 0.39032584])"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 28,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"gm.weights_"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 29,
|
||
|
"id": "4e58e66f-59dd-4068-8fb2-4f52f9c43b7c",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"execution": {
|
||
|
"iopub.execute_input": "2025-03-07T05:32:23.911681Z",
|
||
|
"iopub.status.busy": "2025-03-07T05:32:23.911504Z",
|
||
|
"iopub.status.idle": "2025-03-07T05:32:23.915938Z",
|
||
|
"shell.execute_reply": "2025-03-07T05:32:23.915368Z"
|
||
|
},
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"array([[-1.40764129, 1.42712848],\n",
|
||
|
" [ 3.39947665, 1.05931088],\n",
|
||
|
" [ 0.05145113, 0.07534576]])"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 29,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"gm.means_"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 30,
|
||
|
"id": "d3d1a385-4db6-49d6-aa12-0adc52725905",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"execution": {
|
||
|
"iopub.execute_input": "2025-03-07T05:32:23.918013Z",
|
||
|
"iopub.status.busy": "2025-03-07T05:32:23.917721Z",
|
||
|
"iopub.status.idle": "2025-03-07T05:32:23.922273Z",
|
||
|
"shell.execute_reply": "2025-03-07T05:32:23.921776Z"
|
||
|
},
|
||
|
"scrolled": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"array([[[ 0.63478217, 0.72970097],\n",
|
||
|
" [ 0.72970097, 1.16094925]],\n",
|
||
|
"\n",
|
||
|
" [[ 1.14740131, -0.03271106],\n",
|
||
|
" [-0.03271106, 0.95498333]],\n",
|
||
|
"\n",
|
||
|
" [[ 0.68825143, 0.79617956],\n",
|
||
|
" [ 0.79617956, 1.21242183]]])"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 30,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"gm.covariances_"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "25d8790e-4879-416b-9255-b152fdbba228",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": "subslide"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"### Predictions"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "806fbeb6-33d9-4c4b-85f4-a8d016832704",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"The model can be used to predict which cluster each instance belongs to (hard clustering) or to estimate the probability that it came from each cluster (soft clustering).\n",
|
||
|
"\n",
|
||
|
"For this, use the `predict()` method or the `predict_proba()` method, respectively:"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 31,
|
||
|
"id": "c1514c98-36c0-4c41-8e66-013b74f0ef7d",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"execution": {
|
||
|
"iopub.execute_input": "2025-03-07T05:32:23.924979Z",
|
||
|
"iopub.status.busy": "2025-03-07T05:32:23.924161Z",
|
||
|
"iopub.status.idle": "2025-03-07T05:32:23.928952Z",
|
||
|
"shell.execute_reply": "2025-03-07T05:32:23.928484Z"
|
||
|
},
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"array([2, 2, 0, ..., 1, 1, 1])"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 31,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"gm.predict(X)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 32,
|
||
|
"id": "043027db-4724-4eb2-af6a-fa520086dbe0",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"execution": {
|
||
|
"iopub.execute_input": "2025-03-07T05:32:23.930889Z",
|
||
|
"iopub.status.busy": "2025-03-07T05:32:23.930714Z",
|
||
|
"iopub.status.idle": "2025-03-07T05:32:23.935904Z",
|
||
|
"shell.execute_reply": "2025-03-07T05:32:23.935097Z"
|
||
|
},
|
||
|
"scrolled": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"array([[0. , 0.023, 0.977],\n",
|
||
|
" [0.001, 0.016, 0.983],\n",
|
||
|
" [1. , 0. , 0. ],\n",
|
||
|
" ...,\n",
|
||
|
" [0. , 1. , 0. ],\n",
|
||
|
" [0. , 1. , 0. ],\n",
|
||
|
" [0. , 1. , 0. ]])"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 32,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"gm.predict_proba(X).round(3)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "83733bbf-e1a9-41a1-9dee-112c70e82593",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": "subslide"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"### Generative model"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "e024b7b9-e8a1-4b0a-b2d2-bf2a20e121fc",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"A GMM is a generative model, so we can sample new instances from it (along with the label of the component each one was drawn from)."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 33,
|
||
|
"id": "ae6987db-a0c1-4dbe-ac1c-550b12d37764",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"execution": {
|
||
|
"iopub.execute_input": "2025-03-07T05:32:23.937941Z",
|
||
|
"iopub.status.busy": "2025-03-07T05:32:23.937765Z",
|
||
|
"iopub.status.idle": "2025-03-07T05:32:23.944065Z",
|
||
|
"shell.execute_reply": "2025-03-07T05:32:23.943632Z"
|
||
|
},
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"array([[-2.32491052, 1.04752548],\n",
|
||
|
" [-1.16654983, 1.62795173],\n",
|
||
|
" [ 1.84860618, 2.07374016],\n",
|
||
|
" [ 3.98304484, 1.49869936],\n",
|
||
|
" [ 3.8163406 , 0.53038367],\n",
|
||
|
" [ 0.38079484, -0.56239369]])"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 33,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"X_new, y_new = gm.sample(6)\n",
|
||
|
"X_new"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 34,
|
||
|
"id": "d41f7c1f-6e4f-4f75-a0d3-970b5a6904b1",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"execution": {
|
||
|
"iopub.execute_input": "2025-03-07T05:32:23.946209Z",
|
||
|
"iopub.status.busy": "2025-03-07T05:32:23.946036Z",
|
||
|
"iopub.status.idle": "2025-03-07T05:32:23.951024Z",
|
||
|
"shell.execute_reply": "2025-03-07T05:32:23.950557Z"
|
||
|
},
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"array([0, 0, 1, 1, 1, 2])"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 34,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"y_new"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "be0b4200-83f1-492f-9939-95cee8231fa6",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": "subslide"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"### Probability density estimation"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "298a478e-845a-41b7-a5f0-be616ae62184",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"We can estimate the log of the _probability density function_ (PDF) at any location using the `score_samples()` method:"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 35,
|
||
|
"id": "f63f5892-3af5-4a40-9b09-16a252ea7b70",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"execution": {
|
||
|
"iopub.execute_input": "2025-03-07T05:32:23.953173Z",
|
||
|
"iopub.status.busy": "2025-03-07T05:32:23.953002Z",
|
||
|
"iopub.status.idle": "2025-03-07T05:32:23.959035Z",
|
||
|
"shell.execute_reply": "2025-03-07T05:32:23.958565Z"
|
||
|
},
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"array([-2.61, -3.57, -3.33, ..., -3.51, -4.4 , -3.81])"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 35,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"gm.score_samples(X).round(2)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "22eeb02b-7f46-4441-973d-50b251e45906",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"A valid probability density must integrate to one over the whole feature space. Let's check numerically on a fine grid!"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 36,
|
||
|
"id": "137358fd-658c-4a63-a50d-75a6b6cd616f",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"execution": {
|
||
|
"iopub.execute_input": "2025-03-07T05:32:23.961456Z",
|
||
|
"iopub.status.busy": "2025-03-07T05:32:23.961119Z",
|
||
|
"iopub.status.idle": "2025-03-07T05:32:25.128194Z",
|
||
|
"shell.execute_reply": "2025-03-07T05:32:25.127575Z"
|
||
|
},
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"np.float64(0.9999999999225088)"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 36,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"resolution = 100\n",
|
||
|
"grid = np.arange(-10, 10, 1 / resolution)\n",
|
||
|
"xx, yy = np.meshgrid(grid, grid)\n",
|
||
|
"X_full = np.vstack([xx.ravel(), yy.ravel()]).T\n",
|
||
|
"\n",
|
||
|
"pdf = np.exp(gm.score_samples(X_full))  # score_samples() returns log-densities\n",
|
||
|
"pdf_probas = pdf * (1 / resolution) ** 2  # density times the area of one grid cell\n",
|
||
|
"pdf_probas.sum()"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "1c159cf5-a9e1-4d04-bfd5-6784a15180ec",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": "subslide"
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"### Visualise"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"id": "578a9285-d07e-4f08-8659-816353416937",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"source": [
|
||
|
"Let's plot the resulting probability densities and decision boundaries for clustering (dashed lines)."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 37,
|
||
|
"id": "fce7a3dc-9e32-4eb4-bb5f-7ebdb21ef90f",
|
||
|
"metadata": {
|
||
|
"editable": true,
|
||
|
"execution": {
|
||
|
"iopub.execute_input": "2025-03-07T05:32:25.130371Z",
|
||
|
"iopub.status.busy": "2025-03-07T05:32:25.130180Z",
|
||
|
"iopub.status.idle": "2025-03-07T05:32:26.007076Z",
|
||
|
"shell.execute_reply": "2025-03-07T05:32:26.006418Z"
|
||
|
},
|
||
|
"slideshow": {
|
||
|
"slide_type": ""
|
||
|
},
|
||
|
"tags": []
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"<Figure size 800x400 with 1 Axes>"
|
||
|
]
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"output_type": "display_data"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"from matplotlib.colors import LogNorm\n",
|
||
|
"\n",
|
||
|
"def plot_gaussian_mixture(clusterer, X, resolution=1000, show_ylabels=True):\n",
|
||
|
"    mins = X.min(axis=0) - 0.1\n",
|
||
|
"    maxs = X.max(axis=0) + 0.1\n",
|
||
|
"    xx, yy = np.meshgrid(np.linspace(mins[0], maxs[0], resolution),\n",
|
||
|
"                         np.linspace(mins[1], maxs[1], resolution))\n",
|
||
|
"    Z = -clusterer.score_samples(np.c_[xx.ravel(), yy.ravel()])\n",
|
||
|
"    Z = Z.reshape(xx.shape)\n",
|
||
|
"\n",
|
||
|
"    plt.contourf(xx, yy, Z,\n",
|
||
|
"                 norm=LogNorm(vmin=1.0, vmax=30.0),\n",
|
||
|
"                 levels=np.logspace(0, 2, 12))\n",
|
||
|
"    plt.contour(xx, yy, Z,\n",
|
||
|
"                norm=LogNorm(vmin=1.0, vmax=30.0),\n",
|
||
|
"                levels=np.logspace(0, 2, 12),\n",
|
||
|
"                linewidths=1, colors='k')\n",
|
||
|
"\n",
|
||
|
"    Z = clusterer.predict(np.c_[xx.ravel(), yy.ravel()])\n",
|
||
|
"    Z = Z.reshape(xx.shape)\n",
|
||
|
"    plt.contour(xx, yy, Z,\n",
|
||
|
"                linewidths=2, colors='r', linestyles='dashed')\n",
|
||
|
"\n",
|
||
|
"    plt.plot(X[:, 0], X[:, 1], 'k.', markersize=2)\n",
|
||
|
"    plot_centroids(clusterer.means_, clusterer.weights_)\n",
|
||
|
"\n",
|
||
|
"    plt.xlabel(\"$x_1$\")\n",
|
||
|
"    if show_ylabels:\n",
|
||
|
"        plt.ylabel(\"$x_2$\", rotation=0)\n",
|
||
|
"    else:\n",
|
||
|
"        plt.tick_params(labelleft=False)\n",
|
||
|
"\n",
"plt.figure(figsize=(8, 4))\n",
"plot_gaussian_mixture(gm, X)"
]
},
{
"cell_type": "markdown",
"id": "11feb53b-8b19-4390-8d09-8e5ecb565e97",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"## Anomaly detection with Gaussian mixture models"
]
},
{
"cell_type": "markdown",
"id": "3cf47de9-0cc0-4a7e-84a4-4a0f40acde21",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"source": [
"GMMs can be used for *anomaly detection*.\n",
"\n",
"Instances located in low-density regions can be considered anomalies.\n",
"\n",
"We must choose the density threshold below which an instance is flagged as an anomaly."
]
},
{
"cell_type": "code",
"execution_count": 38,
"id": "12e7c124-913e-42a6-8e3a-23e586284b52",
"metadata": {
"editable": true,
"execution": {
"iopub.execute_input": "2025-03-07T05:32:26.009077Z",
"iopub.status.busy": "2025-03-07T05:32:26.008900Z",
"iopub.status.idle": "2025-03-07T05:32:26.013481Z",
"shell.execute_reply": "2025-03-07T05:32:26.012915Z"
},
"slideshow": {
"slide_type": ""
},
"tags": []
},
"outputs": [],
"source": [
"# score_samples returns the log of the estimated density at each instance\n",
"densities = gm.score_samples(X)\n",
"# Threshold below which the 1% lowest-density instances fall\n",
"density_threshold = np.percentile(densities, 1)\n",
"anomalies = X[densities < density_threshold]"
]
},
{
"cell_type": "code",
"execution_count": 39,
"id": "009242db-3837-403f-ae46-8bcb40c50e9d",
"metadata": {
"editable": true,
"execution": {
"iopub.execute_input": "2025-03-07T05:32:26.015220Z",
"iopub.status.busy": "2025-03-07T05:32:26.015054Z",
"iopub.status.idle": "2025-03-07T05:32:26.907039Z",
"shell.execute_reply": "2025-03-07T05:32:26.906342Z"
},
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 800x400 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(8, 4))\n",
"\n",
"plot_gaussian_mixture(gm, X)\n",
"plt.scatter(anomalies[:, 0], anomalies[:, 1], color='r', marker='*')\n",
"plt.ylim(top=5.1);"
]
}
],
"metadata": {
"celltoolbar": "Slideshow",
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}