spce0038-machine-learning-w.../week1/slides/Lecture02_Pandas_Exercises_no_solutions.ipynb
2025-01-24 13:21:11 +00:00

{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# Exercises for Lecture 2 (Data wrangling with Pandas)"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["import datetime\n", "now = datetime.datetime.now()\n", "print(\"Last executed: \" + now.strftime(\"%Y-%m-%d %H:%M:%S\"))"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["import pandas as pd\n", "import numpy as np"]}, {"cell_type": "markdown", "metadata": {"slideshow": {"slide_type": "subslide"}}, "source": ["## Exercise 1: Data selection\n"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["area = pd.Series({'California': 423967, 'Texas': 695662,\n", " 'New York': 141297, 'Florida': 170312,\n", " 'Illinois': 149995})\n", "pop = pd.Series({'California': 38332521, 'Texas': 26448193,\n", " 'New York': 19651127, 'Florida': 19552860,\n", " 'Illinois': 12882135})\n", "data = pd.DataFrame({'area':area, 'population':pop})\n", "data"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Create a `DataFrame` containing only those states that have an area greater than 150,000 and a population greater than 20 million."]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Exercise 2: Operating on data in Pandas\n", "Consider the following two series."]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["area = pd.Series({'Alaska': 1723337, 'Texas': 695662,\n", " 'California': 423967}, name='area')\n", "population = pd.Series({'California': 38332521, 'Texas': 26448193,\n", " 'New York': 19651127}, name='population') "]}, {"cell_type": "markdown", "metadata": {"slideshow": {"slide_type": "-"}}, "source": ["Compute the population density for each state (where possible)."]}, {"cell_type": "markdown", "metadata": {"slideshow": {"slide_type": "subslide"}}, "source": ["## Exercise 3: Detecting null values"]}, {"cell_type": "markdown", 
"metadata": {}, "source": ["Consider the following series."]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["data = pd.Series([1, np.nan, 'hello', np.nan])\n", "data"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Compute a new `Series` of booleans that indicates whether each entry in the above `Series` is *not* NaN. Using this Boolean mask, construct a new `Series` from the original data that excludes the NaN entries."]}, {"cell_type": "markdown", "metadata": {"slideshow": {"slide_type": "subslide"}}, "source": ["## Exercise 4: Removing null values directly\n", "\n", "Remove the null values from the `data` `Series` of Exercise 3 directly, without first constructing a Boolean mask."]}], "metadata": {"celltoolbar": "Tags", "kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5"}}, "nbformat": 4, "nbformat_minor": 4}