{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Data reclassification\n", "\n", "Reclassifying data based on specific criteria is a common task when doing GIS analysis. The purpose of this lesson is to see how we can reclassify values based on some criteria. We could, for example, classify information based on travel times and housing prices using these criteria:\n", "\n", "```\n", "1. if travel time to my work is less than 30 minutes\n", "\n", " AND\n", "\n", " 2. the rent of the apartment is less than 1000 € per month\n", "\n", " ------------------------------------------------------\n", "\n", " IF TRUE: ==> I go to view it and try to rent the apartment\n", " IF NOT TRUE: ==> I continue looking for something else\n", "```\n", "\n", "In this tutorial, we will:\n", "\n", "1. Use classification schemes from the PySAL [mapclassify library](https://pysal.org/mapclassify/) to classify travel times into multiple classes.\n", "\n", "2. Create a custom classifier to classify travel times and distances in order to find out good locations to buy an apartment with these conditions:\n", " - good public transport accessibility to city center\n", " - bit further away from city center where the prices are presumably lower\n", "\n", "## Input data\n", "\n", "We will use [Travel Time Matrix data from Helsinki](https://blogs.helsinki.fi/accessibility/helsinki-region-travel-time-matrix/) that contains travel time and distance information for \n", "routes between all 250 m x 250 m grid cell centroids (n = 13231) in the Capital Region of Helsinki by walking, cycling, public transportation and car.\n", "\n", "In this tutorial, we will use the geojson file generated in the previous section:\n", "`\"data/TravelTimes_to_5975375_RailwayStation_Helsinki.geojson\"`\n", "\n", "Alternatively, you can re-download [L4 data](https://github.com/AutoGIS/data/raw/master/L4_data.zip) and use `\"data/Travel_times_to_5975375_RailwayStation.shp\"` as input file in here.\n", "\n", 
"\n", "\n", "## Common classifiers\n", "\n", "### Classification schemes for thematic maps\n", "\n", "\n", "[PySAL](https://pysal.org/) -module is an extensive Python library for spatial analysis. It also includes all of the most common data classifiers that are used commonly e.g. when visualizing data. Available map classifiers in [pysal's mapclassify -module](https://github.com/pysal/mapclassify):\n", "\n", " - Box_Plot\n", " - Equal_Interval\n", " - Fisher_Jenks\n", " - Fisher_Jenks_Sampled\n", " - HeadTail_Breaks\n", " - Jenks_Caspall\n", " - Jenks_Caspall_Forced\n", " - Jenks_Caspall_Sampled\n", " - Max_P_Classifier\n", " - Maximum_Breaks\n", " - Natural_Breaks\n", " - Quantiles\n", " - Percentiles\n", " - Std_Mean\n", " - User_Defined\n", "\n", "- First, we need to read our Travel Time data from Helsinki:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " car_m_d car_m_t car_r_d car_r_t from_id pt_m_d pt_m_t pt_m_tt \\\n", "0 29476 41 29483 46 5876274 29990 76 95 \n", "1 29456 41 29462 46 5876275 29866 74 95 \n", "2 36772 50 36778 56 5876278 33541 116 137 \n", "3 36898 49 36904 56 5876279 33720 119 141 \n", "4 29411 40 29418 44 5878128 29944 75 95 \n", "\n", " pt_r_d pt_r_t pt_r_tt to_id walk_d walk_t GML_ID NAMEFIN \\\n", "0 24984 77 99 5975375 25532 365 27517366 Helsinki \n", "1 24860 75 93 5975375 25408 363 27517366 Helsinki \n", "2 44265 130 146 5975375 31110 444 27517366 Helsinki \n", "3 44444 132 155 5975375 31289 447 27517366 Helsinki \n", "4 24938 76 99 5975375 25486 364 27517366 Helsinki \n", "\n", " NAMESWE NATCODE geometry \n", "0 Helsingfors 091 POLYGON ((402250.000 6685750.000, 402024.224 6... \n", "1 Helsingfors 091 POLYGON ((402367.890 6685750.000, 402250.000 6... \n", "2 Helsingfors 091 POLYGON ((403250.000 6685750.000, 403148.515 6... \n", "3 Helsingfors 091 POLYGON ((403456.484 6685750.000, 403250.000 6... \n", "4 Helsingfors 091 POLYGON ((402000.000 6685500.000, 401900.425 6... 
" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import geopandas as gpd\n", "\n", "fp = \"data/TravelTimes_to_5975375_RailwayStation_Helsinki.geojson\"\n", "\n", "# Read the GeoJSON file similarly as Shapefile\n", "acc = gpd.read_file(fp)\n", "\n", "# Let's see what we have\n", "acc.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we can see, there are plenty of different variables (see [from here the description](http://blogs.helsinki.fi/accessibility/helsinki-region-travel-time-matrix-2015) for all attributes) but what we are interested in are columns called `pt_r_tt` which is telling the time in minutes that it takes to reach city center from different parts of the city, and `walk_d` that tells the network distance by roads to reach city center from different parts of the city (almost equal to Euclidian distance).\n", "\n", "**The NoData values are presented with value -1**. \n", "\n", "- Thus we need to remove the No Data values first.\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Include only data that is above or equal to 0\n", "acc = acc.loc[acc['pt_r_tt'] >=0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Let's plot the data and see how it looks like\n", "- `cmap` parameter defines the color map. Read more about [choosing colormaps in matplotlib](https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html)\n", "- `scheme` option scales the colors according to a classification scheme (requires `mapclassify` module to be installed):" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "\n", "# Plot using 9 classes and classify the values using \"Natural Breaks\" classification\n", "acc.plot(column=\"pt_r_tt\", scheme=\"Natural_Breaks\", k=9, cmap=\"RdYlBu\", linewidth=0, legend=True)\n", "\n", "# Use tight layout\n", "plt.tight_layout()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we can see from this map, the travel times are lower in the south where the city center is located but there are some areas of \"good\" accessibility also in some other areas (where the color is red).\n", "\n", "- Let's also make a plot about walking distances:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Plot walking distance\n", "acc.plot(column=\"walk_d\", scheme=\"Natural_Breaks\", k=9, cmap=\"RdYlBu\", linewidth=0, legend=True)\n", "\n", "# Use tight layour\n", "plt.tight_layout()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Okay, from here we can see that the walking distances (along road network) reminds more or less Euclidian distances. \n", "\n", "### Applying classifiers to data\n", "\n", "As mentioned, the `scheme` option defines the classification scheme using `pysal/mapclassify`. Let's have a closer look at how these classifiers work." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "import mapclassify" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Natural Breaks" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "NaturalBreaks \n", "\n", " Interval Count\n", "------------------------\n", "[ 0.00, 21.00] | 263\n", "( 21.00, 30.00] | 529\n", "( 30.00, 37.00] | 779\n", "( 37.00, 45.00] | 914\n", "( 45.00, 53.00] | 480\n", "( 53.00, 63.00] | 356\n", "( 63.00, 76.00] | 251\n", "( 76.00, 94.00] | 178\n", "( 94.00, 155.00] | 57" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mapclassify.NaturalBreaks(y=acc['pt_r_tt'], k=9)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Quantiles (default is 5 classes):" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Quantiles \n", "\n", " Interval Count\n", "------------------------\n", "[ 0.00, 30.00] | 792\n", "( 30.00, 37.00] | 779\n", "( 37.00, 44.00] | 821\n", "( 44.00, 56.00] | 685\n", "( 56.00, 155.00] | 730" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mapclassify.Quantiles(y=acc['pt_r_tt'])" ] }, { "cell_type": 
"markdown", "metadata": {}, "source": [ "- It's possible to extract the threshold values into an array:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 21., 30., 38., 45., 53., 63., 77., 95., 155.])" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "classifier = mapclassify.NaturalBreaks(y=acc['pt_r_tt'], k=9)\n", "classifier.bins" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Let's apply one of the `Pysal` classifiers into our data and classify the travel times by public transport into 9 classes\n", "- The classifier needs to be initialized first with `make()` function that takes the number of desired classes as input parameter" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# Create a Natural Breaks classifier\n", "classifier = mapclassify.NaturalBreaks.make(k=9)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Now we can apply that classifier into our data by using `apply` -function" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " pt_r_tt\n", "0 8\n", "1 7\n", "2 8\n", "3 8\n", "4 8" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Classify the data\n", "classifications = acc[['pt_r_tt']].apply(classifier)\n", "\n", "# Let's see what we have\n", "classifications.head()" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pandas.core.frame.DataFrame" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(classifications)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Okay, so now we have a DataFrame where our input column was classified into 9 different classes (numbers 1-9) based on [Natural Breaks classification](http://wiki-1-1930356585.us-east-1.elb.amazonaws.com/wiki/index.php/Jenks_Natural_Breaks_Classification).\n", "\n", "- We can also add the classification values directly into a new column in our dataframe:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " pt_r_tt nb_pt_r_tt\n", "0 99 7\n", "1 93 7\n", "2 146 8\n", "3 155 8\n", "4 99 7" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Rename the column so that we know that it was classified with natural breaks\n", "acc['nb_pt_r_tt'] = acc[['pt_r_tt']].apply(classifier)\n", "\n", "# Check the original values and classification\n", "acc[['pt_r_tt', 'nb_pt_r_tt']].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Great, now we have those values in our accessibility GeoDataFrame. Let's visualize the results and see how they look." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Plot\n", "acc.plot(column=\"nb_pt_r_tt\", linewidth=0, legend=True)\n", "\n", "# Use tight layout\n", "plt.tight_layout()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And here we go, now we have a map where we have used one of the common classifiers to classify our data into 9 classes." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plotting a histogram\n", "\n", "A histogram is a graphic representation of the distribution of the data. When classifying the data, it's always good to consider how the data is distributed, and how the classification shceme divides values into different ranges. \n", "\n", "- plot the histogram using [pandas.DataFrame.plot.hist](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.hist.html)\n", "- Number of histogram bins (groups of data) can be controlled using the parameter `bins`:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYkAAAD4CAYAAAAZ1BptAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAWsUlEQVR4nO3dfZBdd33f8fcH2fiBh7Fdy46QRGQYmVTOBJksKi1JB2wSOza18HRIxQRGaZ2IpKYDTdogQSZAZzQjUsAkk0IisBvx6AgwWDXQIjsYhhmwkI1tLNuqRSXstRRrIaU2NCMj8+0f92y52PfsXj2cvVfe92tm557zu+fsfnbl3Y/Pc6oKSZIGecaoA0iSxpclIUlqZUlIklpZEpKkVpaEJKnVSaMOcCzOPvvsWrZs2ahjSNIJ5fbbb/9eVS0cZtkTuiSWLVvGzp07Rx1Dkk4oSb477LLubpIktbIkJEmtLAlJUitLQpLUypKQJLWyJCRJrSwJSVIrS0KS1MqSkCS1OqGvuNaRWbb+863v7dt0+RwmkXSi6HxLIsmCJN9KclMzf1aS7UkeaF7P7Ft2Q5I9SXYnuaTrbJKkmc3F7qY3A/f1za8Hbqmq5cAtzTxJVgBrgAuAS4EPJFkwB/kkSS06LYkkS4DLgQ/3Da8GtjTTW4DX9I1fX1WHqmovsAdY1WU+SdLMut6SeD/wR8BP+sbOraoDAM3rOc34YuChvuUmm7GfkWRdkp1Jdk5NTXUSWpLU01lJJHk1cLCqbh92lQFj9ZSBqs1VNVFVEwsXDnU7dEnSUery7KaXA1ckuQw4FXhuko8BjyRZVFUHkiwCDjbLTwJL+9ZfAuzvMJ8kaRadbUlU1YaqWlJVy+gdkP7bqno9sA1Y2yy2Frixmd4GrElySpLzgOXAjq7ySZJmN4rrJDYBW5NcBTwIvBagqnYl2QrcCxwGrq6qJ0aQT5LUmJOSqKpbgVub6e8DF7cstxHYOBeZJEmz87YckqRWloQkqZUlIUlqZUlIklpZEpKkVpaEJKmVJSFJamVJSJJaWRKSpFY+vlRA+6NNfaypNL+5JSFJamVJSJJaWRKSpFaWhCSplSUhSWrl2U2akWc9SfNbZ1sSSU5NsiPJXUl2JXlXM/7OJA8nubP5uKxvnQ1J9iTZneSSrrJJkobT5ZbEIeCiqvphkpOBryX5YvPeNVX1nv6Fk6yg9yzsC4DnATcnOd9HmErS6HS2JVE9P2xmT24+aoZVVgPXV9WhqtoL7AFWdZVPkjS7Tg9cJ1mQ5E7gILC9qm5r3npTkruTXJfkzGZsMfBQ3+qTzZgkaUQ6LYmqeqKqVgJLgFVJfhH4IPBCYCVwAHhvs3gGfYonDyRZl2Rnkp1TU1Od5JYk9czJ2U1V9YMktwKX9h+LSPIh4KZmdhJY2rfaEmD/gM+1GdgMMDExMdPuK3XIs56k+aHLs5sWJjmjmT4NeBVwf5JFfYtdCdzTTG8D1iQ5Jcl5wHJgR1f5JEmz63JLYhGwJckCemW0tapuSvLRJCvp7UraB7wRoKp2JdkK3AscBq72zCZJGq3OSqKq7gYuHDD+hhnW2Qhs7CqTJOnIeFsOSVIrS0KS1MqSkCS1siQkSa0sCUlSK0tCktTKkpAktbIkJEmtLAlJUitLQpLUypKQJLWyJCRJrSwJSVIrS0KS1MqSkCS1siQkSa0sCUlSqy6fcX1qkh1J7kqyK8m7mvGzkmxP8kDzembfOhuS7EmyO8klXWWTJA2nyy2JQ8BFVfViYCVwaZKXAeuBW6pqOXBLM0+SFcAa4ALgUuADzfOxJUkj0llJVM8Pm9mTm48CVgNbmvEtwGua6dXA9VV1qKr2AnuAVV3lkyTNrtNjEkkWJLkTOAhsr6rbgHOr6gBA83pOs/hi4KG+1SebsSd/znVJdibZOTU11WV8SZr3Oi2JqnqiqlYCS4BVSX5xhsUz6FMM+Jybq2qiqiYWLlx4nJJKkgaZk7ObquoHwK30jjU8kmQ
RQPN6sFlsEljat9oSYP9c5JMkDdbl2U0Lk5zRTJ8GvAq4H9gGrG0WWwvc2ExvA9YkOSXJecByYEdX+SRJszupw8+9CNjSnKH0DGBrVd2U5OvA1iRXAQ8CrwWoql1JtgL3AoeBq6vqiQ7zSZJm0VlJVNXdwIUDxr8PXNyyzkZgY1eZJElHxiuuJUmtutzdpHlo2frPDxzft+nyOU4i6XhwS0KS1MqSkCS1siQkSa0sCUlSK0tCktTKkpAktbIkJEmtLAlJUitLQpLUypKQJLWyJCRJrSwJSVKroUpilseOSpKepobdkvjLJDuS/Nvpp81Jkp7+hiqJqvoV4LfoPYN6Z5JPJPm1mdZJsjTJl5Pcl2RXkjc34+9M8nCSO5uPy/rW2ZBkT5LdSS45hu9LknQcDP08iap6IMkfAzuBPwcuTBLgbVV1w4BVDgN/WFV3JHkOcHuS7c1711TVe/oXTrICWANcADwPuDnJ+T7CVJJGZ9hjEr+U5BrgPuAi4F9U1T9upq8ZtE5VHaiqO5rpx5p1F8/wZVYD11fVoaraC+wBVg39nUiSjrthj0n8BXAH8OKqurrvj/9+4I9nWznJMnrPu76tGXpTkruTXJfkzGZsMfBQ32qTDCiVJOuS7Eyyc2pqasj4kqSjMWxJXAZ8oqr+ASDJM5KcDlBVH51pxSTPBj4DvKWqHgU+CLwQWAkcAN47veiA1espA1Wbq2qiqiYWLlw4ZHxJ0tEYtiRuBk7rmz+9GZtRkpPpFcTHp49bVNUjVfVEVf0E+BA/3aU0Se/A+LQlwP4h80mSOjBsSZxaVT+cnmmmT59pheag9rXAfVX1vr7xRX2LXQnc00xvA9YkOSXJecByYMeQ+SRJHRj27KYfJXnJ9LGIJL8M/MMs67wceAPw7SR3NmNvA16XZCW9XUn7gDcCVNWuJFuBe+mdGXW1ZzZJ0mgNWxJvAT6VZHr3zyLgX820QlV9jcHHGb4wwzobgY1DZpIkdWyokqiqbyb5BeBF9P7w319VP+40mSRp5Ia+mA54KbCsWefCJFTVRzpJJUkaC0OVRJKP0jtt9U5g+jhBAZbEGFq2/vOjjvAUbZn2bbp8jpNIOhLDbklMACuq6inXLUiSnr6GPQX2HuDnugwiSRo/w25JnA3cm2QHcGh6sKqu6CSVJGksDFsS7+wyhCRpPA17CuxXkvw8sLyqbm7u27Sg22iSpFEb9lbhvwt8GvirZmgx8LmOMkmSxsSwB66vpnebjUeh9wAi4JyuQkmSxsOwJXGoqh6fnklyEgNu4y1JenoZtiS+kuRtwGnNs60/Bfy37mJJksbBsCWxHpgCvk3vrq1fYIgn0kmSTmzDnt00/YCgD3UbR5I0Toa9d9NeBj9K9AXHPZEkaWwcyb2bpp0KvBY46/jHkSSNk6GOSVTV9/s+Hq6q9wMXzbROkqVJvpzkviS7kry5GT8ryfYkDzSvZ/atsyHJniS7k1xyLN+YJOnYDbu76SV9s8+gt2XxnFlWOwz8YVXdkeQ5wO1JtgO/DdxSVZuSrKd3UPytSVYAa4ALgOcBNyc530eYStLoDLu76b1904fpPZv6N2daoaoOAAea6ceS3EfvSu3VwCuaxbYAtwJvbcavr6pDwN4ke4BVwNeHzChJOs6GPbvplcfyRZIsAy4EbgPObQqEqjqQZPrK7cXAN/pWm2zGJEkjMuzupj+Y6f2qet8M6z4b+Azwlqp6NEnrooM+9YDPtw5YB/D85z9/pliSpGM07MV0E8Dv0/s/+8XA7wEr6B2XaD02keRkegXx8aq6oRl+JMmi5v1FwMFmfBJY2rf6EmD/kz9nVW2uqomqmli4cOGQ8SVJR+NIHjr0kqp6DCDJO4FPVdXvtK2Q3ibDtcB9T9rS2AasBTY1rzf2jX8iyfvoHbheDuwY/luRJB1vw5bE84HH++YfB5bNss7LgTcA305yZzP2NnrlsDXJVcCD9K65oKp2JdkK3Evv4PjVntkkSaM1bEl8FNi
R5LP0jhNcCXxkphWq6msMPs4AcHHLOhuBjUNmkiR1bNizmzYm+SLwq83Qv66qb3UXS5I0DoY9cA1wOvBoVf0ZMJnkvI4ySZLGxLCPL30HvQveNjRDJwMf6yqUJGk8DLslcSVwBfAjgKraz+y35ZAkneCGLYnHq6poLm5L8qzuIkmSxsWwJbE1yV8BZyT5XeBmfACRJD3tzXp2U3NR3N8AvwA8CrwI+JOq2t5xNknSiM1aElVVST5XVb8MWAySNI8Mu7vpG0le2mkSSdLYGfaK61cCv5dkH70znEJvI+OXugomSRq9GUsiyfOr6kHgN+Yoj47AsvWfH3UESU9zs21JfI7e3V+/m+QzVfUv5yCTJGlMzHZMov8GfS/oMogkafzMVhLVMi1Jmgdm29304iSP0tuiOK2Zhp8euH5up+kkSSM1Y0lU1YK5CiJJGj9HcqtwSdI801lJJLkuycEk9/SNvTPJw0nubD4u63tvQ5I9SXYnuaSrXJKk4XW5JfHXwKUDxq+pqpXNxxcAkqwA1gAXNOt8IIm7uiRpxDoriar6KvD3Qy6+Gri+qg5V1V5gD7Cqq2ySpOGM4pjEm5Lc3eyOOrMZWww81LfMZDP2FEnWJdmZZOfU1FTXWSVpXpvrkvgg8EJgJXAAeG8zngHLDrwuo6o2V9VEVU0sXLiwk5CSpJ45LYmqeqSqnqiqn9B7aNH0LqVJYGnfokuA/XOZTZL0VHNaEkkW9c1eCUyf+bQNWJPklCTnAcuBHXOZTZL0VMPeKvyIJfkk8Arg7CSTwDuAVyRZSW9X0j7gjQBVtSvJVuBe4DBwdVU90VU2SdJwOiuJqnrdgOFrZ1h+I7Cxqzyan9pup75v0+VznEQ6MXnFtSSplSUhSWplSUiSWlkSkqRWloQkqVVnZzdJw/DsI2m8WRIngLY/pJLUNXc3SZJauSWhE4q7p6S55ZaEJKmVJSFJamVJSJJaWRKSpFaWhCSplSUhSWplSUiSWnVWEkmuS3IwyT19Y2cl2Z7kgeb1zL73NiTZk2R3kku6yiVJGl6XF9P9NfAXwEf6xtYDt1TVpiTrm/m3JlkBrAEuAJ4H3JzkfB9hOn95KxJpPHT5+NKvJln2pOHV9J57DbAFuBV4azN+fVUdAvYm2QOsAr7eVb5x5B9GSeNmro9JnFtVBwCa13Oa8cXAQ33LTTZjT5FkXZKdSXZOTU11GlaS5rtxuXdTBozVoAWrajOwGWBiYmLgMpp/3AqTujHXWxKPJFkE0LwebMYngaV9yy0B9s9xNknSk8x1SWwD1jbTa4Eb+8bXJDklyXnAcmDHHGeTJD1JZ7ubknyS3kHqs5NMAu8ANgFbk1wFPAi8FqCqdiXZCtwLHAau9swmSRq9Ls9uel3LWxe3LL8R2NhVHknSkfOKa0lSK0tCktTKkpAktbIkJEmtLAlJUitLQpLUypKQJLWyJCRJrSwJSVIrS0KS1GpcbhUuzam2W4vv23T5HCeRxpslMQI++0DSicLdTZKkVpaEJKmVJSFJamVJSJJajeTAdZJ9wGPAE8DhqppIchbwN8AyYB/wm1X1v0eRT5LUM8otiVdW1cqqmmjm1wO3VNVy4JZmXpI0QuO0u2k1sKWZ3gK8ZnRRJEkwupIo4EtJbk+yrhk7t6oOADSv5wxaMcm6JDuT7JyampqjuJI0P43qYrqXV9X+JOcA25PcP+yKVbUZ2AwwMTFRXQWUJI2oJKpqf/N6MMlngVXAI0kWVdWBJIuAg6PIpvltpqvhvWWH5qM5L4kkzwKeUVWPNdO/DvwnYBuwFtjUvN4419mON2+/8fTi/Z40H41iS+Jc4LNJpr/+J6rqvyf5JrA1yVXAg8BrR5BNktRnzkuiqv4X8OIB498HLp7rPJKkdt4FVppj7rbSiWScrpOQJI0ZS0KS1MrdTceBZzHpeHA3lMaRWxKSpFZuSUgdcQtTTweWhHSMui4Dd0NplCwJ6QR1pOVkqehoWBLSPOEWiY6GB64lSa0
sCUlSK0tCktTKkpAktbIkJEmtPLvpCHhxlKT5xi0JSVIrS0KS1GrsdjcluRT4M2AB8OGq2jTXGdytpPnkeF5k5wV7Tz9jVRJJFgD/Bfg1YBL4ZpJtVXXvaJNJ88/x/J8ly+PENVYlAawC9jTPwSbJ9cBqoJOScItBGq35WB5H+j2P+mc0biWxGHiob34S+Cf9CyRZB6xrZn+YZPcxfL2zge8dw/pdMtvRGedsMN75xiZb3v2UobHJNsBxyTbgez5eyw/K9/PDrjxuJZEBY/UzM1Wbgc3H5YslO6tq4nh8ruPNbEdnnLPBeOcz29EZ52xw7PnG7eymSWBp3/wSYP+IskjSvDduJfFNYHmS85I8E1gDbBtxJkmat8Zqd1NVHU7yJuB/0DsF9rqq2tXhlzwuu606YrajM87ZYLzzme3ojHM2OMZ8qarZl5IkzUvjtrtJkjRGLAlJUqt5WRJJLk2yO8meJOtHnGVpki8nuS/JriRvbsbPSrI9yQPN65kjzLggybeS3DSG2c5I8ukk9zc/w386LvmS/Pvm3/SeJJ9McuqosiW5LsnBJPf0jbVmSbKh+f3YneSSEeX7z82/691JPpvkjFHkG5St773/kKSSnD1O2ZL8u+br70ryp8eUrarm1Qe9A+LfAV4APBO4C1gxwjyLgJc0088B/iewAvhTYH0zvh549wgz/gHwCeCmZn6csm0BfqeZfiZwxjjko3dh6F7gtGZ+K/Dbo8oG/HPgJcA9fWMDszT//d0FnAKc1/y+LBhBvl8HTmqm3z2qfIOyNeNL6Z1k813g7HHJBrwSuBk4pZk/51iyzcctif9/64+qehyYvvXHSFTVgaq6o5l+DLiP3h+Y1fT+ANK8vmYU+ZIsAS4HPtw3PC7Znkvvl+RagKp6vKp+MC756J09eFqSk4DT6V3zM5JsVfVV4O+fNNyWZTVwfVUdqqq9wB56vzdzmq+qvlRVh5vZb9C7bmrO87X87ACuAf6In73gdxyy/T6wqaoONcscPJZs87EkBt36Y/GIsvyMJMuAC4HbgHOr6gD0igQ4Z0Sx3k/vF+EnfWPjku0FwBTwX5vdYR9O8qxxyFdVDwPvAR4EDgD/p6q+NA7Z+rRlGcffkX8DfLGZHnm+JFcAD1fVXU96a+TZgPOBX01yW5KvJHnpsWSbjyUx660/RiHJs4HPAG+pqkdHnQcgyauBg1V1+6iztDiJ3qb2B6vqQuBH9HabjFyzf381vc365wHPSvL60aYa2lj9jiR5O3AY+Pj00IDF5ixfktOBtwN/MujtAWNz/bM7CTgTeBnwH4GtScJRZpuPJTF2t/5IcjK9gvh4Vd3QDD+SZFHz/iLgYNv6HXo5cEWSffR2y12U5GNjkg16/5aTVXVbM/9peqUxDvleBeytqqmq+jFwA/DPxiTbtLYsY/M7kmQt8Grgt6rZsc7o872QXvnf1fxuLAHuSPJzY5CNJsMN1bOD3l6As48223wsibG69UfT8NcC91XV+/re2gasbabXAjfOdbaq2lBVS6pqGb2f099W1evHIVuT7++Ah5K8qBm6mN5t5cch34PAy5Kc3vwbX0zveNM4ZJvWlmUbsCbJKUnOA5YDO+Y6XHoPIHsrcEVV/d++t0aar6q+XVXnVNWy5ndjkt7JJ3836myNzwEXASQ5n94JHd876mxdHXUf5w/gMnpnEX0HePuIs/wKvU2+u4E7m4/LgH8E3AI80LyeNeKcr+CnZzeNTTZgJbCz+fl9jt5m9ljkA94F3A/cA3yU3lklI8kGfJLesZEf0/ujdtVMWejtTvkOsBv4jRHl20NvH/r078VfjiLfoGxPen8fzdlN45CNXil8rPnv7g7gomPJ5m05JEmt5uPuJknSkCwJSVIrS0KS1MqSkCS1siQkSa0sCUlSK0tCktTq/wEGE0XcX0iFAAAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Histogram for public transport rush hour travel time\n", "acc['pt_r_tt'].plot.hist(bins=50)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's also add threshold values on thop of the histogram as vertical lines.\n", "\n", "- Natural Breaks:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Define classifier\n", "classifier = mapclassify.NaturalBreaks(y=acc['pt_r_tt'], k=9)\n", "\n", "# Plot histogram for public transport rush hour travel time\n", "acc['pt_r_tt'].plot.hist(bins=50)\n", "\n", "# Add vertical lines for class breaks\n", "for value in classifier.bins:\n", " plt.axvline(value, color='k', linestyle='dashed', linewidth=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Quantiles:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Define classifier\n", "classifier = mapclassify.Quantiles(y=acc['pt_r_tt'])\n", "\n", "# Plot histogram for public transport rush hour travel time\n", "acc['pt_r_tt'].plot.hist(bins=50)\n", "\n", "for value in classifier.bins:\n", " plt.axvline(value, color='k', linestyle='dashed', linewidth=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "
\n", "\n", "**Check your understanding**\n", "\n", "Select another column from the data (for example, travel times by car: `car_r_t`). Do the following visualizations using one of the classification schemes available from [pysal/mapclassify](https://github.com/pysal/mapclassify):\n", " \n", "- histogram with vertical lines showing the classification bins\n", "- thematic map using the classification scheme\n", "\n", "\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating a custom classifier\n", "\n", "**Multicriteria data classification**\n", "\n", "Let's create a function where we classify the geometries into two classes based on a given `threshold` -parameter. If the area of a polygon is lower than the threshold value (average size of the lake), the output column will get a value 0, if it is larger, it will get a value 1. This kind of classification is often called a [binary classification](https://en.wikipedia.org/wiki/Binary_classification).\n", "\n", "First we need to create a function for our classification task. This function takes a single row of the GeoDataFrame as input, plus few other parameters that we can use.\n", "\n", "It also possible to do classifiers with multiple criteria easily in Pandas/Geopandas by extending the example that we started earlier. Now we will modify our binaryClassifier function a bit so that it classifies the data based on two columns.\n", "\n", "- Let's call it `custom_classifier` that does the binary classification based on two treshold values:\n" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "def custom_classifier(row, src_col1, src_col2, threshold1, threshold2, output_col):\n", " \"\"\"Custom classirifer that can be applied on each row of a pandas dataframe (axis=1).\n", " \n", " This function classifies data based on values in two source columns and stores the output value in the output column.\n", " Output values is 1 if the value in src_col1 is LOWER than the threshold1 value AND the value in src_col2 is HIGHER than the threshold2 value. 
\n", " In all other cases, output value is 0.\n", " \n", " Args:\n", " row: one row of data\n", " src_col1: source column name associated with threshold1\n", " src_col2: source column name associated with threshold2\n", " threshold1: upper threshold value for src_col1\n", " threshold2: lower threshold value for src_col2\n", " output_col: output column name\n", "\n", " Returns:\n", " updated row of data.\n", " \"\"\"\n", "\n", " # If condition is true, assign 1 into output column\n", " if row[src_col1] < threshold1 and row[src_col2] > threshold2:\n", " row[output_col] = 1\n", " \n", " # Else, assign 1 into output column\n", " else:\n", " row[output_col] = 0\n", "\n", " # Return the updated row\n", " return row" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we have defined the function, and we can start using it.\n", "\n", "- Let's do our classification based on two criteria and find out grid cells where the **travel time is lower or equal to 20 minutes** but they are further away **than 4 km (4000 meters) from city center**.\n", "\n", "- Let's create an empty column for our classification results called `\"suitable_area\"`.\n" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " car_m_d car_m_t car_r_d car_r_t from_id pt_m_d pt_m_t pt_m_tt \\\n", "0 29476 41 29483 46 5876274 29990 76 95 \n", "1 29456 41 29462 46 5876275 29866 74 95 \n", "2 36772 50 36778 56 5876278 33541 116 137 \n", "3 36898 49 36904 56 5876279 33720 119 141 \n", "4 29411 40 29418 44 5878128 29944 75 95 \n", "\n", " pt_r_d pt_r_t ... to_id walk_d walk_t GML_ID NAMEFIN \\\n", "0 24984 77 ... 5975375 25532 365 27517366 Helsinki \n", "1 24860 75 ... 5975375 25408 363 27517366 Helsinki \n", "2 44265 130 ... 5975375 31110 444 27517366 Helsinki \n", "3 44444 132 ... 5975375 31289 447 27517366 Helsinki \n", "4 24938 76 ... 5975375 25486 364 27517366 Helsinki \n", "\n", " NAMESWE NATCODE geometry \\\n", "0 Helsingfors 091 POLYGON ((402250.000 6685750.000, 402024.224 6... \n", "1 Helsingfors 091 POLYGON ((402367.890 6685750.000, 402250.000 6... \n", "2 Helsingfors 091 POLYGON ((403250.000 6685750.000, 403148.515 6... \n", "3 Helsingfors 091 POLYGON ((403456.484 6685750.000, 403250.000 6... \n", "4 Helsingfors 091 POLYGON ((402000.000 6685500.000, 401900.425 6... \n", "\n", " nb_pt_r_tt suitable_area \n", "0 7 0 \n", "1 7 0 \n", "2 8 0 \n", "3 8 0 \n", "4 7 0 \n", "\n", "[5 rows x 21 columns]" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create column for the classification results\n", "acc[\"suitable_area\"] = None\n", "\n", "# Use the function\n", "acc = acc.apply(custom_classifier, \n", " src_col1='pt_r_tt', \n", " src_col2='walk_d', \n", " threshold1=20, \n", " threshold2=4000, \n", " output_col=\"suitable_area\", \n", " axis=1)\n", "\n", "# See the first rows\n", "acc.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Okey we have new values in `suitable_area` -column.\n", "\n", "- How many Polygons are suitable for us? 
Let's find out by using the pandas method `value_counts()`, which returns the count of each distinct value in our column.\n" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 3798\n", "1 9\n", "Name: suitable_area, dtype: int64" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get value counts\n", "acc['suitable_area'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Okay, so there seem to be nine suitable locations where we can try to find an apartment to buy.\n", "\n", "- Let's see where they are located:\n" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Plot\n", "acc.plot(column=\"suitable_area\", linewidth=0)\n", "\n", "# Use tight layour\n", "plt.tight_layout()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A-haa, okay so we can see that suitable places for us with our criteria seem to be located in the\n", "eastern part from the city center. Actually, those locations are along the metro line which makes them good locations in terms of travel time to city center since metro is really fast travel mode.\n", "\n", "**Other examples**\n", "\n", "Older course materials contain an example of applying a [custom binary classifier on the Corine land cover data](https://automating-gis-processes.github.io/2017/lessons/L4/reclassify.html#classifying-data>)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.6" } }, "nbformat": 4, "nbformat_minor": 4 }