{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Nearest Neighbour Analysis\n", "\n", "\n", "A common GIS task is finding the nearest neighbour of an object or a set of objects. For instance, you might have a single Point object\n", "representing your home location, and another set of locations representing e.g. public transport stops. A typical question is then *\"which of the stops is closest to my home?\"*\n", "This is a typical nearest neighbour analysis, where the aim is to find the closest geometry to another geometry.\n", "\n", "In Python, this kind of analysis can be done with the Shapely function ``nearest_points()``, which [returns a tuple of the nearest points in the input geometries](https://shapely.readthedocs.io/en/latest/manual.html#shapely.ops.nearest_points).\n", "\n", "## Nearest point using Shapely\n", "\n", "\n", "Let's start by testing how we can find the nearest Point using the ``nearest_points()`` function of Shapely.\n", "\n", "- Let's create an origin Point and a few destination Points and find out the closest destination:\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from shapely.geometry import Point, MultiPoint\n", "from shapely.ops import nearest_points\n", "\n", "# Origin point\n", "orig = Point(1, 1.67)\n", "\n", "# Destination points\n", "dest1 = Point(0, 1.45)\n", "dest2 = Point(2, 2)\n", "dest3 = Point(0, 2.5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To find the closest destination point from the origin, we need to create a MultiPoint object from the destination points." 
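, "\n", "
Before building the MultiPoint, it may help to see what the search amounts to for plain points. The following is a minimal sketch in plain Python, not Shapely's implementation: it assumes ordinary Euclidean distance on the coordinates above, and the names `origin`, `candidates`, `distances` and `closest` are introduced here only for illustration.

```python
import math

# Same coordinates as the Shapely points defined above, as plain (x, y) tuples
origin = (1, 1.67)
candidates = [(0, 1.45), (2, 2), (0, 2.5)]

# Euclidean distance from the origin to every candidate
distances = [math.hypot(c[0] - origin[0], c[1] - origin[1]) for c in candidates]

# The nearest candidate is the one with the smallest distance
closest = candidates[distances.index(min(distances))]

print(closest)  # (0, 1.45)
```

For point geometries, this brute-force search picks the destination that a nearest-point query is expected to return. The MultiPoint created in the next cell lets Shapely run the search in a single call.
"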
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MULTIPOINT (0 1.45, 2 2, 0 2.5)\n" ] } ], "source": [ "destinations = MultiPoint([dest1, dest2, dest3])\n", "print(destinations)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "" ], "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "destinations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Okay, now we can see that all the destination points are represented as a single MultiPoint object.\n", "\n", "- Now we can find the nearest destination point by using the ``nearest_points()`` function:\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "nearest_geoms = nearest_points(orig, destinations)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- We can check the data type of this object and confirm that the ``nearest_points()`` function returns a tuple of nearest points:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tuple" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(nearest_geoms)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Let's check the contents of this tuple:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(, )\n" ] } ], "source": [ "print(nearest_geoms)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "POINT (1 1.67)\n" ] } ], "source": [ "print(nearest_geoms[0])" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "POINT (0 1.45)\n" ] } ], "source": 
[ "print(nearest_geoms[1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the tuple, the first item (at index 0) is the geometry of our origin point and the second item (at index 1) is the actual nearest geometry from the destination points. Hence, the closest destination point is the one located at coordinates (0, 1.45).\n", "\n", "This is the basic logic for finding the nearest point from a set of points." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Nearest points using GeoPandas\n", "\n", "Next, let's see how to find the nearest points from a set of origin points to a set of destination points using GeoDataFrames. Here, we will use the ``PKS_suuralue.kml`` district data and the ``addresses.shp`` address points from previous sections. \n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Our goal in this tutorial is to find out the closest address to the centroid of each district.**\n", "\n", "- Let's first read in the data and check their structure:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# Import geopandas\n", "import geopandas as gpd" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# Define filepaths\n", "fp1 = \"data/PKS_suuralue.kml\"\n", "fp2 = \"data/addresses.shp\"" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# Enable KML driver\n", "gpd.io.file.fiona.drvsupport.supported_drivers['KML'] = 'rw'" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "# Read in data with geopandas\n", "df1 = gpd.read_file(fp1, driver='KML')\n", "df2 = gpd.read_file(fp2)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Name Description \\\n", "0 Suur-Espoonlahti \n", "1 Suur-Kauklahti \n", "2 Vanha-Espoo \n", "3 Pohjois-Espoo \n", "4 Suur-Matinkylä \n", "\n", " geometry \n", "0 POLYGON Z ((24.775059677807 60.1090604462157 0... \n", "1 POLYGON Z ((24.6157775254076 60.1725681273527 ... \n", "2 POLYGON Z ((24.6757633262026 60.2120070032819 ... \n", "3 POLYGON Z ((24.767921197401 60.2691954732391 0... \n", "4 POLYGON Z ((24.7536131356802 60.1663051341717 ... " ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# District polygons:\n", "df1.head()" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " address id \\\n", "0 Ruoholahti, 14, Itämerenkatu, Ruoholahti, Läns... 1000 \n", "1 Kamppi, 1, Kampinkuja, Kamppi, Eteläinen suurp... 1001 \n", "2 Bangkok9, 8, Kaivokatu, Keskusta, Kluuvi, Etel... 1002 \n", "3 Hermannin rantatie, Kyläsaari, Hermanni, Helsi... 1003 \n", "4 Hesburger, 9, Tyynenmerenkatu, Jätkäsaari, Län... 1005 \n", "\n", " addr \\\n", "0 Itämerenkatu 14, 00101 Helsinki, Finland \n", "1 Kampinkuja 1, 00100 Helsinki, Finland \n", "2 Kaivokatu 8, 00101 Helsinki, Finland \n", "3 Hermannin rantatie 1, 00580 Helsinki, Finland \n", "4 Tyynenmerenkatu 9, 00220 Helsinki, Finland \n", "\n", " geometry \n", "0 POINT (24.9155624 60.1632015) \n", "1 POINT (24.9316914 60.1690222) \n", "2 POINT (24.9416849 60.1699637) \n", "3 POINT (24.9719335 60.1969965) \n", "4 POINT (24.9216003 60.1566475) " ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Address points:\n", "df2.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Furthermore, let's calculate the centroids for each district area:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Name Description \\\n", "0 Suur-Espoonlahti \n", "1 Suur-Kauklahti \n", "2 Vanha-Espoo \n", "3 Pohjois-Espoo \n", "4 Suur-Matinkylä \n", "\n", " geometry \\\n", "0 POLYGON Z ((24.775059677807 60.1090604462157 0... \n", "1 POLYGON Z ((24.6157775254076 60.1725681273527 ... \n", "2 POLYGON Z ((24.6757633262026 60.2120070032819 ... \n", "3 POLYGON Z ((24.767921197401 60.2691954732391 0... \n", "4 POLYGON Z ((24.7536131356802 60.1663051341717 ... \n", "\n", " centroid \n", "0 POINT (24.76754037242762 60.0440879200116) \n", "1 POINT (24.57415010885406 60.19764302289445) \n", "2 POINT (24.60400724339237 60.25253297356344) \n", "3 POINT (24.68682879841453 60.30649462398335) \n", "4 POINT (24.76063843560942 60.15018263640097) " ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1['centroid'] = df1.centroid\n", "df1.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So, for each row in the districts table, we want to find the nearest address point and fetch some attributes related to that point. In other words, we want to apply the Shapely `nearest_points` function so that each polygon centroid is compared to all address points, and then use the result to access the correct attribute information from the address table. 
\n", "\n", "To do this with GeoDataFrames, we can create a function that we apply to the polygon GeoDataFrame:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "def get_nearest_values(row, other_gdf, point_column=\"geometry\", value_column=\"geometry\"):\n", "    \"\"\"Find the nearest point and return the corresponding value from specified value column.\"\"\"\n", "    \n", "    # Create a union of the other GeoDataFrame's geometries:\n", "    other_points = other_gdf[\"geometry\"].unary_union\n", "    \n", "    # Find the nearest points\n", "    nearest_geoms = nearest_points(row[point_column], other_points)\n", "    \n", "    # Get corresponding values from the other df\n", "    nearest_data = other_gdf.loc[other_gdf[\"geometry\"] == nearest_geoms[1]]\n", "    \n", "    # Take the value of the first match (Series.get_values() was removed in pandas 1.0)\n", "    nearest_value = nearest_data[value_column].values[0]\n", "    \n", "    return nearest_value" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By default, this function returns the geometry of the nearest point for each row. It is also possible to fetch information from other columns by changing the `value_column` parameter." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The function creates a MultiPoint object from the geometry column of `other_gdf` (in our case, the address points) and passes this MultiPoint object to Shapely's `nearest_points` function. \n", "\n", "Here, we use `unary_union` to create a union of all input geometries. 
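Stripped of the GIS libraries, the lookup performed by `get_nearest_values` can be sketched in plain Python. This is only an illustration under stated assumptions: the district names, coordinates and ids below are invented, `nearest_value` is a hypothetical helper, and distances are plain Euclidean distances rather than anything computed by Shapely.

```python
import math

# Hypothetical stand-ins for the two tables (invented values)
centroids = [
    {'name': 'District A', 'xy': (0.0, 0.0)},
    {'name': 'District B', 'xy': (5.0, 5.0)},
]
addresses = [
    {'id': 1000, 'xy': (1.0, 1.0)},
    {'id': 1001, 'xy': (4.0, 6.0)},
    {'id': 1002, 'xy': (9.0, 9.0)},
]


def nearest_value(point, records, value_key):
    '''Return value_key of the record whose xy is closest to point.'''
    def distance(record):
        return math.hypot(record['xy'][0] - point[0], record['xy'][1] - point[1])
    return min(records, key=distance)[value_key]


# Compare every centroid with all addresses and keep the chosen attribute
for row in centroids:
    row['nearest_id'] = nearest_value(row['xy'], addresses, 'id')

print([(row['name'], row['nearest_id']) for row in centroids])
```

Each centroid is compared with all address records, and only the requested attribute of the closest one is kept. The GeoDataFrame function follows the same pattern, with the union of points standing in for the list of records.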
\n", "\n", "- Let's check how unary union works by applying it to the address points GeoDataFrame:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MULTIPOINT (24.8608434 60.2240225, 24.86128 60.2484905, 24.8710287 60.222498, 24.8769977 60.2397435, 24.8842071 60.2305, 24.8941806817097 60.21721545, 24.9155624 60.1632015, 24.9212065 60.1587845, 24.9216003 60.1566475, 24.9252037 60.1648863, 24.9316914 60.1690222, 24.9331155798105 60.1690911, 24.9338755 60.1995271, 24.9416849 60.1699637, 24.9422931 60.1711382, 24.9470863 60.1719054, 24.9480051 60.2217879, 24.9495338 60.1794339, 24.961156 60.1879465, 24.9656577 60.2298169, 24.9719335 60.1969965, 24.9936217 60.2436491, 25.0068082 60.1887169, 25.0130341 60.2513441, 25.0204879 60.243423, 25.026632061488 60.1944775, 25.0291169 60.2636285, 25.0331561080774 60.2777903, 25.0747841 60.2253109, 25.0783462 60.209819, 25.0817387723606 60.23521665, 25.1069663 60.2391463, 25.1109579 60.2216552, 25.1368632 60.2070291)\n" ] } ], "source": [ "unary_union = df2.unary_union\n", "print(unary_union)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Okay, now we are ready to use our function and find the closest address point for each polygon centroid.\n", "\n", "- First, try applying the function without changing the default `value_column`:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 POINT (24.9155624 60.1632015)\n", "1 POINT (24.8608434 60.2240225)\n", "2 POINT (24.86128 60.2484905)\n", "3 POINT (24.86128 60.2484905)\n", "4 POINT (24.8608434 60.2240225)\n", "5 POINT (24.8608434 60.2240225)\n", "6 POINT (24.8608434 60.2240225)\n", "7 POINT (24.8608434 60.2240225)\n", "8 POINT (24.86128 60.2484905)\n", "9 POINT (24.86128 60.2484905)\n", "10 POINT (24.9216003 60.1566475)\n", "11 POINT (25.0068082 60.1887169)\n", "12 POINT (24.961156 60.1879465)\n", "13 POINT (24.8710287 60.222498)\n", "14 
POINT (24.9480051 60.2217879)\n", "15 POINT (25.0204879 60.243423)\n", "16 POINT (24.9656577 60.2298169)\n", "17 POINT (25.0331561080774 60.2777903)\n", "18 POINT (25.0331561080774 60.2777903)\n", "19 POINT (25.1368632 60.2070291)\n", "20 POINT (25.1368632 60.2070291)\n", "21 POINT (25.1069663 60.2391463)\n", "22 POINT (25.0331561080774 60.2777903)\n", "dtype: object" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1.apply(get_nearest_values, other_gdf=df2, point_column=\"centroid\", axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Finally, we can specify that we want the `id` column for each point, and store the output in a new column `\"nearest_loc\"`:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "df1[\"nearest_loc\"] = df1.apply(get_nearest_values, other_gdf=df2, point_column=\"centroid\", value_column=\"id\", axis=1)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Name Description \\\n", "0 Suur-Espoonlahti \n", "1 Suur-Kauklahti \n", "2 Vanha-Espoo \n", "3 Pohjois-Espoo \n", "4 Suur-Matinkylä \n", "\n", " geometry \\\n", "0 POLYGON Z ((24.775059677807 60.1090604462157 0... \n", "1 POLYGON Z ((24.6157775254076 60.1725681273527 ... \n", "2 POLYGON Z ((24.6757633262026 60.2120070032819 ... \n", "3 POLYGON Z ((24.767921197401 60.2691954732391 0... \n", "4 POLYGON Z ((24.7536131356802 60.1663051341717 ... \n", "\n", " centroid nearest_loc \n", "0 POINT (24.76754037242762 60.0440879200116) 1000 \n", "1 POINT (24.57415010885406 60.19764302289445) 1020 \n", "2 POINT (24.60400724339237 60.25253297356344) 1017 \n", "3 POINT (24.68682879841453 60.30649462398335) 1017 \n", "4 POINT (24.76063843560942 60.15018263640097) 1020 " ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's it! We have now found the closest address point for each centroid and stored its ``id`` value in the ``df1`` GeoDataFrame.\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python [default]", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.6" } }, "nbformat": 4, "nbformat_minor": 4 }