{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Nearest Neighbour Analysis\n", "\n", "\n", "A common GIS task is finding the nearest neighbour. For instance, you might have a single Point object\n", "representing your home location, and another set of locations representing e.g. public transport stops. A typical question is then *\"which of the stops is closest to my home?\"*\n", "This is a typical nearest neighbour analysis, where the aim is to find the closest geometry to another geometry.\n", "\n", "In Python, this kind of analysis can be done with the Shapely function ``nearest_points()``, which [returns a tuple of the nearest points in the input geometries](https://shapely.readthedocs.io/en/latest/manual.html#shapely.ops.nearest_points).\n", "\n", "## Nearest point using Shapely\n", "\n", "\n", "Let's start by testing how we can find the nearest Point using the ``nearest_points()`` function of Shapely.\n", "\n", "Let's create an origin Point and a few destination Points and find out the closest destination.\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "from shapely.geometry import Point, MultiPoint\n", "from shapely.ops import nearest_points\n", "\n", "orig = Point(1, 1.67)\n", "dest1, dest2, dest3 = Point(0, 1.45), Point(2, 2), Point(0, 2.5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To find the destination point closest to the origin, we first need to create a MultiPoint object from the destination points." 
] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MULTIPOINT (0 1.45, 2 2, 0 2.5)\n" ] } ], "source": [ "destinations = MultiPoint([dest1, dest2, dest3])\n", "print(destinations)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "OK, now we can see that all the destination points are represented as a single MultiPoint object.\n", "\n", "- Now we can find the nearest destination point using the ``nearest_points()`` function.\n" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(, )\n", "POINT (1 1.67)\n", "POINT (0 1.45)\n" ] } ], "source": [ "nearest_geoms = nearest_points(orig, destinations)\n", "near_idx0 = nearest_geoms[0]\n", "near_idx1 = nearest_geoms[1]\n", "print(nearest_geoms)\n", "print(near_idx0)\n", "print(near_idx1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, the ``nearest_points()`` function returns a tuple of geometries where the first item is the geometry\n", "of our origin point and the second item (at index 1) is the actual nearest geometry from the destination points.\n", "Hence, the closest destination point is the one located at coordinates (0, 1.45).\n", "\n", "This is the basic logic for finding the nearest point from a set of points." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Nearest points using Geopandas\n", "\n", "The previous example is not very useful on its own. Next, let's see how to find the nearest points\n", "from a set of origin points to a set of destination points using GeoDataFrames. Here, we will use the ``PKS_suuralue.kml`` district data and the ``addresses.shp`` address points from previous sections. 
\n", "- First we need to create a function that builds on ``nearest_points()`` but is tailored to work with two GeoDataFrames.\n" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "def nearest(row, geom_union, df1, df2, geom1_col='geometry', geom2_col='geometry', src_column=None):\n", "    \"\"\"Find the nearest point and return the corresponding value from specified column.\"\"\"\n", "\n", "    # Boolean mask selecting the row(s) of df2 whose geometry is the nearest point\n", "    nearest_mask = df2[geom2_col] == nearest_points(row[geom1_col], geom_union)[1]\n", "\n", "    # Get the corresponding value from df2 (matching is based on the geometry)\n", "    value = df2[nearest_mask][src_column].values[0]\n", "\n", "    return value" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we read the address data and the Helsinki districts data, and find the closest address to the centroid of each district." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "# Import geopandas\n", "import geopandas as gpd" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "# Define filepaths\n", "fp1 = \"data/PKS_suuralue.kml\"\n", "fp2 = \"data/addresses.shp\"" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "# Enable KML driver\n", "gpd.io.file.fiona.drvsupport.supported_drivers['KML'] = 'rw'" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "# Read in data with geopandas\n", "df1 = gpd.read_file(fp1, driver='KML')\n", "df2 = gpd.read_file(fp2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create a unary union from the address points, which basically creates a MultiPoint object from the Point geometries." 
] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MULTIPOINT (24.8610322 60.224006, 24.8667825 60.2517419, 24.8710287 60.222498, 24.8769977 60.2397435, 24.8838413 60.230578, 24.8941806817097 60.21721545, 24.9155624 60.1632015, 24.9212065 60.1587845, 24.9216003 60.1566475, 24.9252584 60.1648863, 24.9316914 60.1690222, 24.9331155798105 60.1690911, 24.9339225 60.1995792, 24.9416849 60.1699637, 24.9440942536239 60.17130125, 24.9473289 60.1718719, 24.9480051 60.2217879, 24.9495338 60.1794339, 24.9607487 60.1882163, 24.9655307 60.2294746, 24.9655355 60.2008878, 24.9936217 60.2436491, 25.0068082 60.1887169, 25.0130341 60.2513441, 25.0204879 60.243423, 25.026632061488 60.1944775, 25.0291169 60.2636285, 25.0331561080774 60.2777903, 25.0747841 60.2253109, 25.0783462 60.209819, 25.0916737 60.237548, 25.1098071 60.2380653, 25.1108711 60.2217791, 25.1368583 60.2070309)\n" ] } ], "source": [ "unary_union = df2.unary_union\n", "print(unary_union)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Calculate the centroids for each district area." ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr><th></th><th>Name</th><th>Description</th><th>geometry</th><th>centroid</th></tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr><th>0</th><td>Suur-Espoonlahti</td><td></td><td>POLYGON Z ((24.775059677807 60.1090604462157 0...</td><td>POINT (24.76754037242762 60.0440879200116)</td></tr>\n", "    <tr><th>1</th><td>Suur-Kauklahti</td><td></td><td>POLYGON Z ((24.6157775254076 60.1725681273527 ...</td><td>POINT (24.57415010885406 60.19764302289445)</td></tr>\n", "    <tr><th>2</th><td>Vanha-Espoo</td><td></td><td>POLYGON Z ((24.6757633262026 60.2120070032819 ...</td><td>POINT (24.60400724339237 60.25253297356344)</td></tr>\n", "    <tr><th>3</th><td>Pohjois-Espoo</td><td></td><td>POLYGON Z ((24.767921197401 60.2691954732391 0...</td><td>POINT (24.68682879841453 60.30649462398335)</td></tr>\n", "    <tr><th>4</th><td>Suur-Matinkylä</td><td></td><td>POLYGON Z ((24.7536131356802 60.1663051341717 ...</td><td>POINT (24.76063843560942 60.15018263640097)</td></tr>\n", "  </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " Name Description \\\n", "0 Suur-Espoonlahti \n", "1 Suur-Kauklahti \n", "2 Vanha-Espoo \n", "3 Pohjois-Espoo \n", "4 Suur-Matinkylä \n", "\n", " geometry \\\n", "0 POLYGON Z ((24.775059677807 60.1090604462157 0... \n", "1 POLYGON Z ((24.6157775254076 60.1725681273527 ... \n", "2 POLYGON Z ((24.6757633262026 60.2120070032819 ... \n", "3 POLYGON Z ((24.767921197401 60.2691954732391 0... \n", "4 POLYGON Z ((24.7536131356802 60.1663051341717 ... \n", "\n", " centroid \n", "0 POINT (24.76754037242762 60.0440879200116) \n", "1 POINT (24.57415010885406 60.19764302289445) \n", "2 POINT (24.60400724339237 60.25253297356344) \n", "3 POINT (24.68682879841453 60.30649462398335) \n", "4 POINT (24.76063843560942 60.15018263640097) " ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1['centroid'] = df1.centroid\n", "df1.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we are ready to use our function to find, for each centroid in df1, the closest Point in df2 (taking the value from the ``id`` column).\n", "Let's store the id of the nearest address in a new column `\"nearest_id\"` in df1:" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr><th></th><th>Name</th><th>Description</th><th>geometry</th><th>centroid</th><th>nearest_id</th></tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr><th>0</th><td>Suur-Espoonlahti</td><td></td><td>POLYGON Z ((24.775059677807 60.1090604462157 0...</td><td>POINT (24.76754037242762 60.0440879200116)</td><td>1000</td></tr>\n", "    <tr><th>1</th><td>Suur-Kauklahti</td><td></td><td>POLYGON Z ((24.6157775254076 60.1725681273527 ...</td><td>POINT (24.57415010885406 60.19764302289445)</td><td>1020</td></tr>\n", "    <tr><th>2</th><td>Vanha-Espoo</td><td></td><td>POLYGON Z ((24.6757633262026 60.2120070032819 ...</td><td>POINT (24.60400724339237 60.25253297356344)</td><td>1020</td></tr>\n", "    <tr><th>3</th><td>Pohjois-Espoo</td><td></td><td>POLYGON Z ((24.767921197401 60.2691954732391 0...</td><td>POINT (24.68682879841453 60.30649462398335)</td><td>1017</td></tr>\n", "    <tr><th>4</th><td>Suur-Matinkylä</td><td></td><td>POLYGON Z ((24.7536131356802 60.1663051341717 ...</td><td>POINT (24.76063843560942 60.15018263640097)</td><td>1020</td></tr>\n", "    <tr><th>5</th><td>Kauniainen</td><td></td><td>POLYGON Z ((24.6907528033566 60.2195779731868 ...</td><td>POINT (24.71357964516679 60.21457067576294)</td><td>1020</td></tr>\n", "    <tr><th>6</th><td>Suur-Leppävaara</td><td></td><td>POLYGON Z ((24.797472695835 60.2082651196077 0...</td><td>POINT (24.77910492134015 60.22913609608545)</td><td>1020</td></tr>\n", "    <tr><th>7</th><td>Suur-Tapiola</td><td></td><td>POLYGON Z ((24.8443596422129 60.1659790707387 ...</td><td>POINT (24.79937514852226 60.17816655223976)</td><td>1020</td></tr>\n", "    <tr><th>8</th><td>Myyrmäki</td><td></td><td>POLYGON Z ((24.8245867448802 60.2902531157585 ...</td><td>POINT (24.81763652589348 60.27819504217397)</td><td>1017</td></tr>\n", "    <tr><th>9</th><td>Kivistö</td><td></td><td>POLYGON Z ((24.9430919106369 60.3384471629062 ...</td><td>POINT (24.84180592296876 60.34358057021768)</td><td>1017</td></tr>\n", "    <tr><th>10</th><td>Eteläinen</td><td></td><td>POLYGON Z ((24.7827651307035 60.09997268858 0,...</td><td>POINT (24.90837930087519 60.10976339578206)</td><td>1005</td></tr>\n", "    <tr><th>11</th><td>Kaakkoinen</td><td></td><td>POLYGON Z ((24.8480782099727 60.0275589731893 ...</td><td>POINT (25.05325169482274 60.05155324331345)</td><td>1029</td></tr>\n", "    <tr><th>12</th><td>Keskinen</td><td></td><td>POLYGON Z ((24.9085548098731 60.2082029641503 ...</td><td>POINT (24.95489633637751 60.20067297308771)</td><td>1003</td></tr>\n", "    <tr><th>13</th><td>Läntinen</td><td></td><td>POLYGON Z ((24.832174555671 60.2516121985945 0...</td><td>POINT (24.87614770011878 60.21754287289237)</td><td>1010</td></tr>\n", "    <tr><th>14</th><td>Pohjoinen</td><td></td><td>POLYGON Z ((24.8992644865152 60.2689368800439 ...</td><td>POINT (24.94156264995636 60.24654213027523)</td><td>1014</td></tr>\n", "    <tr><th>15</th><td>Koillinen</td><td></td><td>POLYGON Z ((24.9722813313308 60.2432476462193 ...</td><td>POINT (25.02148999795968 60.25026309396886)</td><td>1022</td></tr>\n", "    <tr><th>16</th><td>Aviapolis</td><td></td><td>POLYGON Z ((24.9430919106369 60.3384471629062 ...</td><td>POINT (24.93554952483983 60.30204064147746)</td><td>1018</td></tr>\n", "    <tr><th>17</th><td>Tikkurila</td><td></td><td>POLYGON Z ((24.9764047156358 60.2896890295612 ...</td><td>POINT (25.03931014564627 60.29804037805193)</td><td>1008</td></tr>\n", "    <tr><th>18</th><td>Koivukylä</td><td></td><td>POLYGON Z ((24.9942315864552 60.3329637072809 ...</td><td>POINT (25.05766837333244 60.32582581576258)</td><td>1008</td></tr>\n", "    <tr><th>19</th><td>Itäinen</td><td></td><td>POLYGON Z ((25.0351655840904 60.23627484214 0,...</td><td>POINT (25.12590828372607 60.20923259104367)</td><td>1024</td></tr>\n", "  </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " Name Description \\\n", "0 Suur-Espoonlahti \n", "1 Suur-Kauklahti \n", "2 Vanha-Espoo \n", "3 Pohjois-Espoo \n", "4 Suur-Matinkylä \n", "5 Kauniainen \n", "6 Suur-Leppävaara \n", "7 Suur-Tapiola \n", "8 Myyrmäki \n", "9 Kivistö \n", "10 Eteläinen \n", "11 Kaakkoinen \n", "12 Keskinen \n", "13 Läntinen \n", "14 Pohjoinen \n", "15 Koillinen \n", "16 Aviapolis \n", "17 Tikkurila \n", "18 Koivukylä \n", "19 Itäinen \n", "\n", " geometry \\\n", "0 POLYGON Z ((24.775059677807 60.1090604462157 0... \n", "1 POLYGON Z ((24.6157775254076 60.1725681273527 ... \n", "2 POLYGON Z ((24.6757633262026 60.2120070032819 ... \n", "3 POLYGON Z ((24.767921197401 60.2691954732391 0... \n", "4 POLYGON Z ((24.7536131356802 60.1663051341717 ... \n", "5 POLYGON Z ((24.6907528033566 60.2195779731868 ... \n", "6 POLYGON Z ((24.797472695835 60.2082651196077 0... \n", "7 POLYGON Z ((24.8443596422129 60.1659790707387 ... \n", "8 POLYGON Z ((24.8245867448802 60.2902531157585 ... \n", "9 POLYGON Z ((24.9430919106369 60.3384471629062 ... \n", "10 POLYGON Z ((24.7827651307035 60.09997268858 0,... \n", "11 POLYGON Z ((24.8480782099727 60.0275589731893 ... \n", "12 POLYGON Z ((24.9085548098731 60.2082029641503 ... \n", "13 POLYGON Z ((24.832174555671 60.2516121985945 0... \n", "14 POLYGON Z ((24.8992644865152 60.2689368800439 ... \n", "15 POLYGON Z ((24.9722813313308 60.2432476462193 ... \n", "16 POLYGON Z ((24.9430919106369 60.3384471629062 ... \n", "17 POLYGON Z ((24.9764047156358 60.2896890295612 ... \n", "18 POLYGON Z ((24.9942315864552 60.3329637072809 ... \n", "19 POLYGON Z ((25.0351655840904 60.23627484214 0,... 
\n", "\n", " centroid nearest_id \n", "0 POINT (24.76754037242762 60.0440879200116) 1000 \n", "1 POINT (24.57415010885406 60.19764302289445) 1020 \n", "2 POINT (24.60400724339237 60.25253297356344) 1020 \n", "3 POINT (24.68682879841453 60.30649462398335) 1017 \n", "4 POINT (24.76063843560942 60.15018263640097) 1020 \n", "5 POINT (24.71357964516679 60.21457067576294) 1020 \n", "6 POINT (24.77910492134015 60.22913609608545) 1020 \n", "7 POINT (24.79937514852226 60.17816655223976) 1020 \n", "8 POINT (24.81763652589348 60.27819504217397) 1017 \n", "9 POINT (24.84180592296876 60.34358057021768) 1017 \n", "10 POINT (24.90837930087519 60.10976339578206) 1005 \n", "11 POINT (25.05325169482274 60.05155324331345) 1029 \n", "12 POINT (24.95489633637751 60.20067297308771) 1003 \n", "13 POINT (24.87614770011878 60.21754287289237) 1010 \n", "14 POINT (24.94156264995636 60.24654213027523) 1014 \n", "15 POINT (25.02148999795968 60.25026309396886) 1022 \n", "16 POINT (24.93554952483983 60.30204064147746) 1018 \n", "17 POINT (25.03931014564627 60.29804037805193) 1008 \n", "18 POINT (25.05766837333244 60.32582581576258) 1008 \n", "19 POINT (25.12590828372607 60.20923259104367) 1024 " ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1['nearest_id'] = df1.apply(nearest, geom_union=unary_union, df1=df1, df2=df2, geom1_col='centroid', src_column='id', axis=1)\n", "df1.head(20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's it! 
We have now found the closest point for each centroid and stored the ``id`` value from our addresses in the ``df1`` GeoDataFrame.\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }