Point-in-polygon queries#

Finding out if a certain point is located inside or outside of an area, or finding out if a line intersects with another line or polygon are fundamental geospatial operations that are often used e.g. to select data based on location. Such spatial queries are one of the typical first steps of the workflow when doing spatial analysis. Performing a spatial join (will be introduced later) between two spatial datasets is one of the most typical applications where Point in Polygon (PIP) query is used.

For further reading about PIP and other geometric operations, see Chapter 4.2 in Smith, Goodchild & Longley: Geospatial Analysis - 6th edition.

How to check if point is inside a polygon?#

Computationally, detecting if a point is inside a polygon is most commonly done using a specific formula called Ray Casting algorithm. Luckily, we do not need to create such a function ourselves for conducting the Point in Polygon (PIP) query. Instead, we can take advantage of Shapely’s binary predicates that can evaluate the topolocical relationships between geographical objects, such as the PIP as we’re interested here.

Good to know: Ray-Casting Algorithm

The ray-casting algorithm is a common method used to determine whether a given point lies inside or outside of a polygon. This algorithm is widely used in spatial analysis and GIS.

How It Works

  • Cast a Ray: A ray (straight line) is extended from the point in question to infinity in one direction.

  • Count Intersections: Count how many times this ray intersects with the edges of the polygon.

    • If the ray crosses the edges an odd number of times, the point is considered to be inside the polygon.

    • If it crosses an edge an even number of times, the point is outside the polygon.

Example of ray casting of a polygon

Example of ray casting of a polygon. Image fromVacek et al., 2017. #### Why This Works This method leverages the idea that a ray entering and exiting the polygon will cross an edge each time it enters or exits. An odd count of intersections means the point is “trapped” inside; an even count means it is outside.

Use Cases and Limitations

  • This method works well for both convex and concave polygons.

  • Special cases like points on the polygon edge may need additional handling.

Test your understanding

Can you write a Python code to implement this algrothm?

Point-in-polygon queries on shapely geometries#

There are basically two ways of conducting PIP in Shapely:

  1. using a function called within() that checks if a point is within a polygon

  2. using a function called contains() that checks if a polygon contains a point

Note

Even though we are discussing a Point in Polygon operation, it is also possible to check if a LineString or Polygon is inside another Polygon.

Let’s first create a couple of point geometries:

[1]:
import shapely.geometry
point1 = shapely.geometry.Point(24.952242, 60.1696017)
point2 = shapely.geometry.Point(24.976567, 60.1612500)

… and a polygon:

[2]:
polygon = shapely.geometry.Polygon(
    [
        (24.950899, 60.169158),
        (24.953492, 60.169158),
        (24.953510, 60.170104),
        (24.950958, 60.169990)
    ]
)
[3]:
print(point1)
print(point2)
print(polygon)
POINT (24.952242 60.1696017)
POINT (24.976567 60.16125)
POLYGON ((24.950899 60.169158, 24.953492 60.169158, 24.95351 60.170104, 24.950958 60.16999, 24.950899 60.169158))

Let’s check if the points are within() the polygon:

[4]:
point1.within(polygon)
[4]:
True
[5]:
point2.within(polygon)
[5]:
False

It seems that the first point is inside the polygon, but the second one is not.

We can turn the logic of the look-up around: Rather than check of the point is within the polygon, we can also ask whether the polygon contains() the point:

[6]:
polygon.contains(point1)
[6]:
True
[7]:
polygon.contains(point2)
[7]:
False

Hint: The two ways of checking the spatial relationship are complementary and yield equivalent results; contains() is inverse to within(), and vice versa.

Then, which one should you use? Well, it depends:

  • If you have many points and just one polygon and want to find out which points are inside the polygon, you might need to iterate over the points and check one at a time if it is within() the polygon.

  • If you have many polygons and just one point and want to find out which polygon contains the point, you might need to iterate over the polygons until you find one that contains() the point.

Point-in-polygon queries on geopandas.GeoDataFrames#

In the following practical example we find which of the addresses we obtained in the geocoding section are located within a certain city district of Helsinki.

The data set we are using is from Helsinki Region Infoshare, and licensed under a Creative-Commons-Attribution-4.0 license.

[8]:
import pathlib
NOTEBOOK_PATH = pathlib.Path().resolve()
DATA_DIRECTORY = NOTEBOOK_PATH / "data"
[9]:
import geopandas

city_districts = geopandas.read_file(
    DATA_DIRECTORY / "helsinki_city_districts" / "helsinki_city_districts_2021.gpkg"
)
city_districts.head()
[9]:
name geometry
0 Eteläinen POLYGON ((24.7828 60.09996, 24.80437 60.07607,...
1 Läntinen POLYGON ((24.8314 60.25406, 24.83168 60.25321,...
2 Keskinen POLYGON ((24.93345 60.18317, 24.93502 60.18005...
3 Pohjoinen POLYGON ((24.90081 60.23526, 24.89944 60.235, ...
4 Koillinen POLYGON ((24.97163 60.24253, 24.97163 60.24246...
[10]:
city_districts.plot()
[10]:
<Axes: >
../../_images/lessons_lesson-3_point-in-polygon-queries_15_1.png

Specifically, we want to find out which points are within the ‘Eteläinen’ (‘southern’) city district. Let’s start by obtaining a separate data set for this district, loading the addresses data, and plotting a multi-layer map that shows all districts, the ‘Eteläinen’ district, and all the points in one map:

[11]:
southern_district = city_districts[city_districts.name == "Eteläinen"]
southern_district
[11]:
name geometry
0 Eteläinen POLYGON ((24.7828 60.09996, 24.80437 60.07607,...
[12]:
addresses = geopandas.read_file(DATA_DIRECTORY / "addresses.gpkg")

Hint: Plotting multiple map layers

To plot several map layers in one figure, use the ax parameter to specify in which axes data should be plotted. This was used in lesson 7 of Geo-Python to add text to a plot or modify axes properties.

The easiest way to obtain an axes is to save the return value of the first plot() (see below). Another option is to create subplots(), possibly with only one row and one column.

[13]:
axes = city_districts.plot(facecolor="grey")
southern_district.plot(ax=axes, facecolor="red")
addresses.plot(ax=axes, color="blue", markersize=5)
[13]:
<Axes: >
../../_images/lessons_lesson-3_point-in-polygon-queries_20_1.png

Some points are within the ‘Eteläinen’ district, but others are not. To find out which are the ones inside the district, we can use a point-in-polygon query, this time on the entire geopandas.GeoDataFrame. Its method within() returns Boolean (True/False) values that indicate whether or not a row’s geometry is contained in the supplied other geometry:

Geometry vs. geometry column

In the example below, we use southern.at[0, "geometry"] to obtain a single value, a shapely.geometry.Polygon, instead of an entire column (a GeoSeries). This ensures that each row’s geometry in the addresses data frame is matched against the same polygon. In contrast, if we ran within() against a column, the operation would be carried out row-wise: the first address point would be checked against the first polygon, the second address point against the second polygon, and so forth.

Check the documentation for ``within()` <https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoSeries.within.html>`__ to learn more!

[14]:
addresses.within(southern_district.at[0, "geometry"])
[14]:
0      True
1      True
2      True
3     False
4      True
5     False
6     False
7     False
8     False
9     False
10     True
11    False
12    False
13    False
14    False
15    False
16    False
17    False
18    False
19    False
20    False
21    False
22    False
23    False
24    False
25    False
26    False
27    False
28    False
29    False
30     True
31     True
32     True
33     True
dtype: bool

This list of Boolean values, also called a mask array can be used to filter the input data frame:

[15]:
addresses_in_the_southern_district = addresses[
    addresses.within(southern_district.at[0, "geometry"])
]
addresses_in_the_southern_district
[15]:
address geometry
0 Ruoholahti, 14, Itämerenkatu, Ruoholahti, Läns... POINT (24.91556 60.1632)
1 1, Kampinkuja, Kamppi, Eteläinen suurpiiri, He... POINT (24.93009 60.16846)
2 Espresso House, 8, Kaivokatu, Keskusta, Kluuvi... POINT (24.94153 60.17016)
4 9, Tyynenmerenkatu, Jätkäsaari, Länsisatama, E... POINT (24.92151 60.15662)
10 Rautatientori, Keskusta, Kluuvi, Eteläinen suu... POINT (24.94263 60.1709)
30 Kampin keskus, 1, Urho Kekkosen katu, Kamppi, ... POINT (24.93307 60.16908)
31 Ruoholahdenkatu, Kamppi, Eteläinen suurpiiri, ... POINT (24.93031 60.16642)
32 3, Tyynenmerenkatu, Jätkäsaari, Länsisatama, E... POINT (24.92121 60.15878)
33 Old Tea Shop, 4, Vilhonkatu, Kaisaniemi, Kluuv... POINT (24.94683 60.172)

Finally, let’s plot this list of addresses one more time to visually verify that all of them, indeed, are located within the ‘Eteläinen’ city district:

[16]:
axes = city_districts.plot(facecolor="grey")
southern_district.plot(ax=axes, facecolor="red")

addresses_in_the_southern_district.plot(
    ax=axes,
    color="gold",
    markersize=5
)
[16]:
<Axes: >
../../_images/lessons_lesson-3_point-in-polygon-queries_26_1.png

Perfect! Now we are left with only the (golden) points which, indeed, are inside the red polygon. That’s exactly what we wanted!