Geocoding in geopandas

Geopandas supports geocoding via a library called geopy, which needs to be installed to use geopandas’ `geopandas.tools.geocode() function <https://geopandas.org/en/stable/docs/reference/api/geopandas.tools.geocode.html>`__. geocode() expects a list or pandas.Series of addresses (strings) and returns a GeoDataFrame with resolved addresses and point geometries.

Let’s try this out.

We will geocode addresses stored in a semicolon-separated text file called addresses.txt. These addresses are located in the Helsinki Region in Southern Finland.

[1]:
import pathlib
NOTEBOOK_PATH = pathlib.Path().resolve()
DATA_DIRECTORY = NOTEBOOK_PATH / "data"
[2]:
import pandas
addresses = pandas.read_csv(
    DATA_DIRECTORY / "helsinki_addresses" / "addresses.txt",
    sep=";"
)

addresses.head()
[2]:
     id  addr
0  1000  Itämerenkatu 14, 00101 Helsinki, Finland
1  1001  Kampinkuja 1, 00100 Helsinki, Finland
2  1002  Kaivokatu 8, 00101 Helsinki, Finland
3  1003  Hermannin rantatie 1, 00580 Helsinki, Finland
4  1005  Tyynenmerenkatu 9, 00220 Helsinki, Finland

We have an id for each row and an address in the addr column.
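The addresses themselves contain commas, which is why the file uses semicolons to separate its fields. Below is a minimal, self-contained sketch of the same read_csv() call. It assumes that addresses.txt starts with a header line `id;addr`, as the column names above suggest. The inline text stands in for the real file, and its two data rows are copied from the output above.

```python
import io

import pandas

# Stand-in for addresses.txt (assumption: header line "id;addr", one record per line)
sample_file = io.StringIO(
    "id;addr\n"
    "1000;Itämerenkatu 14, 00101 Helsinki, Finland\n"
    "1001;Kampinkuja 1, 00100 Helsinki, Finland\n"
)

# sep=";" splits each line only at the semicolon, so the commas stay inside addr
sample_addresses = pandas.read_csv(sample_file, sep=";")
print(sample_addresses)
```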

Geocode addresses using Nominatim

In our example, we will use Nominatim as the geocoding provider. Nominatim is both a piece of geocoding software and a public geocoding service based on OpenStreetMap data; the public service is run by the OpenStreetMap Foundation. Geopandas’ `geocode() function <https://geopandas.org/en/stable/docs/reference/api/geopandas.tools.geocode.html>`__ supports it natively.

Fair-use

Nominatim’s terms of use require users of the public service to send no more than one request per second and to attach a custom user-agent string to each query that identifies their application.

Geopandas’ geocode() lets us specify a user_agent, and the library also takes care of respecting Nominatim’s rate limit by pausing between consecutive requests.
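Respecting a rate limit comes down to making sure that consecutive requests start at least a minimum interval apart. The following is a generic sketch of that idea in plain Python. It is not geopandas’ or geopy’s actual code; throttled_map() and fake_lookup() are hypothetical names used only for illustration.

```python
import time
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def throttled_map(
    func: Callable[[T], R], items: Iterable[T], min_interval: float = 1.0
) -> List[R]:
    """Call func on each item, starting consecutive calls at least min_interval seconds apart."""
    results: List[R] = []
    last_start: Optional[float] = None
    for item in items:
        if last_start is not None:
            # Sleep for whatever remains of the minimum interval since the previous call
            remaining = min_interval - (time.monotonic() - last_start)
            if remaining > 0:
                time.sleep(remaining)
        last_start = time.monotonic()
        results.append(func(item))
    return results


# Hypothetical stand-in for a geocoding request; a real one would query a web service
def fake_lookup(address: str) -> str:
    return address.upper()


print(throttled_map(fake_lookup, ["Kampinkuja 1", "Kaivokatu 8"], min_interval=1.0))
```

With min_interval=1.0, this corresponds to the one-request-per-second limit in Nominatim’s terms of use.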

Looking up an address is a fairly expensive database operation, which is why the free public Nominatim server sometimes takes a while to respond. In this example, we pass timeout=10 so that each request waits up to 10 seconds for a response (geopy’s default timeout is only one second).

[3]:
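import geopandas

# Geocode each address string with the public Nominatim service:
# user_agent identifies our application (required by Nominatim's terms of use),
# and timeout sets how many seconds to wait for each response
geocoded_addresses = geopandas.tools.geocode(
    addresses["addr"],
    provider="nominatim",
    user_agent="autogis2023",
    timeout=10
)
geocoded_addresses.head()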
[3]:
   geometry                   address
0  POINT (24.91556 60.1632)   Ruoholahti, 14, Itämerenkatu, Ruoholahti, Läns...
1  POINT (24.93009 60.16846)  1, Kampinkuja, Kamppi, Eteläinen suurpiiri, He...
2  POINT (24.94153 60.17016)  Espresso House, 8, Kaivokatu, Keskusta, Kluuvi...
3  POINT (24.97675 60.19438)  Hermannin rantatie, Hermanninranta, Hermanni, ...
4  POINT (24.92151 60.15662)  9, Tyynenmerenkatu, Jätkäsaari, Länsisatama, E...

Et voilà! As a result, we received a GeoDataFrame that contains a parsed version of our original addresses and a geometry column of shapely Point objects. We can use these, for instance, to export the data to a geospatial file format.
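Each geometry is a shapely Point in longitude/latitude order, so the coordinates can also be extracted as plain numeric columns, for example for a quick table or a CSV export. The minimal sketch below uses two of the coordinate pairs from the output above and assumes the WGS84 coordinate reference system (EPSG:4326) for them.

```python
import geopandas

# Two points copied from the geocoding output above (x = longitude, y = latitude)
points = geopandas.GeoDataFrame(
    {"addr": ["Itämerenkatu 14, 00101 Helsinki, Finland",
              "Kampinkuja 1, 00100 Helsinki, Finland"]},
    geometry=geopandas.points_from_xy([24.91556, 24.93009], [60.1632, 60.16846]),
    crs="EPSG:4326",  # assumption: WGS84 longitude/latitude
)

# .x and .y return the coordinates of point geometries as pandas Series
points["lon"] = points.geometry.x
points["lat"] = points.geometry.y
print(points[["addr", "lon", "lat"]])
```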

However, the id column was discarded in the process. To combine the input data set with our result set, we can use pandas’ join operations.

Join data frames

Note: Joining data sets using pandas

For a comprehensive overview of different ways of combining DataFrames and Series based on set theory, see the pandas documentation on merge, join, and concatenate.

Joining data from two or more data frames or tables is a common task in many (spatial) data analysis workflows. As you might remember from our earlier lessons, combining data from different tables based on a common key attribute can easily be done in pandas/geopandas using the `merge() function <https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html>`__. We used this approach in exercise 6 of the Geo-Python course.
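As a reminder of how merge() combines tables on a key column, here is a small pandas sketch. The visits column and the id 9999 are hypothetical. They are there only to show what happens to keys that appear in just one of the tables. The how= parameter selects the set operation, and indicator=True adds a _merge column that reports where each row came from.

```python
import pandas

addresses_sample = pandas.DataFrame({
    "id": [1000, 1001, 1002],
    "addr": ["Itämerenkatu 14", "Kampinkuja 1", "Kaivokatu 8"],
})
# Hypothetical attribute table that shares the "id" key column
visits = pandas.DataFrame({"id": [1000, 1002, 9999], "visits": [5, 3, 7]})

# Inner join (the default): keep only ids present in both tables
inner = addresses_sample.merge(visits, on="id")

# Outer join: keep all ids; _merge is "both", "left_only" or "right_only"
outer = addresses_sample.merge(visits, on="id", how="outer", indicator=True)
print(inner)
print(outer)
```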

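However, sometimes it is useful to join two data frames based on their index instead. join() matches rows by index label, not by position, and by default it keeps all rows of the data frame it is called on (a left join). For the records to line up correctly, both data frames must use the same index, i.e., the same label must refer to the same record in both of them. Rows whose label is missing from the other data frame receive missing values.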
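The label-based matching can be seen in a small pandas sketch with hypothetical frames. Both frames use the same index labels, but the rows of the second frame are stored in a different order.

```python
import pandas

geocoded = pandas.DataFrame(
    {"address": ["Itämerenkatu 14", "Kampinkuja 1", "Kaivokatu 8"]},
    index=[0, 1, 2],
)
# Same index labels as `geocoded`, but rows stored in a different order
ids = pandas.DataFrame({"id": [1002, 1000, 1001]}, index=[2, 0, 1])

# Rows are paired by index label, so the row order of `ids` does not matter
joined = geocoded.join(ids)

# A label missing from `ids` is not dropped: join() is a left join by default,
# so that row receives a missing value (NaN) in the id column
partial = geocoded.join(ids.drop(index=1))
print(joined)
print(partial)
```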

We can use this approach here to join the information from the original data frame addresses to the geocoded addresses geocoded_addresses, row by row. By default, the join() function joins two data frames based on their index. This works correctly in our example, because the geocoded result carries the same index labels as the input addresses, in the same order.

[4]:
geocoded_addresses_with_id = geocoded_addresses.join(addresses)
geocoded_addresses_with_id
[4]:
    geometry                   address                                            id    addr
 0  POINT (24.91556 60.1632)   Ruoholahti, 14, Itämerenkatu, Ruoholahti, Läns...  1000  Itämerenkatu 14, 00101 Helsinki, Finland
 1  POINT (24.93009 60.16846)  1, Kampinkuja, Kamppi, Eteläinen suurpiiri, He...  1001  Kampinkuja 1, 00100 Helsinki, Finland
 2  POINT (24.94153 60.17016)  Espresso House, 8, Kaivokatu, Keskusta, Kluuvi...  1002  Kaivokatu 8, 00101 Helsinki, Finland
 3  POINT (24.97675 60.19438)  Hermannin rantatie, Hermanninranta, Hermanni, ...  1003  Hermannin rantatie 1, 00580 Helsinki, Finland
 4  POINT (24.92151 60.15662)  9, Tyynenmerenkatu, Jätkäsaari, Länsisatama, E...  1005  Tyynenmerenkatu 9, 00220 Helsinki, Finland
 5  POINT (25.08174 60.23522)  18, Kontulantie, Kontula, Mellunkylä, Itäinen ...  1006  Kontulantie 18, 00940 Helsinki, Finland
 6  POINT (25.10278 60.21788)  Itäväylä, Vartioharju, Vartiokylä, Itäinen suu...  1007  Itäväylä 3, 00950 Helsinki, Finland
 7  POINT (25.03509 60.27577)  Tapulikaupungintie, Tapulikaupunki, Suutarila,...  1008  Tapulikaupungintie 3, 00750 Helsinki, Finland
 8  POINT (25.02883 60.26367)  Sompionpolku, Fallkullan kiila, Tapanila, Tapa...  1009  Sompionpolku 2, 00730 Helsinki, Finland
 9  POINT (24.87197 60.22244)  5, Atomitie, Strömberg, Pitäjänmäen teollisuus...  1010  Atomitie 5, 00370 Helsinki, Finland
10  POINT (24.94263 60.1709)   Rautatientori, Keskusta, Kluuvi, Eteläinen suu...  1011  Rautatientori 1, 00100 Helsinki, Finland
11  POINT (24.88282 60.2308)   Kuparitie, Lassila, Haaga, Läntinen suurpiiri,...  1012  Kuparitie 8, 00440 Helsinki, Finland
12  POINT (24.877 60.23975)    Rumpupolku, Kannelmäki, Kaarela, Läntinen suur...  1013  Rumpupolku 8, 00420 Helsinki, Finland
13  POINT (24.94801 60.22179)  K-Supermarket, 1, Mäkitorpantie, Patola, Oulun...  1014  Mäkitorpantie 1, 00620 Helsinki, Finland
14  POINT (25.01295 60.25107)  Yliopiston Apteekki, 15, Malminkaari, Ala-Malm...  1015  Malminkaari 15, 00700 Helsinki, Finland
15  POINT (24.89418 60.21722)  23, Kylätie, Etelä-Haaga, Haaga, Läntinen suur...  1016  Kylätie 23, 00320 Helsinki, Finland
16  POINT (24.86124 60.24848)  Malminkartanontie, Malminkartano, Kaarela, Län...  1017  Malminkartanontie 17, 00410 Helsinki, Finland
17  POINT (24.96564 60.22983)  Oulunkylän tori, Patola, Oulunkylä, Pohjoinen ...  1018  Oulunkylän tori 2b, 00640 Helsinki, Finland
18  POINT (24.93435 60.19857)  6, Ratapihantie, Itä-Pasila, Pasila, Keskinen ...  1019  Ratapihantie 6, 00101 Helsinki, Finland
19  POINT (24.86086 60.22407)  15, Pitäjänmäentie, Reimarla, Pitäjänmäki, Län...  1020  Pitäjänmäentie 15, 00370 Helsinki, Finland
20  POINT (24.99376 60.2438)   2, Eskolantie, Savela, Pukinmäki, Koillinen su...  1021  Eskolantie 2, 00720 Helsinki, Finland
21  POINT (25.0325 60.24329)   Tattariharjuntie, Sepänmäki, Ala-Malmi, Malmi,...  1022  Tattariharjuntie, 00700 Helsinki, Finland
22  POINT (25.07842 60.20984)  Otto. pankkiautomaatti, 1, Tallinnanaukio, Itä...  1023  Tallinnanaukio 1, 00930 Helsinki, Finland
23  POINT (25.1359 60.20718)   Tyynylaavantie, Keski-Vuosaari, Vuosaari, Itäi...  1024  Tyynylaavantie 7, 00980 Helsinki, Finland
24  POINT (25.07643 60.22424)  5, Myllypurontie, Myllypuro, Vartiokylä, Itäin...  1025  Myllypurontie 5, 00920 Helsinki, Finland
25  POINT (25.10975 60.23768)  Mellunmäenraitio, Mellunmäki, Mellunkylä, Itäi...  1026  Mellunmäenraitio 6, 00970 Helsinki, Finland
26  POINT (24.96069 60.18824)  Vaasanpolku, Kurvi, Harju, Alppiharju, Keskine...  1027  Vaasanpolku 2, 00101 Helsinki, Finland
27  POINT (25.02832 60.19442)  Alko, 2, Hiihtäjäntie, Länsi-Herttoniemi, Hert...  1028  Hiihtäjäntie 2, 00810 Helsinki, Finland
28  POINT (25.00676 60.18871)  Metro Kulosaari, 2, Ukko-Pekan porras, Kulosaa...  1029  Ukko-Pekan porras 2, 00570 Helsinki, Finland
29  POINT (24.94958 60.17943)  16, Siltasaarenkatu, Siltasaari, Kallio, Keski...  1030  Siltasaarenkatu 16, 00530 Helsinki, Finland
30  POINT (24.93307 60.16908)  Kampin keskus, 1, Urho Kekkosen katu, Kamppi, ...  1031  Urho Kekkosen katu 1, 00100 Helsinki, Finland
31  POINT (24.93031 60.16642)  Ruoholahdenkatu, Kamppi, Eteläinen suurpiiri, ...  1032  Ruoholahdenkatu 17, 00101 Helsinki, Finland
32  POINT (24.92121 60.15878)  3, Tyynenmerenkatu, Jätkäsaari, Länsisatama, E...  1033  Tyynenmerenkatu 3, 00220 Helsinki, Finland
33  POINT (24.94683 60.172)    Old Tea Shop, 4, Vilhonkatu, Kaisaniemi, Kluuv...  1034  Vilhonkatu 4, 00101 Helsinki, Finland

The output of join() is a new geopandas.GeoDataFrame:

[5]:
type(geocoded_addresses_with_id)
[5]:
geopandas.geodataframe.GeoDataFrame

The new data frame has all original columns plus new columns for the geometry and for a parsed address that can be used to spot-check the results.
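One way to spot-check the results automatically is to test whether the street name of each input address appears in the address returned by the geocoder. The sketch below is a rough heuristic, not a complete validation. street_name() is a hypothetical helper, and the three rows are copied from the joined output above, including the truncated addresses.

```python
import re

import pandas

# Rows copied from the joined result above (geocoded addresses truncated as displayed)
sample = pandas.DataFrame({
    "address": [
        "Ruoholahti, 14, Itämerenkatu, Ruoholahti, Läns...",
        "Hermannin rantatie, Hermanninranta, Hermanni, ...",
        "Oulunkylän tori, Patola, Oulunkylä, Pohjoinen ...",
    ],
    "addr": [
        "Itämerenkatu 14, 00101 Helsinki, Finland",
        "Hermannin rantatie 1, 00580 Helsinki, Finland",
        "Oulunkylän tori 2b, 00640 Helsinki, Finland",
    ],
})


def street_name(addr: str) -> str:
    """Return the part of an input address before the first comma, minus a trailing house number."""
    first_part = addr.split(",")[0]
    return re.sub(r"\s+\d+\w*$", "", first_part).strip()


# True if the input street name occurs (case-insensitively) in the geocoded address
sample["street_found"] = [
    street_name(a).lower() in r.lower()
    for a, r in zip(sample["addr"], sample["address"])
]
print(sample[["addr", "street_found"]])
```

Rows flagged False deserve a closer look, for instance on a map.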

Note

If you perform the join the other way around, i.e., addresses.join(geocoded_addresses), the output would be a pandas.DataFrame, not a geopandas.GeoDataFrame.
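The type of the result follows the data frame on which join() is called. Below is a minimal sketch with small hypothetical stand-ins for geocoded_addresses and addresses. It also shows how a plain DataFrame that still holds a geometry column can be turned back into a GeoDataFrame.

```python
import geopandas
import pandas

# Hypothetical stand-ins for geocoded_addresses and addresses
geocoded = geopandas.GeoDataFrame(
    {"address": ["Itämerenkatu 14", "Kampinkuja 1"]},
    geometry=geopandas.points_from_xy([24.91556, 24.93009], [60.1632, 60.16846]),
    crs="EPSG:4326",
)
attributes = pandas.DataFrame({"id": [1000, 1001]})

gdf_result = geocoded.join(attributes)   # called on the GeoDataFrame
df_result = attributes.join(geocoded)    # called on the plain DataFrame

# Promote the plain DataFrame back to a GeoDataFrame by naming its geometry column
restored = geopandas.GeoDataFrame(df_result, geometry="geometry")
print(type(gdf_result).__name__, type(df_result).__name__, type(restored).__name__)
```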


It’s now easy to save the new data set as a geospatial file, for instance, in GeoPackage format:

[6]:
# Delete a possibly existing file, as it causes
# problems when Sphinx builds this page repeatedly
(DATA_DIRECTORY / "addresses.gpkg").unlink(missing_ok=True)
[7]:
geocoded_addresses_with_id.to_file(DATA_DIRECTORY / "addresses.gpkg")

Understanding the difference between join and merge in GeoPandas

GeoDataFrames inherit both the join() and the merge() method from pandas. Although the two may seem similar, they differ in how rows are matched:

  1. join:

    • This is primarily used for joining GeoDataFrames with a shared index. It works similarly to a SQL join based on the index of the two tables.

    • It is ideal for adding columns from one GeoDataFrame to another based on the index or a pre-aligned structure.

  2. merge:

    • merge allows more flexibility by enabling joins based on specific columns, not just the index. It works similarly to pd.merge in pandas.

    • It is useful for attribute joins, i.e., when you want to match features based on values in specific columns rather than on the index. Matching features by their location is a spatial join, which is done with geopandas.sjoin() instead.

Example

import geopandas as gpd
import pandas as pd

# Sample data: a GeoDataFrame of features and a plain DataFrame of attributes
gdf1 = gpd.GeoDataFrame({
    'ID': [1, 2, 3],
    'Name': ['Park', 'Lake', 'Forest'],
    'geometry': gpd.points_from_xy([10, 20, 30], [10, 20, 30])
})

df2 = pd.DataFrame({
    'ID': [1, 2, 3],
    'Area_km2': [1.5, 2.1, 3.3]
})

# Using `join` - matches rows by index, so make ID the index on both sides first
joined = gdf1.set_index('ID').join(df2.set_index('ID'))
print("Using `join`:\n", joined)

# Using `merge` - matches rows on a column; ID stays a regular column
merged = gdf1.merge(df2, on='ID')
print("Using `merge`:\n", merged)
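join() is not limited to index-to-index matching. Its on= parameter matches a column of the calling data frame against the index of the other one, so only the right-hand side needs set_index(). The small sketch below reuses the sample data, redefined here so the block runs on its own, with the attribute table as a plain pandas.DataFrame.

```python
import geopandas as gpd
import pandas as pd

gdf1 = gpd.GeoDataFrame(
    {"ID": [1, 2, 3], "Name": ["Park", "Lake", "Forest"]},
    geometry=gpd.points_from_xy([10, 20, 30], [10, 20, 30]),
)
areas = pd.DataFrame({"ID": [1, 2, 3], "Area_km2": [1.5, 2.1, 3.3]})

# Match gdf1's ID column against the index of `areas`;
# unlike the set_index() version, gdf1 keeps its original index and its ID column
joined_on = gdf1.join(areas.set_index("ID"), on="ID")
print(joined_on)
```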