Geocoding in geopandas#
Geopandas supports geocoding via a library called geopy, which needs to be installed to use geopandas’ geopandas.tools.geocode() function (https://geopandas.org/en/stable/docs/reference/api/geopandas.tools.geocode.html). geocode() expects a list or pandas.Series of addresses (strings) and returns a GeoDataFrame with resolved addresses and point geometries.
Let’s try this out.
We will geocode addresses stored in a semicolon-separated text file called addresses.txt. These addresses are located in the Helsinki Region in Southern Finland.
[1]:
import pathlib
NOTEBOOK_PATH = pathlib.Path().resolve()
DATA_DIRECTORY = NOTEBOOK_PATH / "data"
[2]:
import pandas
addresses = pandas.read_csv(
    DATA_DIRECTORY / "helsinki_addresses" / "addresses.txt",
    sep=";"
)
addresses.head()
[2]:
 | id | addr |
---|---|---|
0 | 1000 | Itämerenkatu 14, 00101 Helsinki, Finland |
1 | 1001 | Kampinkuja 1, 00100 Helsinki, Finland |
2 | 1002 | Kaivokatu 8, 00101 Helsinki, Finland |
3 | 1003 | Hermannin rantatie 1, 00580 Helsinki, Finland |
4 | 1005 | Tyynenmerenkatu 9, 00220 Helsinki, Finland |
We have an id for each row and an address in the addr column.
Geocode addresses using Nominatim#
In our example, we will use Nominatim as a geocoding provider. Nominatim is a library and service that uses OpenStreetMap data and is run by the OpenStreetMap Foundation. Geopandas’ geocode() function (https://geopandas.org/en/stable/docs/reference/api/geopandas.tools.geocode.html) supports it natively.
Fair-use
Nominatim’s terms of use require that users of the service ensure they don’t send more frequent requests than one per second and that a custom user-agent string is attached to each query.
Geopandas’ implementation allows us to specify a user_agent, and the library also takes care of respecting Nominatim’s rate limit.
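To make the one-request-per-second rule concrete, here is a minimal sketch of request throttling in plain Python. It is not how geopandas or geopy implement their rate limiting; throttled_map and fake_geocoder are hypothetical names used only for illustration, and fake_geocoder sends no network request.

```python
import time


def throttled_map(func, items, min_interval=1.0):
    """Call func on each item, starting consecutive calls
    at least min_interval seconds apart."""
    results = []
    last_call = None
    for item in items:
        if last_call is not None:
            # Wait for whatever is left of the interval since the previous call
            remaining = min_interval - (time.monotonic() - last_call)
            if remaining > 0:
                time.sleep(remaining)
        last_call = time.monotonic()
        results.append(func(item))
    return results


def fake_geocoder(address):
    # Hypothetical stand-in for a real geocoding request (no network access)
    return address.upper()


# A short interval keeps the demonstration quick; Nominatim would need 1.0
demo = throttled_map(fake_geocoder, ["Kaivokatu 8", "Kampinkuja 1"], min_interval=0.2)
print(demo)
```

With geopandas.tools.geocode() you do not need such a helper for Nominatim; the sketch only shows what "at most one request per second" means in practice.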
Looking up an address is quite an expensive database operation. This is why the public and free-to-use Nominatim server sometimes takes slightly longer to respond. In this example, we add a parameter timeout=10 to wait up to 10 seconds for a response.
[3]:
import geopandas
geocoded_addresses = geopandas.tools.geocode(
    addresses["addr"],
    provider="nominatim",
    user_agent="autogis2023",
    timeout=10
)
geocoded_addresses.head()
[3]:
 | geometry | address |
---|---|---|
0 | POINT (24.91556 60.1632) | Ruoholahti, 14, Itämerenkatu, Ruoholahti, Läns... |
1 | POINT (24.93009 60.16846) | 1, Kampinkuja, Kamppi, Eteläinen suurpiiri, He... |
2 | POINT (24.94153 60.17016) | Espresso House, 8, Kaivokatu, Keskusta, Kluuvi... |
3 | POINT (24.97675 60.19438) | Hermannin rantatie, Hermanninranta, Hermanni, ... |
4 | POINT (24.92151 60.15662) | 9, Tyynenmerenkatu, Jätkäsaari, Länsisatama, E... |
Et voilà! As a result we received a GeoDataFrame that contains a parsed version of our original addresses and a geometry column of shapely.geometry.Point objects that we can use, for instance, to export the data to a geospatial data format.
However, the id column was discarded in the process. To combine the input data set with our result set, we can use pandas’ join operations.
Join data frames#
Note: Joining data sets using pandas
For a comprehensive overview of different ways of combining DataFrames and Series based on set theory, see the pandas documentation on merge, join, and concatenate.
Joining data from two or more data frames or tables is a common task in many (spatial) data analysis workflows. As you might remember from our earlier lessons, combining data from different tables based on a common key attribute can be done easily in pandas/geopandas using the merge() function (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html). We used this approach in exercise 6 of the Geo-Python course.
However, sometimes it is useful to join two data frames together based on their index. The data frames have to have the same number of records and share the same index (simply put, they should have the same order of rows).
We can use this approach here to join information from the original data frame addresses to the geocoded addresses geocoded_addresses, row by row. The join() method, by default, joins two data frames based on their index. This works correctly for our example, as the order of the two data frames is identical.
[4]:
geocoded_addresses_with_id = geocoded_addresses.join(addresses)
geocoded_addresses_with_id
[4]:
 | geometry | address | id | addr |
---|---|---|---|---|
0 | POINT (24.91556 60.1632) | Ruoholahti, 14, Itämerenkatu, Ruoholahti, Läns... | 1000 | Itämerenkatu 14, 00101 Helsinki, Finland |
1 | POINT (24.93009 60.16846) | 1, Kampinkuja, Kamppi, Eteläinen suurpiiri, He... | 1001 | Kampinkuja 1, 00100 Helsinki, Finland |
2 | POINT (24.94153 60.17016) | Espresso House, 8, Kaivokatu, Keskusta, Kluuvi... | 1002 | Kaivokatu 8, 00101 Helsinki, Finland |
3 | POINT (24.97675 60.19438) | Hermannin rantatie, Hermanninranta, Hermanni, ... | 1003 | Hermannin rantatie 1, 00580 Helsinki, Finland |
4 | POINT (24.92151 60.15662) | 9, Tyynenmerenkatu, Jätkäsaari, Länsisatama, E... | 1005 | Tyynenmerenkatu 9, 00220 Helsinki, Finland |
5 | POINT (25.08174 60.23522) | 18, Kontulantie, Kontula, Mellunkylä, Itäinen ... | 1006 | Kontulantie 18, 00940 Helsinki, Finland |
6 | POINT (25.10278 60.21788) | Itäväylä, Vartioharju, Vartiokylä, Itäinen suu... | 1007 | Itäväylä 3, 00950 Helsinki, Finland |
7 | POINT (25.03509 60.27577) | Tapulikaupungintie, Tapulikaupunki, Suutarila,... | 1008 | Tapulikaupungintie 3, 00750 Helsinki, Finland |
8 | POINT (25.02883 60.26367) | Sompionpolku, Fallkullan kiila, Tapanila, Tapa... | 1009 | Sompionpolku 2, 00730 Helsinki, Finland |
9 | POINT (24.87197 60.22244) | 5, Atomitie, Strömberg, Pitäjänmäen teollisuus... | 1010 | Atomitie 5, 00370 Helsinki, Finland |
10 | POINT (24.94263 60.1709) | Rautatientori, Keskusta, Kluuvi, Eteläinen suu... | 1011 | Rautatientori 1, 00100 Helsinki, Finland |
11 | POINT (24.88282 60.2308) | Kuparitie, Lassila, Haaga, Läntinen suurpiiri,... | 1012 | Kuparitie 8, 00440 Helsinki, Finland |
12 | POINT (24.877 60.23975) | Rumpupolku, Kannelmäki, Kaarela, Läntinen suur... | 1013 | Rumpupolku 8, 00420 Helsinki, Finland |
13 | POINT (24.94801 60.22179) | K-Supermarket, 1, Mäkitorpantie, Patola, Oulun... | 1014 | Mäkitorpantie 1, 00620 Helsinki, Finland |
14 | POINT (25.01295 60.25107) | Yliopiston Apteekki, 15, Malminkaari, Ala-Malm... | 1015 | Malminkaari 15, 00700 Helsinki, Finland |
15 | POINT (24.89418 60.21722) | 23, Kylätie, Etelä-Haaga, Haaga, Läntinen suur... | 1016 | Kylätie 23, 00320 Helsinki, Finland |
16 | POINT (24.86124 60.24848) | Malminkartanontie, Malminkartano, Kaarela, Län... | 1017 | Malminkartanontie 17, 00410 Helsinki, Finland |
17 | POINT (24.96564 60.22983) | Oulunkylän tori, Patola, Oulunkylä, Pohjoinen ... | 1018 | Oulunkylän tori 2b, 00640 Helsinki, Finland |
18 | POINT (24.93435 60.19857) | 6, Ratapihantie, Itä-Pasila, Pasila, Keskinen ... | 1019 | Ratapihantie 6, 00101 Helsinki, Finland |
19 | POINT (24.86086 60.22407) | 15, Pitäjänmäentie, Reimarla, Pitäjänmäki, Län... | 1020 | Pitäjänmäentie 15, 00370 Helsinki, Finland |
20 | POINT (24.99376 60.2438) | 2, Eskolantie, Savela, Pukinmäki, Koillinen su... | 1021 | Eskolantie 2, 00720 Helsinki, Finland |
21 | POINT (25.0325 60.24329) | Tattariharjuntie, Sepänmäki, Ala-Malmi, Malmi,... | 1022 | Tattariharjuntie, 00700 Helsinki, Finland |
22 | POINT (25.07842 60.20984) | Otto. pankkiautomaatti, 1, Tallinnanaukio, Itä... | 1023 | Tallinnanaukio 1, 00930 Helsinki, Finland |
23 | POINT (25.1359 60.20718) | Tyynylaavantie, Keski-Vuosaari, Vuosaari, Itäi... | 1024 | Tyynylaavantie 7, 00980 Helsinki, Finland |
24 | POINT (25.07643 60.22424) | 5, Myllypurontie, Myllypuro, Vartiokylä, Itäin... | 1025 | Myllypurontie 5, 00920 Helsinki, Finland |
25 | POINT (25.10975 60.23768) | Mellunmäenraitio, Mellunmäki, Mellunkylä, Itäi... | 1026 | Mellunmäenraitio 6, 00970 Helsinki, Finland |
26 | POINT (24.96069 60.18824) | Vaasanpolku, Kurvi, Harju, Alppiharju, Keskine... | 1027 | Vaasanpolku 2, 00101 Helsinki, Finland |
27 | POINT (25.02832 60.19442) | Alko, 2, Hiihtäjäntie, Länsi-Herttoniemi, Hert... | 1028 | Hiihtäjäntie 2, 00810 Helsinki, Finland |
28 | POINT (25.00676 60.18871) | Metro Kulosaari, 2, Ukko-Pekan porras, Kulosaa... | 1029 | Ukko-Pekan porras 2, 00570 Helsinki, Finland |
29 | POINT (24.94958 60.17943) | 16, Siltasaarenkatu, Siltasaari, Kallio, Keski... | 1030 | Siltasaarenkatu 16, 00530 Helsinki, Finland |
30 | POINT (24.93307 60.16908) | Kampin keskus, 1, Urho Kekkosen katu, Kamppi, ... | 1031 | Urho Kekkosen katu 1, 00100 Helsinki, Finland |
31 | POINT (24.93031 60.16642) | Ruoholahdenkatu, Kamppi, Eteläinen suurpiiri, ... | 1032 | Ruoholahdenkatu 17, 00101 Helsinki, Finland |
32 | POINT (24.92121 60.15878) | 3, Tyynenmerenkatu, Jätkäsaari, Länsisatama, E... | 1033 | Tyynenmerenkatu 3, 00220 Helsinki, Finland |
33 | POINT (24.94683 60.172) | Old Tea Shop, 4, Vilhonkatu, Kaisaniemi, Kluuv... | 1034 | Vilhonkatu 4, 00101 Helsinki, Finland |
The output of join() is a new geopandas.GeoDataFrame:
[5]:
type(geocoded_addresses_with_id)
[5]:
geopandas.geodataframe.GeoDataFrame
The new data frame has all original columns plus new columns for the geometry and for a parsed address that can be used to spot-check the results.
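The index-based join used above pairs rows by index label, not by their position in the table. The following is a minimal sketch with made-up data (the frames left and right and their values are hypothetical). It shows when a join lines up as intended, when it silently does not, and a cheap check to run beforehand.

```python
import pandas

left = pandas.DataFrame({"address": ["A street 1", "B street 2", "C street 3"]})
right = pandas.DataFrame({"id": [1000, 1001, 1002]})

# Both frames carry the default index 0, 1, 2, so rows pair up as intended
aligned = left.join(right)

# Sorting reorders the rows but keeps their index labels,
# so join() still pairs each address with its own id
shuffled = right.sort_values("id", ascending=False)
still_aligned = left.join(shuffled)

# Resetting the index after reordering assigns new labels;
# the join now runs without error but pairs the wrong rows
relabelled = shuffled.reset_index(drop=True)
mismatched = left.join(relabelled)

# A cheap sanity check before an index-based join
print(left.index.equals(right.index))
print(mismatched)
```

If the two indexes cannot be trusted to match, joining on an explicit key column with merge() is the safer choice.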
Note
If you perform the join the other way around, i.e., addresses.join(geocoded_addresses), the output would be a pandas.DataFrame, not a geopandas.GeoDataFrame.
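The effect of the join direction can be checked on a small example. This is a minimal sketch with made-up coordinates and hypothetical variable names (points, attributes). It also shows one way to turn a plain data frame that contains a geometry column back into a GeoDataFrame.

```python
import geopandas
import pandas

# Two made-up points in WGS 84 and a plain attribute table
points = geopandas.GeoDataFrame(
    {"address": ["first", "second"]},
    geometry=geopandas.points_from_xy([24.94, 24.93], [60.17, 60.16]),
    crs="EPSG:4326",
)
attributes = pandas.DataFrame({"id": [1000, 1001]})

# The left-hand frame determines the type of the result
geo_first = points.join(attributes)
plain_first = attributes.join(points)
print(type(geo_first).__name__)
print(type(plain_first).__name__)

# Wrapping the plain result restores the geospatial type
restored = geopandas.GeoDataFrame(plain_first, geometry="geometry")
print(type(restored).__name__)
```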
It’s now easy to save the new data set as a geospatial file, for instance, in GeoPackage format:
[6]:
# delete a possibly existing file, as it causes
# trouble in case sphinx is run repeatedly
try:
    (DATA_DIRECTORY / "addresses.gpkg").unlink()
except FileNotFoundError:
    pass
[7]:
geocoded_addresses_with_id.to_file(DATA_DIRECTORY / "addresses.gpkg")
Understanding the difference between join and merge in GeoPandas
GeoPandas data frames inherit both the join() and merge() methods from pandas. While they may seem similar, they are used differently depending on the context.
join: This is primarily used for joining GeoDataFrames that share an index. It works similarly to a SQL join on the index of the two tables. It is ideal for adding columns from one GeoDataFrame to another based on the index or a pre-aligned structure.
merge: This allows more flexibility by enabling joins based on specific columns, not just the index. It works similarly to pd.merge in pandas. It is useful for attribute joins, in which you want to match features based on values in specific columns rather than on the index. Note that this is not a spatial join, which matches features by location.
Example#
import geopandas as gpd

# Sample GeoDataFrames
gdf1 = gpd.GeoDataFrame({
    'ID': [1, 2, 3],
    'Name': ['Park', 'Lake', 'Forest'],
    'geometry': gpd.points_from_xy([10, 20, 30], [10, 20, 30])
})
gdf2 = gpd.GeoDataFrame({
    'ID': [1, 2, 3],
    'Area_km2': [1.5, 2.1, 3.3]
})

# Using `join` - joins based on index
joined = gdf1.set_index('ID').join(gdf2.set_index('ID'))
print("Using `join`:\n", joined)

# Using `merge` - joins based on a column
merged = gdf1.merge(gdf2, on='ID')
print("Using `merge`:\n", merged)
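One practical difference that the example above does not exercise is the default join type: join() performs a left join, while merge() performs an inner join. The following minimal sketch uses plain pandas data frames with made-up values (places and areas are hypothetical names) in which one key has no match.

```python
import pandas

places = pandas.DataFrame({"ID": [1, 2, 3], "Name": ["Park", "Lake", "Forest"]})
areas = pandas.DataFrame({"ID": [1, 2], "Area_km2": [1.5, 2.1]})  # no row for ID 3

# join() defaults to how="left": all rows of the left frame are kept,
# and the missing area for ID 3 is filled with NaN
joined = places.set_index("ID").join(areas.set_index("ID"))

# merge() defaults to how="inner": rows without a match are dropped
merged = places.merge(areas, on="ID")

# Requesting a left join explicitly keeps all three places
merged_left = places.merge(areas, on="ID", how="left")

print(len(joined), len(merged), len(merged_left))
```

Comparing row counts before and after a join, as in the last line, is a quick way to notice dropped or unmatched records.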