Geocoding in geopandas
Geopandas supports geocoding via a library called geopy, which needs to be installed to use geopandas’ geopandas.tools.geocode() function. geocode() expects a list or pandas.Series of addresses (strings) and returns a GeoDataFrame with resolved addresses and point geometries.
Let’s try this out.
We will geocode addresses stored in a semicolon-separated text file called addresses.txt. These addresses are located in the Helsinki Region in Southern Finland.
import pathlib
NOTEBOOK_PATH = pathlib.Path().resolve()
DATA_DIRECTORY = NOTEBOOK_PATH / "data"
import pandas
addresses = pandas.read_csv(
    DATA_DIRECTORY / "helsinki_addresses" / "addresses.txt",
    sep=";"
)
addresses.head()
| | id | addr |
---|---|---|
0 | 1000 | Itämerenkatu 14, 00101 Helsinki, Finland |
1 | 1001 | Kampinkuja 1, 00100 Helsinki, Finland |
2 | 1002 | Kaivokatu 8, 00101 Helsinki, Finland |
3 | 1003 | Hermannin rantatie 1, 00580 Helsinki, Finland |
4 | 1005 | Tyynenmerenkatu 9, 00220 Helsinki, Finland |
We have an id for each row and an address in the addr column.
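To see what sep=";" does without the course data file at hand, here is a minimal sketch that parses two of the rows shown above from an in-memory string. The variable names sample and sample_addresses are made up for this illustration, and the file layout (a header row, then one id and one address per line) is assumed from the table above.

```python
import io

import pandas

# Two rows in the layout we assume addresses.txt to have:
# a header line, then "id;addr" records separated by semicolons
sample = io.StringIO(
    "id;addr\n"
    "1000;Itämerenkatu 14, 00101 Helsinki, Finland\n"
    "1001;Kampinkuja 1, 00100 Helsinki, Finland\n"
)

# Because the separator is ";", the commas inside each address
# stay part of the addr value instead of starting a new column
sample_addresses = pandas.read_csv(sample, sep=";")
print(sample_addresses)
```

With the default separator (a comma), each address would instead be split into several fields, which is why the separator has to be stated explicitly here.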
Geocode addresses using Nominatim#
In our example, we will use Nominatim as a geocoding provider. Nominatim is a library and a service that uses OpenStreetMap data and is run by the OpenStreetMap Foundation. Geopandas’ geocode() function supports it natively.
Fair-use
Nominatim’s terms of use require that users of the service make sure they don’t send more frequent requests than one per second, and that a custom user-agent string is attached to each query.
Geopandas’ implementation allows us to specify a user_agent; the library also takes care of respecting the rate-limit of Nominatim.
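We do not need to implement the rate limit ourselves, but it can help to see what "at most one request per second" means in code. The following is only a sketch using the standard library: throttled() and fake_lookup() are hypothetical names invented for this illustration, and fake_lookup() merely stands in for a real network request. It is not how geopandas implements its own throttling.

```python
import time

# Nominatim's fair-use limit: at most one request per second
MIN_DELAY_SECONDS = 1.0


def throttled(function, min_delay=MIN_DELAY_SECONDS):
    """Wrap function so that consecutive calls start at least min_delay seconds apart."""
    last_call = None

    def wrapper(*args, **kwargs):
        nonlocal last_call
        if last_call is not None:
            # Sleep for whatever is left of the minimum delay
            remaining = min_delay - (time.monotonic() - last_call)
            if remaining > 0:
                time.sleep(remaining)
        last_call = time.monotonic()
        return function(*args, **kwargs)

    return wrapper


def fake_lookup(address):
    # Hypothetical stand-in for a request to a geocoding service
    return address.upper()


polite_lookup = throttled(fake_lookup)
```

Calling polite_lookup() repeatedly then pauses between calls as needed. If you call geopy directly instead of going through geopandas, geopy ships a similar helper, geopy.extra.rate_limiter.RateLimiter.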
Looking up an address is a fairly expensive database operation. This is why the public, free-to-use Nominatim server sometimes takes a little longer to respond. In this example, we add the parameter timeout=10 to wait up to 10 seconds for a response.
import geopandas
geocoded_addresses = geopandas.tools.geocode(
    addresses["addr"],
    provider="nominatim",
    user_agent="autogis2023",
    timeout=10
)
geocoded_addresses.head()
| | geometry | address |
---|---|---|
0 | POINT (24.91556 60.16320) | Ruoholahti, 14, Itämerenkatu, Salmisaari, Ruoh... |
1 | POINT (24.93166 60.16905) | Kamppi, 1, Kampinkuja, Kamppi, Eteläinen suurp... |
2 | POINT (24.94179 60.16989) | Kauppakeskus Citycenter, 8, Kaivokatu, Keskust... |
3 | POINT (24.97846 60.19206) | Hermannin rantatie, Verkkosaari, Kalasatama, S... |
4 | POINT (24.92151 60.15662) | 9, Tyynenmerenkatu, Jätkäsaari, Länsisatama, E... |
Et voilà! As a result, we received a GeoDataFrame that contains a parsed version of our original addresses and a geometry column of shapely.geometry.Point objects that we can use, for instance, to export the data to a geospatial data format.
However, the id column was discarded in the process. To combine the input data set with our result set, we can use pandas’ join operations.
Join data frames#
Joining data sets using pandas
For a comprehensive overview of different ways of combining DataFrames and Series based on set theory, have a look at pandas documentation about merge, join and concatenate.
Joining data from two or more data frames or tables is a common task in many (spatial) data analysis workflows. As you might remember from our earlier lessons, combining data from different tables based on a common key attribute can be done easily in pandas/geopandas using the merge() function. We used this approach in exercise 6 of the Geo-Python course.
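As a brief reminder of how a key-based merge() looks, here is a small sketch. The two tables and the visitors column are invented for this illustration and are not part of the lesson data.

```python
import pandas

# Hypothetical example tables that share the key column "id"
address_table = pandas.DataFrame(
    {
        "id": [1000, 1001, 1002],
        "addr": ["Itämerenkatu 14", "Kampinkuja 1", "Kaivokatu 8"],
    }
)
visitor_table = pandas.DataFrame({"id": [1002, 1000], "visitors": [25, 40]})

# Rows are matched on the values of "id", regardless of their order;
# how="left" keeps every address, even if it has no matching visitor record
merged = address_table.merge(visitor_table, on="id", how="left")
print(merged)
```

The address with id 1001 has no counterpart in the second table, so its visitors value is missing (NaN) in the result.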
However, sometimes it is useful to join two data frames together based on their index. The data frames have to have the same number of records and share the same index (simply put, they should have the same order of rows).
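It is worth being precise about what "the same index" means: join() pairs rows by their index labels, not by their position in the table. The sketch below uses two small made-up data frames to show the difference.

```python
import pandas

# Two made-up tables with the same default index (0, 1, 2)
left = pandas.DataFrame({"address": ["A street 1", "B street 2", "C street 3"]})
right = pandas.DataFrame({"id": [1000, 1001, 1002]})

# Identical index on both sides: rows pair up one to one
joined = left.join(right)

# Reversing the row order keeps the index labels (2, 1, 0),
# so join() still pairs each address with its own id
reversed_right = right.iloc[::-1]
joined_reversed = left.join(reversed_right)

# Renumbering the reversed rows gives them new labels (0, 1, 2),
# and the ids now end up next to the wrong addresses
misaligned = left.join(reversed_right.reset_index(drop=True))
print(misaligned)
```

In other words, an index-based join is only safe when the index labels of both data frames really refer to the same records.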
We can use this approach here to join information from the original data frame addresses to the geocoded addresses geocoded_addresses, row by row. The join() function, by default, joins two data frames based on their index. This works correctly for our example, as the order of the two data frames is identical.
geocoded_addresses_with_id = geocoded_addresses.join(addresses)
geocoded_addresses_with_id
| | geometry | address | id | addr |
---|---|---|---|---|
0 | POINT (24.91556 60.16320) | Ruoholahti, 14, Itämerenkatu, Salmisaari, Ruoh... | 1000 | Itämerenkatu 14, 00101 Helsinki, Finland |
1 | POINT (24.93166 60.16905) | Kamppi, 1, Kampinkuja, Kamppi, Eteläinen suurp... | 1001 | Kampinkuja 1, 00100 Helsinki, Finland |
2 | POINT (24.94179 60.16989) | Kauppakeskus Citycenter, 8, Kaivokatu, Keskust... | 1002 | Kaivokatu 8, 00101 Helsinki, Finland |
3 | POINT (24.97846 60.19206) | Hermannin rantatie, Verkkosaari, Kalasatama, S... | 1003 | Hermannin rantatie 1, 00580 Helsinki, Finland |
4 | POINT (24.92151 60.15662) | 9, Tyynenmerenkatu, Jätkäsaari, Länsisatama, E... | 1005 | Tyynenmerenkatu 9, 00220 Helsinki, Finland |
5 | POINT (25.08174 60.23522) | 18, Kontulantie, Kontula, Mellunkylä, Itäinen ... | 1006 | Kontulantie 18, 00940 Helsinki, Finland |
6 | POINT (25.10974 60.22102) | Itäväylä, Vartioharju, Vartiokylä, Itäinen suu... | 1007 | Itäväylä 3, 00950 Helsinki, Finland |
7 | POINT (25.02831 60.27844) | Tapulikaupungintie, Tapulikaupunki, Suutarila,... | 1008 | Tapulikaupungintie 3, 00750 Helsinki, Finland |
8 | POINT (25.02883 60.26326) | Sompionpolku, Fallkullan kiila, Tapanila, Tapa... | 1009 | Sompionpolku 2, 00730 Helsinki, Finland |
9 | POINT (24.87197 60.22244) | 5, Atomitie, Strömberg, Pitäjänmäen teollisuus... | 1010 | Atomitie 5, 00370 Helsinki, Finland |
10 | POINT (24.94269 60.17118) | Rautatientori, Keskusta, Kluuvi, Eteläinen suu... | 1011 | Rautatientori 1, 00100 Helsinki, Finland |
11 | POINT (24.88421 60.23050) | Kuparitie, Lassila, Haaga, Läntinen suurpiiri,... | 1012 | Kuparitie 8, 00440 Helsinki, Finland |
12 | POINT (24.87527 60.23890) | Rumpupolku, Kannelmäki, Kaarela, Läntinen suur... | 1013 | Rumpupolku 8, 00420 Helsinki, Finland |
13 | POINT (24.94854 60.22196) | Otto. automaatti (ATM), 1, Mäkitorpantie, Pato... | 1014 | Mäkitorpantie 1, 00620 Helsinki, Finland |
14 | POINT (25.01295 60.25107) | Yliopiston Apteekki, 15, Malminkaari, Ala-Malm... | 1015 | Malminkaari 15, 00700 Helsinki, Finland |
15 | POINT (24.89418 60.21722) | 23, Kylätie, Etelä-Haaga, Haaga, Läntinen suur... | 1016 | Kylätie 23, 00320 Helsinki, Finland |
16 | POINT (24.86653 60.25131) | Malminkartanontie, Malminkartano, Kaarela, Län... | 1017 | Malminkartanontie 17, 00410 Helsinki, Finland |
17 | POINT (24.96566 60.22982) | Oulunkylän tori, Patola, Oulunkylä, Pohjoinen ... | 1018 | Oulunkylän tori 2b, 00640 Helsinki, Finland |
18 | POINT (24.93435 60.19857) | 6, Ratapihantie, Itä-Pasila, Pasila, Keskinen ... | 1019 | Ratapihantie 6, 00101 Helsinki, Finland |
19 | POINT (24.86086 60.22407) | 15, Pitäjänmäentie, Reimarla, Pitäjänmäki, Län... | 1020 | Pitäjänmäentie 15, 00370 Helsinki, Finland |
20 | POINT (24.99362 60.24365) | K-Market, 2, Eskolantie, Savela, Pukinmäki, Ko... | 1021 | Eskolantie 2, 00720 Helsinki, Finland |
21 | POINT (25.02891 60.24244) | Tattariharjuntie, Ala-Malmi, Malmi, Koillinen ... | 1022 | Tattariharjuntie, 00700 Helsinki, Finland |
22 | POINT (25.07842 60.20984) | Otto. pankkiautomaatti, 1, Tallinnanaukio, Itä... | 1023 | Tallinnanaukio 1, 00930 Helsinki, Finland |
23 | POINT (25.13686 60.20703) | Tyynylaavantie, Keski-Vuosaari, Vuosaari, Itäi... | 1024 | Tyynylaavantie 7, 00980 Helsinki, Finland |
24 | POINT (25.07918 60.22320) | Myllypurontie, Myllypuro, Vartiokylä, Itäinen ... | 1025 | Myllypurontie 5, 00920 Helsinki, Finland |
25 | POINT (25.10964 60.23788) | Mellunmäenraitio, Mellunmäki, Mellunkylä, Itäi... | 1026 | Mellunmäenraitio 6, 00970 Helsinki, Finland |
26 | POINT (24.96108 60.18801) | Vaasanpolku, Kurvi, Harju, Alppiharju, Keskine... | 1027 | Vaasanpolku 2, 00101 Helsinki, Finland |
27 | POINT (25.02832 60.19442) | Alko, 2, Hiihtäjäntie, Länsi-Herttoniemi, Hert... | 1028 | Hiihtäjäntie 2, 00810 Helsinki, Finland |
28 | POINT (25.00681 60.18872) | Metro Kulosaari, 2, Ukko-Pekan porras, Kulosaa... | 1029 | Ukko-Pekan porras 2, 00570 Helsinki, Finland |
29 | POINT (24.94953 60.17954) | Instrumentarium Hakaniemi, 16, Siltasaarenkatu... | 1030 | Siltasaarenkatu 16, 00530 Helsinki, Finland |
30 | POINT (24.93312 60.16909) | Kampin keskus, 1, Urho Kekkosen katu, Kamppi, ... | 1031 | Urho Kekkosen katu 1, 00100 Helsinki, Finland |
31 | POINT (24.93039 60.16641) | Ruoholahdenkatu, Kamppi, Eteläinen suurpiiri, ... | 1032 | Ruoholahdenkatu 17, 00101 Helsinki, Finland |
32 | POINT (24.92121 60.15878) | 3, Tyynenmerenkatu, Jätkäsaari, Länsisatama, E... | 1033 | Tyynenmerenkatu 3, 00220 Helsinki, Finland |
33 | POINT (24.94694 60.17198) | 4, Vilhonkatu, Kaisaniemi, Kluuvi, Eteläinen s... | 1034 | Vilhonkatu 4, 00101 Helsinki, Finland |
The output of join() is a new geopandas.GeoDataFrame:
type(geocoded_addresses_with_id)
geopandas.geodataframe.GeoDataFrame
The new data frame has all original columns plus new columns for the geometry and for a parsed address that can be used to spot-check the results.
Note
If you did the join the other way around, i.e. addresses.join(geocoded_addresses), the output would be a pandas.DataFrame, not a geopandas.GeoDataFrame.
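The effect of the join direction can be checked with a small, self-contained sketch. The two tables below are stand-ins built by hand from the first two result rows above; the coordinate reference system EPSG:4326 is an assumption made for this illustration. The last step shows one possible way to turn the plain DataFrame back into a GeoDataFrame.

```python
import geopandas
import pandas
from shapely.geometry import Point

# Hand-made stand-ins for the lesson's two tables
attributes = pandas.DataFrame({"id": [1000, 1001]})
points = geopandas.GeoDataFrame(
    {"address": ["Itämerenkatu 14", "Kampinkuja 1"]},
    geometry=[Point(24.91556, 60.16320), Point(24.93166, 60.16905)],
    crs="EPSG:4326",  # assumed CRS for this example
)

# The data frame on which join() is called determines the result type
geo_first = points.join(attributes)    # geopandas.GeoDataFrame
plain_first = attributes.join(points)  # pandas.DataFrame

# One way to restore the geospatial type after the "wrong" direction
restored = geopandas.GeoDataFrame(plain_first, geometry="geometry", crs=points.crs)

print(type(geo_first).__name__, type(plain_first).__name__, type(restored).__name__)
```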
It’s now easy to save the new data set as a geospatial file, for instance, in GeoPackage format:
geocoded_addresses_with_id.to_file(DATA_DIRECTORY / "addresses.gpkg")