Retrieving OpenStreetMap data¶
What is OpenStreetMap?¶
OpenStreetMap (OSM) is a global, collaborative (crowd-sourced) dataset and project that aims to create a free, editable map of the world containing a wealth of information about our environment. It contains data about, for example, streets, buildings, different services, and land use. You can view the map at www.openstreetmap.org. You can also sign up as a contributor if you want to edit the map.
OSM has a large user base, with more than 4 million users and over a million contributors who actively update the OSM database with 3 million changesets per day. In total, OSM contains 5 billion nodes that form the basis of the digitally mapped world that OSM provides (stats from November 2019).
OpenStreetMap is used not only as a background map in visualizations and online maps, but also for many other purposes such as routing, geocoding, education, and research. OSM is also widely used for humanitarian response, for example in crisis areas after natural disasters, and for fostering economic development (see the Humanitarian OpenStreetMap Team (HOTOSM) website for more).
OSMnx¶
This week we will explore a Python module called OSMnx that can be used to retrieve, construct, analyze, and visualize street networks from OpenStreetMap, and also to retrieve data about points of interest such as restaurants, schools, and many other kinds of services. It is also easy to conduct network routing based on walking, cycling, or driving by combining OSMnx functionality with a package called NetworkX.
To get an overview of the capabilities of the package, see an introductory video given by the lead developer of the package, Prof. Geoff Boeing: “Meet the developer: Introduction to OSMnx package by Geoff Boeing”.
There is also a scientific article available describing the package:
Boeing, G. 2017. “OSMnx: New Methods for Acquiring, Constructing, Analyzing, and Visualizing Complex Street Networks.” Computers, Environment and Urban Systems 65, 126-139. doi:10.1016/j.compenvurbsys.2017.05.004
Download and visualize OpenStreetMap data with OSMnx¶
One of the most useful features that OSMnx provides is an easy-to-use way of retrieving OpenStreetMap data (using the Overpass API).
In this tutorial, we will learn how to download and visualize OSM data covering a specified area of interest: the district of Kamppi in Helsinki, Finland.
Street network¶
OSMnx makes it easy to download a street network, as it lets you specify an address or place name and retrieves the OpenStreetMap data around that area. In fact, OSMnx uses the same Nominatim geocoding API for this that we tested in Lesson 2.
Let’s retrieve OpenStreetMap (OSM) data by specifying "Kamppi, Helsinki, Finland" as the place from which the data should be downloaded.
import osmnx as ox
import matplotlib.pyplot as plt
%matplotlib inline
# Specify the name that is used to search for the data
place_name = "Kamppi, Helsinki, Finland"
# Fetch OSM street network from the location
graph = ox.graph_from_place(place_name)
Check the data type of the graph:
type(graph)
networkx.classes.multidigraph.MultiDiGraph
Okay, as we can see, the data that we retrieved is a special data object called networkx.classes.multidigraph.MultiDiGraph. A DiGraph is a data type that stores nodes and directed edges with optional data, or attributes; a MultiDiGraph additionally allows several parallel edges between the same pair of nodes. This data type belongs to a Python module called networkx that can be used to create, manipulate, and study the structure, dynamics, and functions of complex networks. The networkx module contains algorithms that can be used to calculate shortest paths along road networks using, for example, Dijkstra’s or the A* algorithm.
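To make the idea of shortest paths on a directed graph with parallel edges more concrete, here is a minimal plain-Python sketch of Dijkstra’s algorithm. It is not the NetworkX implementation; the node names, edge lengths, and helper functions (build_adjacency, dijkstra) are made up for illustration.

```python
import heapq

# Hypothetical toy street network: (from_node, to_node, length_in_meters).
# Two parallel edges connect A -> B, which a MultiDiGraph would also allow.
edges = [
    ("A", "B", 120.0),
    ("A", "B", 80.0),   # parallel edge, e.g. a shortcut
    ("B", "C", 50.0),
    ("A", "C", 200.0),
    ("C", "A", 200.0),  # directed: the reverse direction is a separate edge
]


def build_adjacency(edge_list):
    """Group outgoing edges by their start node."""
    adjacency = {}
    for u, v, length in edge_list:
        adjacency.setdefault(u, []).append((v, length))
        adjacency.setdefault(v, [])
    return adjacency


def dijkstra(adjacency, source, target):
    """Return (total_length, path) of the shortest route, or (inf, []) if unreachable."""
    queue = [(0.0, source, [source])]
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == target:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, length in adjacency[node]:
            if neighbor not in visited:
                heapq.heappush(queue, (dist + length, neighbor, path + [neighbor]))
    return float("inf"), []


adjacency = build_adjacency(edges)
shortest_length, shortest_path = dijkstra(adjacency, "A", "C")
print(shortest_length, shortest_path)
```

In this toy network the route via B (using the shorter of the two parallel edges) beats the direct edge from A to C. With a real street graph you would normally rely on the routing functions in NetworkX rather than a hand-written version.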
Let’s see what our street network looks like. It is easy to visualize the graph with OSMnx using the plot_graph() function. The function uses Matplotlib for visualizing the data, so it returns a Matplotlib figure and axis object:
# Plot the streets
fig, ax = ox.plot_graph(graph)
Great! Now we can see that our graph contains the nodes (blue circles) and the edges (gray lines) that connect those nodes to each other.
Place polygon¶
Let’s also plot the Polygon that represents our area of interest (Kamppi, Helsinki). We can retrieve the Polygon geometry using the gdf_from_place() function. Note that newer OSMnx releases have renamed this function to geocode_to_gdf().
Retrieve the extent of our location:
area = ox.gdf_from_place(place_name)
As the name of the function already tells us, gdf_from_place() returns a GeoDataFrame based on the specified place name query.
Check the data type:
type(area)
geopandas.geodataframe.GeoDataFrame
Check the data:
area
(index) | geometry | place_name | bbox_north | bbox_south | bbox_east | bbox_west |
---|---|---|---|---|---|---|
0 | POLYGON ((24.92074 60.16690, 24.92075 60.16687... | Kamppi, Southern major district, Helsinki, Hel... | 60.172109 | 60.160474 | 24.943453 | 24.920742 |
Plot the area:
area.plot()
<matplotlib.axes._subplots.AxesSubplot at 0x15aa208cc50>
Building footprints¶
It is also possible to retrieve other types of OSM data features with OSMnx, such as buildings or points of interest (POIs). Let’s download the buildings with the OSMnx footprints_from_place() function (called buildings_from_place() in OSMnx<0.9; newer releases offer features_from_place() for the same purpose) and plot them on top of our street network in Kamppi.
Retrieve buildings from the area:
buildings = ox.footprints_from_place(place_name)
Note that you can also get other types of footprints using the parameter footprint_type (the default is “building”).
Check how many building footprints we received:
len(buildings)
427
The buildings GeoDataFrame contains several polygons.
Check the first rows:
buildings.head(3)
(index) | nodes | geometry | addr:city | addr:country | addr:housenumber | addr:street | building | name | name:fi | name:ko | ... | outdoor_seating | addr:floor | access | covered | type | brand | building:part | ele | electrified | addr:unit |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
8035238 | [60069605, 60069615, 60275530, 1036979252, 105... | POLYGON ((24.93563 60.17045, 24.93557 60.17054... | Helsinki | FI | 22-24 | Mannerheimintie | public | Lasipalatsi | Lasipalatsi | 라시팔라치 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
8042297 | [1378950415, 1378950417, 1378950418, 319515866... | POLYGON ((24.92938 60.16795, 24.92933 60.16797... | Helsinki | FI | 2 | Runeberginkatu | yes | Radisson Blu Royal | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
14797170 | [146125363, 3203698292, 3203698293, 3203698294... | POLYGON ((24.92427 60.16648, 24.92427 60.16650... | Helsinki | FI | 10 | Lapinlahdenkatu | school | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 rows × 93 columns
As you can see, there are several columns in the buildings layer. Each column contains information about a specific tag that OpenStreetMap contributors have added. Each tag consists of a key (the column name) and one of several potential values (for example building=yes or building=school). Read more about tags and tagging practices in the OpenStreetMap wiki.
buildings.columns
Index(['nodes', 'geometry', 'addr:city', 'addr:country', 'addr:housenumber',
'addr:street', 'building', 'name', 'name:fi', 'name:ko', 'name:sv',
'start_date', 'url', 'wikidata', 'wikipedia', 'addr:postcode', 'bar',
'email', 'fax', 'internet_access', 'internet_access:fee', 'phone',
'smoking', 'tourism', 'website', 'operator', 'source', 'architect',
'building:levels', 'landuse', 'suojelumerkintä', 'layer', 'ref',
'fixme', 'last_full_renovation', 'roof:levels',
'building:maintenance:operator', 'name:local', 'levels', 'old_name',
'created_by', 'omistusasuntoja', 'building:material', 'roof:shape',
'building:colour', 'roof:colour', 'name:en', 'name:fr', 'alt_name',
'note', 'short_name', 'was:building', 'was:guard:operator',
'wheelchair', 'building:min_level', 'amenity', 'cuisine', 'historic',
'inscription', 'tomb', 'addr:housename', 'opening_hours', 'shop',
'toilets:wheelchair', 'name:da', 'name:nn', 'wheelchair:description',
'denomination', 'religion', 'height', 'name:ru', 'official_name',
'last_pipe_renovation', 'contact:website', 'guard:operator', 'loc_name',
'name:no', 'alt_name:en', 'name:zh', 'drive_through', 'ice_cream',
'lippakioski', 'takeaway', 'outdoor_seating', 'addr:floor', 'access',
'covered', 'type', 'brand', 'building:part', 'ele', 'electrified',
'addr:unit'],
dtype='object')
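To see why the buildings layer ends up with so many columns that are mostly NaN, consider the following plain-Python sketch of how free-form key=value tags turn into table columns. This is only an illustration of the principle, not how OSMnx is implemented; the feature IDs are made up and the tag values are modeled on the rows shown earlier.

```python
# Hypothetical features with made-up IDs; each feature carries its own set of tags.
features = {
    1: {"building": "public", "name": "Lasipalatsi", "addr:street": "Mannerheimintie"},
    2: {"building": "yes", "name": "Radisson Blu Royal"},
    3: {"building": "school", "addr:housenumber": "10"},
}

# Every tag key that occurs on at least one feature becomes a column.
columns = sorted({key for tags in features.values() for key in tags})

# Features lacking a tag get None (displayed as NaN in a GeoDataFrame).
table = {
    feature_id: [tags.get(column) for column in columns]
    for feature_id, tags in features.items()
}

print(columns)
for feature_id, row in table.items():
    print(feature_id, row)
```

Because contributors are free to add any tag to any feature, a single rarely used key is enough to add a whole column, which is filled only for the few features that carry it.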
Points-of-interest¶
OSMnx has a nice function called ox.pois_from_place() that can be used to retrieve specific points of interest (POIs) from OpenStreetMap based on their amenity tag. We can, for example, retrieve all points with the tag amenity=restaurant by passing an argument to the amenities parameter. We could also retrieve several POI categories by passing a list of OSM amenity tag values to the function.
Let’s retrieve restaurants that are located in our area of interest:
# Retrieve restaurants
restaurants = ox.pois_from_place(place_name, amenities=['restaurant'])
# How many restaurants do we have?
len(restaurants)
212
As we can see, there are quite a few restaurants in the area.
Let’s explore what kind of attributes we have in our restaurants GeoDataFrame:
# Available columns
restaurants.columns
Index(['osmid', 'geometry', 'addr:city', 'addr:country', 'addr:housenumber',
'addr:postcode', 'addr:street', 'amenity', 'cuisine', 'name', 'phone',
'website', 'wheelchair', 'element_type', 'toilets:wheelchair',
'created_by', 'outdoor_seating', 'fixme', 'opening_hours', 'email',
'internet_access', 'internet_access:fee', 'opening_hours:brunch',
'diet:vegetarian', 'name:fi', 'name:zh', 'short_name', 'takeaway',
'contact:website', 'diet:vegan', 'name:ru', 'operator', 'smoking',
'wheelchair:description', 'level', 'contact:phone', 'source', 'name:en',
'building', 'addr:housename', 'note', 'address', 'brunch',
'contact:foursquare', 'contact:yelp', 'ref:vatin', 'delivery', 'url',
'lunch:menu', 'reservation', 'room', 'toilets', 'capacity',
'access:dog', 'shop', 'opening_hours:lunch_buffet', 'is_in', 'wikidata',
'alt_name', 'contact:email', 'established', 'description', 'name:sv',
'lunch', 'description:en', 'old_name', 'highchair', 'was:name',
'website:en', 'lunch:buffet', 'office', 'addr:place', 'entrance',
'addr:floor', 'layer', 'image', 'payment:mastercard', 'payment:visa',
'nodes'],
dtype='object')
Wow, there is quite a lot of information related to the POIs. Some of the most useful attributes might be, for example, the name, the address information, and the opening_hours:
# Select some useful cols and print
cols = ['name', 'opening_hours', 'addr:city', 'addr:country',
'addr:housenumber', 'addr:postcode', 'addr:street']
# Print only selected cols
restaurants[cols].head(10)
(index) | name | opening_hours | addr:city | addr:country | addr:housenumber | addr:postcode | addr:street |
---|---|---|---|---|---|---|---|
60062502 | Kabuki | NaN | Helsinki | FI | 12 | 00180 | Lapinlahdenkatu |
60133792 | Ateljé Finne | NaN | Helsinki | FI | NaN | NaN | NaN |
62965963 | Empire Plaza | NaN | NaN | NaN | NaN | NaN | NaN |
62967659 | Ravintola Pääposti | NaN | Helsinki | NaN | 1 B | 00100 | Mannerheiminaukio |
68734026 | Hampton Bay | NaN | Helsinki | FI | 6 | 00120 | Hietalahdenranta |
76617692 | Johan Ludvig | NaN | Helsinki | FI | NaN | NaN | NaN |
76624339 | Ravintola Rivoletto | Mo-Th 11:00-23:00; Fr 11:00-24:00; Sa 15:00-24... | Helsinki | FI | 38 | 00120 | Albertinkatu |
76624351 | Pueblo | NaN | Helsinki | FI | NaN | NaN | Eerikinkatu |
76627823 | Atabar | NaN | Helsinki | FI | NaN | NaN | Eerikinkatu |
77642757 | Southpark | Mo-Sa 11:00-15:00; Su 10:30-17:00 | Helsinki | NaN | 40 | 00120 | Sinebrychoffin puisto, Bulevardi |
As we can see, a lot of useful information about restaurants can be retrieved easily with OSMnx. Also, if some of the information needs updating, you can go over to www.openstreetmap.org and edit the source data! :)
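Requests like the restaurant query above end up at the Overpass API, which is queried with its own language, Overpass QL. As a rough illustration of what such a request can look like, the sketch below assembles a hand-written query for restaurants inside a bounding box. The function build_restaurant_query is hypothetical, and the resulting text is not the exact query that OSMnx generates (OSMnx builds its own queries from the place polygon).

```python
def build_restaurant_query(south, west, north, east, timeout=25):
    """Build an Overpass QL query for restaurants inside a bounding box.

    Overpass expects bounding boxes in the order (south, west, north, east).
    """
    bbox = f"{south},{west},{north},{east}"
    return (
        f"[out:json][timeout:{timeout}];\n"
        "(\n"
        f'  node["amenity"="restaurant"]({bbox});\n'
        f'  way["amenity"="restaurant"]({bbox});\n'
        ");\n"
        "out center;"
    )


# Bounding box values taken from the Kamppi area table shown earlier.
query = build_restaurant_query(
    south=60.160474, west=24.920742, north=60.172109, east=24.943453
)
print(query)
```

The query asks for both nodes and ways because a restaurant can be mapped either as a single point or as a building outline, which is also why the restaurants GeoDataFrame contains a nodes column.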
Graph to GeoDataFrame¶
We can now plot all these different OSM layers by using the familiar plot() function of Geopandas. As you might remember, the street network data is not a GeoDataFrame (it is a networkx.MultiDiGraph). Luckily, OSMnx provides a convenient function, graph_to_gdfs(), that can convert the graph into two separate GeoDataFrames, where the first one contains the information about the nodes and the second one about the edges.
Let’s extract the nodes and edges from the graph as GeoDataFrames:
# Retrieve nodes and edges
nodes, edges = ox.graph_to_gdfs(graph)
nodes.head()
(index) | y | x | osmid | highway | ref | geometry |
---|---|---|---|---|---|---|
3216400385 | 60.167552 | 24.934005 | 3216400385 | turning_circle | NaN | POINT (24.93400 60.16755) |
1372233731 | 60.162290 | 24.929274 | 1372233731 | crossing | NaN | POINT (24.92927 60.16229) |
319885318 | 60.165072 | 24.925487 | 319885318 | NaN | NaN | POINT (24.92549 60.16507) |
1005744134 | 60.161622 | 24.924423 | 1005744134 | NaN | NaN | POINT (24.92442 60.16162) |
3216400394 | 60.167662 | 24.933920 | 3216400394 | NaN | NaN | POINT (24.93392 60.16766) |
edges.head()
(index) | u | v | key | osmid | name | highway | maxspeed | oneway | length | geometry | lanes | service | tunnel | junction | access | bridge | ref |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 3216400385 | 301360890 | 0 | 15240373 | Kansakoulukuja | residential | 30 | False | 13.177 | LINESTRING (24.93400 60.16755, 24.93393 60.167... | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | 1372233731 | 298367080 | 0 | 86533507 | NaN | footway | NaN | False | 6.925 | LINESTRING (24.92927 60.16229, 24.92917 60.16225) | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | 1372233731 | 292859610 | 0 | 15103120 | NaN | primary_link | 30 | True | 33.874 | LINESTRING (24.92927 60.16229, 24.92930 60.162... | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
3 | 1372233731 | 4430643601 | 0 | [154412960, 86533507] | NaN | footway | NaN | False | 12.489 | LINESTRING (24.92927 60.16229, 24.92941 60.162... | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | 1372233731 | 311043714 | 0 | 86533509 | Hietalahdenkatu | primary | 30 | True | 38.768 | LINESTRING (24.92927 60.16229, 24.92938 60.162... | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
Nice! Now, as we can see, we have our graph as GeoDataFrames and we can plot them using the same functions and tools as we have used before.
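The length column of the edges table is given in meters. As a rough sanity check, and assuming that these lengths correspond to great-circle distances along the edge geometry, the sketch below recomputes the length of the two-point footway edge (index 1) from its printed coordinates. The function great_circle_m and the Earth radius constant are illustrative choices, not taken from OSMnx.

```python
import math

EARTH_RADIUS_M = 6_371_009  # mean Earth radius in meters (an assumed value)


def great_circle_m(lat1, lon1, lat2, lon2):
    """Haversine distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# End points of the footway edge shown above as (latitude, longitude),
# i.e. (y, x), using the five decimals printed in the table.
start = (60.16229, 24.92927)
end = (60.16225, 24.92917)

length_m = great_circle_m(*start, *end)
print(round(length_m, 1))
```

The result is about seven meters, close to the 6.925 reported in the table; the small difference is expected because the coordinates printed in the table are rounded to five decimals.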
Note
There are also other ways of retrieving data from OpenStreetMap with OSMnx, such as passing a Polygon to extract the data from that area, or passing Point coordinates and retrieving data around that location within a specific radius. Take a look at this tutorial to find out how to use those features of OSMnx.
Plotting the data¶
Let’s create a map out of the streets, buildings, restaurants, and the area Polygon but let’s exclude the nodes (to keep the figure clearer).
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(12,8))
# Plot the footprint
area.plot(ax=ax, facecolor='black')
# Plot street edges
edges.plot(ax=ax, linewidth=1, edgecolor='#BC8F8F')
# Plot buildings
buildings.plot(ax=ax, facecolor='khaki', alpha=0.7)
# Plot restaurants
restaurants.plot(ax=ax, color='green', alpha=0.7, markersize=10)
plt.tight_layout()
Cool! Now we have a map where we have plotted the restaurants, buildings, streets and the boundaries of the selected region of ‘Kamppi’ in Helsinki. And all of this required only a few lines of code. Pretty neat!
As a final step, we might want to re-project the layers to a local projection for plotting. Here, we will use a tool we already know, namely the pyproj CRS class. In the latter part of this lesson we will learn how to use OSMnx to re-project our data to UTM coordinates.
Re-project the layers to EPSG:3067 (ETRS89 / TM35FIN):
from pyproj import CRS
# Set projection
projection = CRS.from_epsg(3067)
# Re-project layers
area = area.to_crs(projection)
edges = edges.to_crs(projection)
buildings = buildings.to_crs(projection)
restaurants = restaurants.to_crs(projection)
Create a new plot with the re-projected layers:
fig, ax = plt.subplots(figsize=(12,8))
# Plot the footprint
area.plot(ax=ax, facecolor='black')
# Plot street edges
edges.plot(ax=ax, linewidth=1, edgecolor='dimgray')
# Plot buildings
buildings.plot(ax=ax, facecolor='silver', alpha=0.7)
# Plot restaurants
restaurants.plot(ax=ax, color='yellow', alpha=0.7, markersize=10)
plt.tight_layout()
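Since UTM coordinates were mentioned above, here is a small sketch of how the UTM zone for a location can be derived from its longitude, using the standard six-degree zone width. The helper names (utm_zone, utm_epsg) are hypothetical, and the sketch ignores the special zone exceptions around Norway and Svalbard.

```python
import math


def utm_zone(longitude):
    """Return the UTM zone number (1-60) for a longitude in degrees."""
    return math.floor((longitude + 180) / 6) % 60 + 1


def utm_epsg(latitude, longitude):
    """Return the EPSG code of the WGS 84 / UTM zone covering the point."""
    base = 32600 if latitude >= 0 else 32700  # northern vs. southern hemisphere
    return base + utm_zone(longitude)


# Center of the Kamppi bounding box from the area table shown earlier.
lon = (24.920742 + 24.943453) / 2
lat = (60.160474 + 60.172109) / 2

print(utm_zone(lon), utm_epsg(lat, lon))
```

Kamppi falls into UTM zone 35 on the northern hemisphere (EPSG:32635). The Finnish national system used above, EPSG:3067, is built on the same zone 35 central meridian but is applied to the whole country.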
Task
Retrieve OpenStreetMap data from some other area! Download these elements using OSMnx functions from your area of interest:
Extent of the area using gdf_from_place()
Street network using graph_from_place(), converted to GeoDataFrames using ox.graph_to_gdfs()
Building footprints using ox.footprints_from_place()
Note that the larger the area you choose, the longer it takes to retrieve data from the API! Use the parameter network_type="drive" to restrict the graph query to drivable roads.
Extra: Park polygons¶
Notice that we can also retrieve other types of footprints from OpenStreetMap by specifying the footprint_type when using functions from the OSMnx footprints module. The default value for this parameter is building, but we can also pass other OpenStreetMap tag keys.
Let’s try to fetch all public parks in the Kamppi area. In OpenStreetMap, parks are often tagged as leisure=park (other tags might also be used, such as landuse=recreation_ground or landuse=grass; see OpenStreetMap and the OSM wiki for more details).
We need to start by fetching all footprints with the tag key leisure:
leisure = ox.footprints_from_place(place_name, footprint_type="leisure")
Let’s check the data:
leisure.head(3)
(index) | nodes | geometry | leisure | name | name:fi | name:sv | hoitoluokitus_viheralue | source | wikidata | wikipedia | access | alt_name | loc_name | barrier | sport | colour | fixme | mooring | short_name | short_name:sv |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
8042256 | [292719496, 1001543836, 1037987967, 1001544060... | POLYGON ((24.93566 60.17132, 24.93566 60.17130... | park | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
8042613 | [552965718, 293390264, 295056669, 256264975, 1... | POLYGON ((24.93701 60.16947, 24.93627 60.16919... | park | Simonpuistikko | Simonpuistikko | Simonsskvären | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
15218362 | [144181223, 150532964, 150532958, 150532966, 1... | POLYGON ((24.92330 60.16499, 24.92323 60.16500... | park | Työmiehenpuistikko | Työmiehenpuistikko | Arbetarparken | A2 | survey | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
Check all values for the column leisure:
leisure["leisure"].value_counts()
park 15
pitch 8
playground 6
dog_park 2
flowerbed 1
marina 1
Name: leisure, dtype: int64
Select all park polygons (here, selecting both “park” and “playground”):
parks = leisure[leisure["leisure"].isin(["park","playground"])]
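If the combination of value_counts() and isin() is unfamiliar, the following plain-Python analogue shows what the two steps do. The list is made up to reproduce the category counts printed above; it is not the actual GeoDataFrame.

```python
from collections import Counter

# Made-up list reproducing the category counts shown above.
leisure_values = (
    ["park"] * 15 + ["pitch"] * 8 + ["playground"] * 6
    + ["dog_park"] * 2 + ["flowerbed"] + ["marina"]
)

# Counterpart of value_counts(): how often does each value occur?
counts = Counter(leisure_values)

# Counterpart of isin(): keep only rows whose value is in the wanted set.
wanted = {"park", "playground"}
selected = [value for value in leisure_values if value in wanted]

print(counts.most_common(3))
print(len(selected))
```

Based on the counts above, the parks selection should therefore contain 21 polygons (15 parks and 6 playgrounds).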
Plot the parks:
parks.plot(color="green")
<matplotlib.axes._subplots.AxesSubplot at 0x15aa2ef95c0>
Finally, we can re-project the park polygons and add them to our map:
parks = parks.to_crs(projection)
fig, ax = plt.subplots(figsize=(12,8))
# Plot the footprint
area.plot(ax=ax, facecolor='black')
# Plot the parks
parks.plot(ax=ax, facecolor="green")
# Plot street edges
edges.plot(ax=ax, linewidth=1, edgecolor='dimgray')
# Plot buildings
buildings.plot(ax=ax, facecolor='silver', alpha=0.7)
# Plot restaurants
restaurants.plot(ax=ax, color='yellow', alpha=0.7, markersize=10)
plt.tight_layout()