Case: hospital districts¶
In this tutorial, we will create boundaries of Finnish hospital districts (sairaanhoitopiiri in Finnish) by dissolving municipality boundaries into larger entities. Main processing steps include a table join and dissolving the municipality geometries into larger entities.
We will combine information from municipality polygons from Statistics Finland and a list of health care districts by the Finnish Municipality authority Kuntaliitto.
Importing required python packages:
import json
import numpy as np
import pandas as pd
import geopandas as gpd
from pyproj import CRS
import matplotlib.pyplot as plt
Read in data¶
Municipality polygons from Statistics Finland web feature service: https://www.stat.fi/org/avoindata/paikkatietoaineistot/kuntapohjaiset_tilastointialueet.html
wfs: http://geo.stat.fi/geoserver/tilastointialueet/wfs?
feature:
tilastointialueet:kunta1000k
(most recent information about municipality polygons)
# For available features, see http://geo.stat.fi/geoserver/tilastointialueet/wfs?request=GetCapabilities
url = "http://geo.stat.fi/geoserver/tilastointialueet/wfs?request=GetFeature&typename=tilastointialueet:kunta1000k&outputformat=JSON"
geodata = gpd.read_file(url)
geodata.head()
id | kunta | vuosi | nimi | namn | name | geometry | |
---|---|---|---|---|---|---|---|
0 | kunta1000k.1 | 005 | 2020 | Alajärvi | Alajärvi | Alajärvi | POLYGON ((366787.924 7001300.583, 364487.590 6... |
1 | kunta1000k.2 | 009 | 2020 | Alavieska | Alavieska | Alavieska | POLYGON ((382543.364 7120022.976, 382899.505 7... |
2 | kunta1000k.3 | 010 | 2020 | Alavus | Alavus | Alavus | POLYGON ((343298.204 6961570.195, 343831.847 6... |
3 | kunta1000k.4 | 016 | 2020 | Asikkala | Asikkala | Asikkala | POLYGON ((436139.680 6798279.085, 435714.468 6... |
4 | kunta1000k.5 | 018 | 2020 | Askola | Askola | Askola | POLYGON ((426631.036 6720528.076, 428821.749 6... |
# Check length (there are 310 municipalities in Finland in 2020)
len(geodata)
310
#Select and rename columns
geodata.rename(columns={'kunta':'code'}, inplace=True)
geodata = geodata[['code','name', 'geometry']]
geodata.head()
code | name | geometry | |
---|---|---|---|
0 | 005 | Alajärvi | POLYGON ((366787.924 7001300.583, 364487.590 6... |
1 | 009 | Alavieska | POLYGON ((382543.364 7120022.976, 382899.505 7... |
2 | 010 | Alavus | POLYGON ((343298.204 6961570.195, 343831.847 6... |
3 | 016 | Asikkala | POLYGON ((436139.680 6798279.085, 435714.468 6... |
4 | 018 | Askola | POLYGON ((426631.036 6720528.076, 428821.749 6... |
geodata.plot()
<AxesSubplot:>
geodata.dtypes
code object
name object
geometry geometry
dtype: object
Finnish municipalities with hospital district information as an Excel spreadsheet
Downloaded from: https://www.kuntaliitto.fi/sosiaali-ja-terveysasiat/sairaanhoitopiirien-jasenkunnat in March 2020.
File
Shp_jäsenkunnat_2020.xls
, sheetkunnat_shp_2020_ aakkosjärj.
This is the original unaltered file.In this file, “shp” stands for “sairaanhoitopiiri” (hospital district in Finnish)
Note: this data set does not include Åland (Ahvenanmaa). Åland municipalities are added in the later step. Note: “hospital districts” is a more proper translation to sairaanhoitopiirit, but in this lesson I use “health care districts” to refer to these entities
Excel files often come with additional formatting such as metadata on the first lines of the data array. This is why it is a good idea to download the file on your own computer and have a look at the data structure before reading in the file using Python. It is also often a good idea to save the file as a csv file before reading in the data. However, it is also possible to read in data directly from Excel. For this, you need to have the xlrd module installed:
conda install -c conda-forge xlrd
Now we are ready to read in the data using pandas.
In the case of this health districts excel the header is located on the 4th row (index 3) of the excel spreadsheet.
# Read in the excel spreadsheet
data = pd.read_excel(r"data/Shp_jäsenkunnat_2020.xls", sheet_name="kunnat_shp_2020_ aakkosjärj.", header=3)
data.head()
kunta-\nkoodi | kunta | shp:n koodi | sairaanhoitopiiri | erva-alue | kuntien lkm | |
---|---|---|---|---|---|---|
0 | NaN | NaN | NaN | NaN | NaN | NaN |
1 | 20.0 | Akaa | 6.0 | Pirkanmaa | TAYS | 1.0 |
2 | 5.0 | Alajärvi | 15.0 | Etelä-Pohjanmaa | TAYS | 2.0 |
3 | 9.0 | Alavieska | 18.0 | Pohjois-Pohjanmaa | OYS | 3.0 |
4 | 10.0 | Alavus | 15.0 | Etelä-Pohjanmaa | TAYS | 4.0 |
In addition, the first row after the header is empty. We can get rid of it using the dropna() -function:
data.dropna(inplace=True)
Check number of rows (16 Åland municipalities are missing)
len(data)
294
The data needs some fixing and cleaning after reading the excel sheet
# Rename columns from Finnish to English
data.rename(columns={"kunta-\nkoodi":"code", 'sairaanhoitopiiri':'healthCareDistrict'}, inplace=True)
# Select only useful columns
data = data[['code','healthCareDistrict']]
data
code | healthCareDistrict | |
---|---|---|
1 | 20.0 | Pirkanmaa |
2 | 5.0 | Etelä-Pohjanmaa |
3 | 9.0 | Pohjois-Pohjanmaa |
4 | 10.0 | Etelä-Pohjanmaa |
5 | 16.0 | Päijät-Häme |
... | ... | ... |
290 | 977.0 | Pohjois-Pohjanmaa |
291 | 980.0 | Pirkanmaa |
292 | 981.0 | Kanta-Häme |
293 | 989.0 | Etelä-Pohjanmaa |
294 | 992.0 | Keski-Suomi |
294 rows × 2 columns
Looks better! Now we need to prepare the data for table join. We will use the municipality code as the common key.
data.dtypes
code float64
healthCareDistrict object
dtype: object
The code column is currently a floating point number. We need to modify these codes so that they match the ones in the spatial data:
# Example using one code
number = data.at[1, "code"]
number
20.0
# Conver this number to character string 020
"0" + str(int(number))
'020'
Let’s apply this process on all rows at once, and take into account different number of digits:
# Truncate and convert to character string
data["code"] = data["code"].astype(int).astype('str')
# Add missing zeros to municipality codes
data["code"] = data["code"].apply(lambda x: "00" + x if len(x)==1 else x)
data["code"] = data["code"].apply(lambda x: "0" + x if len(x)==2 else x)
data.head()
code | healthCareDistrict | |
---|---|---|
1 | 020 | Pirkanmaa |
2 | 005 | Etelä-Pohjanmaa |
3 | 009 | Pohjois-Pohjanmaa |
4 | 010 | Etelä-Pohjanmaa |
5 | 016 | Päijät-Häme |
Join Health district info to the municipality polygons¶
# Merge health district info to geodata using "code" as the common key
geodata = geodata.merge(data, on="code", how="left")
geodata
code | name | geometry | healthCareDistrict | |
---|---|---|---|---|
0 | 005 | Alajärvi | POLYGON ((366787.924 7001300.583, 364487.590 6... | Etelä-Pohjanmaa |
1 | 009 | Alavieska | POLYGON ((382543.364 7120022.976, 382899.505 7... | Pohjois-Pohjanmaa |
2 | 010 | Alavus | POLYGON ((343298.204 6961570.195, 343831.847 6... | Etelä-Pohjanmaa |
3 | 016 | Asikkala | POLYGON ((436139.680 6798279.085, 435714.468 6... | Päijät-Häme |
4 | 018 | Askola | POLYGON ((426631.036 6720528.076, 428821.749 6... | HUS |
... | ... | ... | ... | ... |
305 | 977 | Ylivieska | POLYGON ((398010.991 7110887.267, 399696.069 7... | Pohjois-Pohjanmaa |
306 | 980 | Ylöjärvi | POLYGON ((313738.511 6896936.100, 319421.316 6... | Pirkanmaa |
307 | 981 | Ypäjä | POLYGON ((297451.456 6756204.328, 297931.884 6... | Kanta-Häme |
308 | 989 | Ähtäri | POLYGON ((348733.187 6959704.551, 349457.337 6... | Etelä-Pohjanmaa |
309 | 992 | Äänekoski | POLYGON ((452626.858 6973610.366, 457542.012 6... | Keski-Suomi |
310 rows × 4 columns
Looks good! However, Municipalities in the Åland island did not have a matching health care district in the data. Let’s have a closer look:
# List all municipalities that lack health district info:
geodata[geodata["healthCareDistrict"].isnull()].name
7 Brändö
8 Eckerö
15 Finström
17 Föglö
18 Geta
24 Hammarland
58 Jomala
112 Kumlinge
122 Kökar
135 Lemland
148 Lumparland
154 Mariehamn
237 Saltvik
255 Sottunga
257 Sund
302 Vårdö
Name: name, dtype: object
# Update "Ahvenanmaa" as the health care district for Åland municipalities (16 municipalities in total)
geodata.loc[geodata["healthCareDistrict"].isnull(), "healthCareDistrict"] = "Ahvenanmaa"
Check the count of municipalities per health care disctrict
geodata["healthCareDistrict"].value_counts()
Pohjois-Pohjanmaa 29
Varsinais-Suomi 28
HUS 24
Pirkanmaa 23
Keski-Suomi 21
Etelä-Pohjanmaa 18
Pohjois-Savo 18
Satakunta 17
Ahvenanmaa 16
Lappi 15
Vaasa 13
Pohjois-Karjala 13
Päijät-Häme 12
Kanta-Häme 11
Keski-Pohjanmaa 10
Etelä-Karjala 9
Etelä-Savo 9
Kainuu 8
Länsi-Pohja 6
Kymenlaakso 6
Itä-Savo 4
Name: healthCareDistrict, dtype: int64
Create polygons for health care districts¶
# Dissolve (=combine) municipality polygon geometries for each health care district
districts = geodata.dissolve(by='healthCareDistrict')
districts.reset_index(inplace=True)
# Select useful columns
districts = districts[["healthCareDistrict", "geometry"]]
districts
healthCareDistrict | geometry | |
---|---|---|
0 | Ahvenanmaa | MULTIPOLYGON (((173277.623 6640282.925, 173136... |
1 | Etelä-Karjala | POLYGON ((595843.841 6772915.996, 592557.900 6... |
2 | Etelä-Pohjanmaa | POLYGON ((249539.259 6894974.367, 244232.829 6... |
3 | Etelä-Savo | POLYGON ((596327.952 6823806.064, 596718.363 6... |
4 | HUS | MULTIPOLYGON (((272609.681 6632304.439, 272418... |
5 | Itä-Savo | POLYGON ((572240.425 6898837.522, 576521.513 6... |
6 | Kainuu | POLYGON ((606127.874 7081796.115, 603849.594 7... |
7 | Kanta-Häme | POLYGON ((393094.362 6756355.691, 394102.316 6... |
8 | Keski-Pohjanmaa | MULTIPOLYGON (((302835.219 7083897.220, 302801... |
9 | Keski-Suomi | POLYGON ((439110.805 6852598.036, 439300.113 6... |
10 | Kymenlaakso | MULTIPOLYGON (((501532.450 6680088.052, 501352... |
11 | Lappi | POLYGON ((518535.957 7313104.574, 515335.969 7... |
12 | Länsi-Pohja | MULTIPOLYGON (((399348.710 7271646.236, 399096... |
13 | Pirkanmaa | POLYGON ((324952.559 6773513.092, 327451.427 6... |
14 | Pohjois-Karjala | POLYGON ((681990.187 6886181.668, 680094.200 6... |
15 | Pohjois-Pohjanmaa | MULTIPOLYGON (((462785.501 7079521.042, 461527... |
16 | Pohjois-Savo | POLYGON ((568071.215 6914370.703, 570686.991 6... |
17 | Päijät-Häme | POLYGON ((441787.770 6730507.194, 442140.745 6... |
18 | Satakunta | MULTIPOLYGON (((198569.146 6782909.960, 198622... |
19 | Vaasa | MULTIPOLYGON (((200371.753 6892181.697, 200360... |
20 | Varsinais-Suomi | MULTIPOLYGON (((258767.605 6632338.388, 258668... |
districts.plot(column='healthCareDistrict', scheme="equal_interval",cmap='tab20', k=20)
plt.axis('off')
(40857.339215, 765862.479085, 6575077.820610001, 7833703.7515899995)
# Write GeoJSON in original projection
districts.to_file("healthDistrictsEPSG3067.geojson", driver='GeoJSON', encoding='utf-8')
# Re-project to WGS84 and save again
wgs84 = CRS.from_epsg(4326)
districts.to_crs(wgs84).to_file("healthDistrictsEPSG4326.geojson", driver='GeoJSON', encoding='utf-8')
That’s it! You can elaborate this workflow by joining additional data. For example, if you join population info per municipality you can sum it up for each health care district using the aggfunc=sum
argument to get population count per health care district.