Case: hospital districts

In this tutorial, we will create boundaries of Finnish hospital districts (sairaanhoitopiiri in Finnish) by dissolving municipality boundaries into larger entities. Main processing steps include a table join and dissolving the municipality geometries into larger entities.

We will combine information from municipality polygons from Statistics Finland and a list of health care districts by the Finnish Municipality authority Kuntaliitto.

Importing required python packages:

import json
import numpy as np
import pandas as pd
import geopandas as gpd
from pyproj import CRS
import matplotlib.pyplot as plt

Read in data

  • Municipality polygons from Statistics Finland web feature service: https://www.stat.fi/org/avoindata/paikkatietoaineistot/kuntapohjaiset_tilastointialueet.html

    • wfs: http://geo.stat.fi/geoserver/tilastointialueet/wfs?

    • feature: tilastointialueet:kunta1000k (most recent information about municipality polygons)

# For available features, see http://geo.stat.fi/geoserver/tilastointialueet/wfs?request=GetCapabilities
url = "http://geo.stat.fi/geoserver/tilastointialueet/wfs?request=GetFeature&typename=tilastointialueet:kunta1000k&outputformat=JSON"
geodata = gpd.read_file(url)
geodata.head()
id kunta vuosi nimi namn name geometry
0 kunta1000k.1 005 2020 Alajärvi Alajärvi Alajärvi POLYGON ((366787.924 7001300.583, 364487.590 6...
1 kunta1000k.2 009 2020 Alavieska Alavieska Alavieska POLYGON ((382543.364 7120022.976, 382899.505 7...
2 kunta1000k.3 010 2020 Alavus Alavus Alavus POLYGON ((343298.204 6961570.195, 343831.847 6...
3 kunta1000k.4 016 2020 Asikkala Asikkala Asikkala POLYGON ((436139.680 6798279.085, 435714.468 6...
4 kunta1000k.5 018 2020 Askola Askola Askola POLYGON ((426631.036 6720528.076, 428821.749 6...
# Check length (there are 310 municipalities in Finland in 2020)
len(geodata)
310
#Select and rename columns
geodata.rename(columns={'kunta':'code'}, inplace=True)
geodata = geodata[['code','name', 'geometry']]
geodata.head()
code name geometry
0 005 Alajärvi POLYGON ((366787.924 7001300.583, 364487.590 6...
1 009 Alavieska POLYGON ((382543.364 7120022.976, 382899.505 7...
2 010 Alavus POLYGON ((343298.204 6961570.195, 343831.847 6...
3 016 Asikkala POLYGON ((436139.680 6798279.085, 435714.468 6...
4 018 Askola POLYGON ((426631.036 6720528.076, 428821.749 6...
geodata.plot()
<AxesSubplot:>
../../_images/create_health_district_polygons_9_1.png
geodata.dtypes
code          object
name          object
geometry    geometry
dtype: object
  • Finnish municipalities with hospital district information as an Excel spreadsheet

    • Downloaded from: https://www.kuntaliitto.fi/sosiaali-ja-terveysasiat/sairaanhoitopiirien-jasenkunnat in March 2020.

    • File Shp_jäsenkunnat_2020.xls, sheet kunnat_shp_2020_ aakkosjärj. This is the original unaltered file.

    • In this file, “shp” stands for “sairaanhoitopiiri” (hospital district in Finnish)

Note: this data set does not include Åland (Ahvenanmaa). Åland municipalities are added in the later step. Note: “hospital districts” is a more proper translation to sairaanhoitopiirit, but in this lesson I use “health care districts” to refer to these entities

Excel files often come with additional formatting such as metadata on the first lines of the data array. This is why it is a good idea to download the file on your own computer and have a look at the data structure before reading in the file using Python. It is also often a good idea to save the file as a csv file before reading in the data. However, it is also possible to read in data directly from Excel. For this, you need to have the xlrd module installed:

conda install -c conda-forge xlrd

Now we are ready to read in the data using pandas.

In the case of this health districts excel the header is located on the 4th row (index 3) of the excel spreadsheet.

# Read in the excel spreadsheet
data = pd.read_excel(r"data/Shp_jäsenkunnat_2020.xls", sheet_name="kunnat_shp_2020_ aakkosjärj.", header=3)
data.head()
kunta-\nkoodi kunta shp:n koodi sairaanhoitopiiri erva-alue kuntien lkm
0 NaN NaN NaN NaN NaN NaN
1 20.0 Akaa 6.0 Pirkanmaa TAYS 1.0
2 5.0 Alajärvi 15.0 Etelä-Pohjanmaa TAYS 2.0
3 9.0 Alavieska 18.0 Pohjois-Pohjanmaa OYS 3.0
4 10.0 Alavus 15.0 Etelä-Pohjanmaa TAYS 4.0

In addition, the first row after the header is empty. We can get rid of it using the dropna() -function:

data.dropna(inplace=True)

Check number of rows (16 Åland municipalities are missing)

len(data)
294

The data needs some fixing and cleaning after reading the excel sheet

# Rename columns from Finnish to English 
data.rename(columns={"kunta-\nkoodi":"code", 'sairaanhoitopiiri':'healthCareDistrict'}, inplace=True)

# Select only useful columns
data = data[['code','healthCareDistrict']]
data
code healthCareDistrict
1 20.0 Pirkanmaa
2 5.0 Etelä-Pohjanmaa
3 9.0 Pohjois-Pohjanmaa
4 10.0 Etelä-Pohjanmaa
5 16.0 Päijät-Häme
... ... ...
290 977.0 Pohjois-Pohjanmaa
291 980.0 Pirkanmaa
292 981.0 Kanta-Häme
293 989.0 Etelä-Pohjanmaa
294 992.0 Keski-Suomi

294 rows × 2 columns

Looks better! Now we need to prepare the data for table join. We will use the municipality code as the common key.

data.dtypes
code                  float64
healthCareDistrict     object
dtype: object

The code column is currently a floating point number. We need to modify these codes so that they match the ones in the spatial data:

# Example using one code
number = data.at[1, "code"]
number
20.0
# Conver this number to character string 020
"0" + str(int(number))
'020'

Let’s apply this process on all rows at once, and take into account different number of digits:

# Truncate and convert to character string
data["code"] = data["code"].astype(int).astype('str')

# Add missing zeros to municipality codes
data["code"] = data["code"].apply(lambda x: "00" + x if len(x)==1 else x)
data["code"] = data["code"].apply(lambda x: "0" + x if len(x)==2 else x)
data.head()
code healthCareDistrict
1 020 Pirkanmaa
2 005 Etelä-Pohjanmaa
3 009 Pohjois-Pohjanmaa
4 010 Etelä-Pohjanmaa
5 016 Päijät-Häme

Join Health district info to the municipality polygons

# Merge health district info to geodata using "code" as the common key
geodata = geodata.merge(data, on="code", how="left")
geodata
code name geometry healthCareDistrict
0 005 Alajärvi POLYGON ((366787.924 7001300.583, 364487.590 6... Etelä-Pohjanmaa
1 009 Alavieska POLYGON ((382543.364 7120022.976, 382899.505 7... Pohjois-Pohjanmaa
2 010 Alavus POLYGON ((343298.204 6961570.195, 343831.847 6... Etelä-Pohjanmaa
3 016 Asikkala POLYGON ((436139.680 6798279.085, 435714.468 6... Päijät-Häme
4 018 Askola POLYGON ((426631.036 6720528.076, 428821.749 6... HUS
... ... ... ... ...
305 977 Ylivieska POLYGON ((398010.991 7110887.267, 399696.069 7... Pohjois-Pohjanmaa
306 980 Ylöjärvi POLYGON ((313738.511 6896936.100, 319421.316 6... Pirkanmaa
307 981 Ypäjä POLYGON ((297451.456 6756204.328, 297931.884 6... Kanta-Häme
308 989 Ähtäri POLYGON ((348733.187 6959704.551, 349457.337 6... Etelä-Pohjanmaa
309 992 Äänekoski POLYGON ((452626.858 6973610.366, 457542.012 6... Keski-Suomi

310 rows × 4 columns

Looks good! However, Municipalities in the Åland island did not have a matching health care district in the data. Let’s have a closer look:

# List all municipalities that lack health district info:
geodata[geodata["healthCareDistrict"].isnull()].name
7          Brändö
8          Eckerö
15       Finström
17          Föglö
18           Geta
24     Hammarland
58         Jomala
112      Kumlinge
122         Kökar
135       Lemland
148    Lumparland
154     Mariehamn
237       Saltvik
255      Sottunga
257          Sund
302         Vårdö
Name: name, dtype: object
# Update "Ahvenanmaa" as the health care district for Åland municipalities (16 municipalities in total)
geodata.loc[geodata["healthCareDistrict"].isnull(), "healthCareDistrict"] = "Ahvenanmaa"

Check the count of municipalities per health care disctrict

geodata["healthCareDistrict"].value_counts()
Pohjois-Pohjanmaa    29
Varsinais-Suomi      28
HUS                  24
Pirkanmaa            23
Keski-Suomi          21
Etelä-Pohjanmaa      18
Pohjois-Savo         18
Satakunta            17
Ahvenanmaa           16
Lappi                15
Vaasa                13
Pohjois-Karjala      13
Päijät-Häme          12
Kanta-Häme           11
Keski-Pohjanmaa      10
Etelä-Karjala         9
Etelä-Savo            9
Kainuu                8
Länsi-Pohja           6
Kymenlaakso           6
Itä-Savo              4
Name: healthCareDistrict, dtype: int64

Create polygons for health care districts

# Dissolve (=combine) municipality polygon geometries for each health care district
districts = geodata.dissolve(by='healthCareDistrict')
districts.reset_index(inplace=True)
# Select useful columns
districts = districts[["healthCareDistrict", "geometry"]]
districts
healthCareDistrict geometry
0 Ahvenanmaa MULTIPOLYGON (((173277.623 6640282.925, 173136...
1 Etelä-Karjala POLYGON ((595843.841 6772915.996, 592557.900 6...
2 Etelä-Pohjanmaa POLYGON ((249539.259 6894974.367, 244232.829 6...
3 Etelä-Savo POLYGON ((596327.952 6823806.064, 596718.363 6...
4 HUS MULTIPOLYGON (((272609.681 6632304.439, 272418...
5 Itä-Savo POLYGON ((572240.425 6898837.522, 576521.513 6...
6 Kainuu POLYGON ((606127.874 7081796.115, 603849.594 7...
7 Kanta-Häme POLYGON ((393094.362 6756355.691, 394102.316 6...
8 Keski-Pohjanmaa MULTIPOLYGON (((302835.219 7083897.220, 302801...
9 Keski-Suomi POLYGON ((439110.805 6852598.036, 439300.113 6...
10 Kymenlaakso MULTIPOLYGON (((501532.450 6680088.052, 501352...
11 Lappi POLYGON ((518535.957 7313104.574, 515335.969 7...
12 Länsi-Pohja MULTIPOLYGON (((399348.710 7271646.236, 399096...
13 Pirkanmaa POLYGON ((324952.559 6773513.092, 327451.427 6...
14 Pohjois-Karjala POLYGON ((681990.187 6886181.668, 680094.200 6...
15 Pohjois-Pohjanmaa MULTIPOLYGON (((462785.501 7079521.042, 461527...
16 Pohjois-Savo POLYGON ((568071.215 6914370.703, 570686.991 6...
17 Päijät-Häme POLYGON ((441787.770 6730507.194, 442140.745 6...
18 Satakunta MULTIPOLYGON (((198569.146 6782909.960, 198622...
19 Vaasa MULTIPOLYGON (((200371.753 6892181.697, 200360...
20 Varsinais-Suomi MULTIPOLYGON (((258767.605 6632338.388, 258668...
districts.plot(column='healthCareDistrict', scheme="equal_interval",cmap='tab20', k=20)
plt.axis('off')
(40857.339215, 765862.479085, 6575077.820610001, 7833703.7515899995)
../../_images/create_health_district_polygons_42_1.png
# Write GeoJSON in original projection
districts.to_file("healthDistrictsEPSG3067.geojson", driver='GeoJSON', encoding='utf-8')
# Re-project to WGS84 and save again
wgs84 = CRS.from_epsg(4326)
districts.to_crs(wgs84).to_file("healthDistrictsEPSG4326.geojson", driver='GeoJSON', encoding='utf-8')

That’s it! You can elaborate this workflow by joining additional data. For example, if you join population info per municipality you can sum it up for each health care district using the aggfunc=sum argument to get population count per health care district.