Vector Data I/O#

One of the first steps of many analysis workflows is to read data from a file, and one of the last steps often writes data to an output file. To the horror of many geoinformatics scholars, there exist many file formats for GIS data: the old and hated but also loved and established ESRI Shapefile, the universal GeoPackage (GPKG), and the web-optimised GeoJSON are just a few of the better-known examples.

Fear not, Python can read them all (no guarantees, though)!

Most of the current Python GIS packages rely on the GDAL/OGR libraries, for which modern interfaces exist in the form of the fiona (vector data) and rasterio (raster data) Python packages.

Today, we’ll concentrate on vector data, so let’s first take a closer look at fiona’s capabilities, and then import and export data using geopandas, which uses fiona under the hood.


Note: Defining a data directory constant
To make it easier to manage the paths of input and output data files, it is a good habit to define a constant pointing to the data directory at the top of a notebook.
[1]:
import pathlib
NOTEBOOK_PATH = pathlib.Path().resolve()
DATA_DIRECTORY = NOTEBOOK_PATH / "data"

File formats#

Fiona can read (almost) any geospatial file format, and write many of them. To find out which ones exactly (it might depend on the local installation and version, as well), we can print its list of file format drivers:

[2]:
import fiona
fiona.supported_drivers
[2]:
{'DXF': 'rw',
 'CSV': 'raw',
 'OpenFileGDB': 'raw',
 'ESRIJSON': 'r',
 'ESRI Shapefile': 'raw',
 'FlatGeobuf': 'raw',
 'GeoJSON': 'raw',
 'GeoJSONSeq': 'raw',
 'GPKG': 'raw',
 'GML': 'rw',
 'OGR_GMT': 'rw',
 'GPX': 'rw',
 'MapInfo File': 'raw',
 'DGN': 'raw',
 'S57': 'r',
 'SQLite': 'raw',
 'TopoJSON': 'r'}
Hint
In this list, r marks file formats fiona can read, and w formats it can write. An a marks formats for which fiona can append new data to existing files.

Note that each of the listed ‘formats’ is, in fact, the name of the driver implementation, and many of the drivers can open several related file formats.

Many more ‘exotic’ file formats might not show up in this list on your local installation, because you would need to install additional libraries. You can find a full list of file formats supported by GDAL/OGR (and fiona) on its webpage: gdal.org/drivers/vector/.
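The mode strings described in the hint can be used to pick out, for example, all drivers that support appending. The following is a minimal sketch: `SAMPLE_DRIVERS` is a hand-copied excerpt of the output above and `drivers_with_mode()` is a hypothetical helper, not part of fiona. On a real installation, you would pass `fiona.supported_drivers` instead.

```python
# A hand-copied excerpt of the driver list shown above (an assumption for
# illustration); on a real installation, use `fiona.supported_drivers`.
SAMPLE_DRIVERS = {
    "DXF": "rw",
    "ESRIJSON": "r",
    "ESRI Shapefile": "raw",
    "GeoJSON": "raw",
    "GPKG": "raw",
    "GML": "rw",
    "TopoJSON": "r",
}


def drivers_with_mode(drivers, mode):
    """Return the sorted names of all drivers whose mode string contains `mode`."""
    return sorted(name for name, modes in drivers.items() if mode in modes)


# Drivers that can write, and drivers that can append to existing files
writable = drivers_with_mode(SAMPLE_DRIVERS, "w")
appendable = drivers_with_mode(SAMPLE_DRIVERS, "a")

# Drivers that can only read
read_only = sorted(name for name, modes in SAMPLE_DRIVERS.items() if modes == "r")

print("write: ", writable)
print("append:", appendable)
print("read-only:", read_only)
```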

Reading and writing geospatial data#

Fiona allows very low-level access to geodata files. This is sometimes necessary, but in typical analysis workflows, it is more convenient to use a higher-level library. The most commonly used one for geospatial vector data is geopandas. As mentioned above, it uses fiona for reading and writing files, and thus supports the same file formats.

To read data from a GeoPackage file into a geopandas.GeoDataFrame (a geospatially-enabled version of a pandas.DataFrame), use geopandas.read_file():

[3]:
import geopandas
municipalities = geopandas.read_file(
    DATA_DIRECTORY / "finland_municipalities" / "finland_municipalities_2021.gpkg"
)
municipalities.head()
[3]:
GML_ID NATCODE NAMEFIN NAMESWE LANDAREA FRESHWAREA SEAWAREA TOTALAREA geometry
0 1601000258 498 Muonio Muonio 1904.05 133.73 0.0 2037.78 POLYGON ((366703.026 7563861.713, 373641.706 7...
1 1601000566 148 Inari Enare 15056.29 2277.33 0.0 17333.62 POLYGON ((554063.014 7746246.426, 558386.737 7...
2 1601000428 224 Karkkila Högfors 242.35 12.97 0.0 255.32 POLYGON ((338515.195 6726577.401, 338539.595 6...
3 1601000698 271 Kokemäki Kumo 480.20 51.06 0.0 531.26 POLYGON ((260519.503 6818726.479, 263236.792 6...
4 1601000343 176 Juuka Juuka 1501.70 344.87 0.0 1846.57 POLYGON ((607203.808 7035838.978, 608878.941 7...

Reading a local GPKG file is most likely the easiest task for a GIS package. However, in perfect Python ‘Swiss pocket knife’ manner, geopandas can also read Shapefiles inside a ZIP archive, and/or straight from an Internet URL. For example, downloading, unpacking and opening a data set of NUTS regions from the European Union’s GISCO/eurostat download page is one line of code:

nuts_regions = geopandas.read_file("https://gisco-services.ec.europa.eu/distribution/v2/nuts/shp/NUTS_RG_60M_2021_3035.shp.zip")
nuts_regions.head()
[4]:
nuts_regions = geopandas.read_file(DATA_DIRECTORY / "europe_nuts_regions.geojson")
nuts_regions.head()
[4]:
NUTS_ID LEVL_CODE CNTR_CODE NAME_LATN NUTS_NAME MOUNT_TYPE URBN_TYPE COAST_TYPE FID geometry
0 DE149 3 DE Sigmaringen Sigmaringen 4.0 3 3 DE149 POLYGON ((4272515.778 2791989.118, 4291502.208...
1 DE211 3 DE Ingolstadt, Kreisfreie Stadt Ingolstadt, Kreisfreie Stadt 4.0 2 3 DE211 POLYGON ((4430560.572 2849070.969, 4426522.606...
2 DE212 3 DE München, Kreisfreie Stadt München, Kreisfreie Stadt 4.0 1 3 DE212 POLYGON ((4426190.454 2780289.957, 4425325.775...
3 DE213 3 DE Rosenheim, Kreisfreie Stadt Rosenheim, Kreisfreie Stadt 4.0 2 3 DE213 POLYGON ((4470814.937 2743662.905, 4477767.129...
4 DE214 3 DE Altötting Altötting 4.0 2 3 DE214 POLYGON ((4539906.565 2792493.475, 4525936.167...

Writing geospatial data to a file#

Writing data to a file is equally straightforward: simply use the to_file() method of a GeoDataFrame (https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.to_file.html#geopandas.GeoDataFrame.to_file).

If we want to keep a local copy of the NUTS region data set we just opened on-the-fly from an internet address, the following saves the data to a GeoJSON file (the file format is guessed from the file name):

[5]:
nuts_regions.to_file(DATA_DIRECTORY / "europe_nuts_regions.geojson")
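To illustrate what ‘guessed from the file name’ means, here is a rough sketch of an extension-to-driver lookup. The mapping and the `guess_driver()` helper are assumptions made up for this example; geopandas keeps its own, longer table internally, and its exact behaviour may differ.

```python
import pathlib

# Illustrative mapping only (an assumption, not geopandas' actual table)
EXTENSION_TO_DRIVER = {
    ".gpkg": "GPKG",
    ".geojson": "GeoJSON",
    ".json": "GeoJSON",
    ".shp": "ESRI Shapefile",
}


def guess_driver(path):
    """Guess a driver name from the (case-insensitive) file extension."""
    suffix = pathlib.Path(path).suffix.lower()
    try:
        return EXTENSION_TO_DRIVER[suffix]
    except KeyError:
        raise ValueError(
            f"Cannot guess a driver for {suffix!r}, name one explicitly"
        ) from None


print(guess_driver("europe_nuts_regions.geojson"))
print(guess_driver("finland_municipalities_2021.gpkg"))
```

If a file name does not carry a recognisable extension, the driver parameter of to_file() can be used to name the format explicitly, for instance `driver="GPKG"`.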
Note
Reading and writing geospatial data from or to a file is almost identical for all file formats supported by geopandas, fiona, and GDAL. Check out geopandas’ documentation for hints on how to fine-tune reading or writing a file, and how to apply different filters (e.g., bounding boxes).

Reading and writing from and to databases (RDBMS)#

Geopandas has native support for read/write access to PostgreSQL/PostGIS databases, using its geopandas.read_postgis() function (https://geopandas.org/en/stable/docs/reference/api/geopandas.read_postgis.html) and the GeoDataFrame.to_postgis() method (https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.to_postgis.html). For the database connection, you can use, for instance, the sqlalchemy package.

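import sqlalchemy

# Placeholder credentials: replace user name, password, host, and database
DB_CONNECTION_URL = "postgresql://myusername:mypassword@myhost:5432/mydatabase"
db_engine = sqlalchemy.create_engine(DB_CONNECTION_URL)

# read_postgis() looks for a geometry column called "geom" by default,
# so name the column explicitly
countries = geopandas.read_postgis(
    "SELECT name, geometry FROM countries",
    db_engine,
    geom_col="geometry",
)

# Write the GeoDataFrame to a new table (requires the geoalchemy2 package)
countries.to_postgis(
    "new_table",
    db_engine,
)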
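
One practical detail when assembling such a connection URL: characters such as @ or / in a password would be misread as URL delimiters, so they need to be percent-encoded first. The following sketch uses only the standard library; `build_connection_url()` is a hypothetical helper and the credentials are placeholders.

```python
from urllib.parse import quote_plus


def build_connection_url(user, password, host, port, database):
    """Assemble a PostgreSQL connection URL, escaping user name and password."""
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Placeholder credentials (assumptions for illustration only)
DB_CONNECTION_URL = build_connection_url(
    "myusername", "p@ss/word", "myhost", 5432, "mydatabase"
)
print(DB_CONNECTION_URL)
```

The resulting string can be passed to sqlalchemy.create_engine() in the same way as the hard-coded URL.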

Reading data directly from a WFS (Web feature service) endpoint#

Geopandas can also read data directly from a WFS endpoint, for instance, the geodata APIs of Helsinki Region Infoshare. Constructing a valid WFS URI (address) is not part of this course (but check, for instance, the properties of a layer added to QGIS).

The following code loads a population grid of Helsinki from 2022. The parameters encoded into the WFS address specify the layer name, a bounding box, and the requested reference system.

population_grid = geopandas.read_file(
    "https://kartta.hsy.fi/geoserver/wfs"
    "?service=wfs"
    "&version=2.0.0"
    "&request=GetFeature"
    "&typeName=asuminen_ja_maankaytto:Vaestotietoruudukko_2022"
    "&srsName=EPSG:3879"
    "&bbox=25494767,6671328,25497720,6673701,EPSG:3879",
    crs="EPSG:3879"
)
population_grid.head()
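
The query string of such an address does not have to be typed out by hand. As a sketch, the same parameters can be kept in a dictionary and encoded with the standard library; the variable names below are assumptions, while the endpoint and parameter values are the ones used above.

```python
from urllib.parse import urlencode

WFS_ENDPOINT = "https://kartta.hsy.fi/geoserver/wfs"

# The same parameters as in the address above: layer name, reference
# system, and bounding box
parameters = {
    "service": "wfs",
    "version": "2.0.0",
    "request": "GetFeature",
    "typeName": "asuminen_ja_maankaytto:Vaestotietoruudukko_2022",
    "srsName": "EPSG:3879",
    "bbox": "25494767,6671328,25497720,6673701,EPSG:3879",
}

# Keep ':' and ',' unescaped so the result matches the hand-written address
wfs_url = f"{WFS_ENDPOINT}?{urlencode(parameters, safe=':,')}"
print(wfs_url)
```

The resulting `wfs_url` can then be handed to geopandas.read_file() just like the literal string.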
[6]:
population_grid = geopandas.read_file(
    "https://avoidatastr.blob.core.windows.net/avoindata/AvoinData/"
    "6_Asuminen/Vaestotietoruudukko/Shp/Vaestotietoruudukko_2021_shp.zip"
)
population_grid.head()
[6]:
INDEX ASUKKAITA ASVALJYYS IKA0_9 IKA10_19 IKA20_29 IKA30_39 IKA40_49 IKA50_59 IKA60_69 IKA70_79 IKA_YLI80 geometry
0 688 5 50.60 99 99 99 99 99 99 99 99 99 POLYGON ((25472499.995 6689749.005, 25472499.9...
1 703 7 36.71 99 99 99 99 99 99 99 99 99 POLYGON ((25472499.995 6685998.998, 25472499.9...
2 710 8 44.50 99 99 99 99 99 99 99 99 99 POLYGON ((25472499.995 6684249.004, 25472499.9...
3 711 7 64.14 99 99 99 99 99 99 99 99 99 POLYGON ((25472499.995 6683999.005, 25472499.9...
4 715 11 41.09 99 99 99 99 99 99 99 99 99 POLYGON ((25472499.995 6682998.998, 25472499.9...