Vector Data I/O#
One of the first steps of many analysis workflows is to read data from a file; one of the last steps often writes data to an output file. To the horror of many geoinformatics scholars, there exist many file formats for GIS data: the old and hated but also loved and established ESRI Shapefile, the universal GeoPackage (GPKG), and the web-optimised GeoJSON are just a few of the better-known examples.
Fear not, Python can read them all (no guarantees, though)!
Most of the current Python GIS packages rely on the GDAL/OGR libraries, for which modern interfaces exist in the form of the fiona and rasterio Python packages.
Today, we’ll concentrate on vector data, so let’s first take a closer look at fiona’s capabilities, and then import and export data using geopandas, which uses fiona under its hood.
Note: Defining a data directory constant
To make it easier to manage the paths of input and output data files, it is a good habit to define a constant pointing to the data directory at the top of a notebook.
[1]:
import pathlib
NOTEBOOK_PATH = pathlib.Path().resolve()
DATA_DIRECTORY = NOTEBOOK_PATH / "data"
File formats#
Fiona can read (almost) any geospatial file format, and write many of them. To find out which ones exactly (it might depend on the local installation and version, as well), we can print its list of file format drivers:
[2]:
import fiona
fiona.supported_drivers
[2]:
{'DXF': 'rw',
'CSV': 'raw',
'OpenFileGDB': 'raw',
'ESRIJSON': 'r',
'ESRI Shapefile': 'raw',
'FlatGeobuf': 'raw',
'GeoJSON': 'raw',
'GeoJSONSeq': 'raw',
'GPKG': 'raw',
'GML': 'rw',
'OGR_GMT': 'rw',
'GPX': 'rw',
'MapInfo File': 'raw',
'DGN': 'raw',
'S57': 'r',
'SQLite': 'raw',
'TopoJSON': 'r'}
Hint
In this list, r marks file formats fiona can read, and w formats it can write. An a marks formats for which fiona can append new data to existing files.
Note that each of the listed ‘formats’ is, in fact, the name of the driver implementation, and many of the drivers can open several related file formats.
Many more ‘exotic’ file formats might not show up in this list on your local installation, because you would need to install additional libraries. You can find a full list of file formats supported by GDAL/OGR (and fiona) on its webpage: gdal.org/drivers/vector/.
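The mode strings in the listing above can also be filtered programmatically. The following is a minimal sketch, assuming a dictionary shaped like fiona.supported_drivers; the helper name drivers_supporting and the small sample dictionary (a few entries copied from the output above) are illustrative only.

```python
def drivers_supporting(drivers, mode):
    """Return the sorted names of all drivers whose mode string contains `mode`."""
    return sorted(name for name, modes in drivers.items() if mode in modes)


# A few entries copied from the listing above; on a real installation,
# pass fiona.supported_drivers instead of this sample dictionary.
sample_drivers = {
    "ESRIJSON": "r",
    "GML": "rw",
    "GPKG": "raw",
    "GeoJSON": "raw",
    "TopoJSON": "r",
}

writable = drivers_supporting(sample_drivers, "w")
appendable = drivers_supporting(sample_drivers, "a")
print(writable)
print(appendable)
```

Checking for "w" before exporting can save a failed write on installations where a driver is read-only.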
Reading and writing geospatial data#
Fiona allows very low-level access to geodata files. This is sometimes necessary, but in typical analysis workflows, it is more convenient to use a higher-level library. The most commonly used one for geospatial vector data is geopandas. As mentioned above, it uses fiona for reading and writing files, and thus supports the same file formats.
To read data from a GeoPackage file into a geopandas.GeoDataFrame (a geospatially-enabled version of a pandas.DataFrame), use geopandas.read_file():
[3]:
import geopandas
municipalities = geopandas.read_file(
    DATA_DIRECTORY / "finland_municipalities" / "finland_municipalities_2021.gpkg"
)
municipalities.head()
[3]:
| | GML_ID | NATCODE | NAMEFIN | NAMESWE | LANDAREA | FRESHWAREA | SEAWAREA | TOTALAREA | geometry |
---|---|---|---|---|---|---|---|---|---|
0 | 1601000258 | 498 | Muonio | Muonio | 1904.05 | 133.73 | 0.0 | 2037.78 | POLYGON ((366703.026 7563861.713, 373641.706 7... |
1 | 1601000566 | 148 | Inari | Enare | 15056.29 | 2277.33 | 0.0 | 17333.62 | POLYGON ((554063.014 7746246.426, 558386.737 7... |
2 | 1601000428 | 224 | Karkkila | Högfors | 242.35 | 12.97 | 0.0 | 255.32 | POLYGON ((338515.195 6726577.401, 338539.595 6... |
3 | 1601000698 | 271 | Kokemäki | Kumo | 480.20 | 51.06 | 0.0 | 531.26 | POLYGON ((260519.503 6818726.479, 263236.792 6... |
4 | 1601000343 | 176 | Juuka | Juuka | 1501.70 | 344.87 | 0.0 | 1846.57 | POLYGON ((607203.808 7035838.978, 608878.941 7... |
Reading a local GPKG file is most likely the easiest task for a GIS package. However, in perfect Python ‘Swiss pocket knife’ manner, geopandas can also read Shapefiles inside a ZIP archive, and/or straight from an Internet URL. For example, downloading, unpacking and opening a data set of NUTS regions from the European Union’s GISCO/eurostat download page is one line of code:
nuts_regions = geopandas.read_file("https://gisco-services.ec.europa.eu/distribution/v2/nuts/shp/NUTS_RG_60M_2021_3035.shp.zip")
nuts_regions.head()
[4]:
nuts_regions = geopandas.read_file(DATA_DIRECTORY / "europe_nuts_regions.geojson")
nuts_regions.head()
[4]:
| | NUTS_ID | LEVL_CODE | CNTR_CODE | NAME_LATN | NUTS_NAME | MOUNT_TYPE | URBN_TYPE | COAST_TYPE | FID | geometry |
---|---|---|---|---|---|---|---|---|---|---|
0 | DE149 | 3 | DE | Sigmaringen | Sigmaringen | 4.0 | 3 | 3 | DE149 | POLYGON ((4272515.778 2791989.118, 4291502.208... |
1 | DE211 | 3 | DE | Ingolstadt, Kreisfreie Stadt | Ingolstadt, Kreisfreie Stadt | 4.0 | 2 | 3 | DE211 | POLYGON ((4430560.572 2849070.969, 4426522.606... |
2 | DE212 | 3 | DE | München, Kreisfreie Stadt | München, Kreisfreie Stadt | 4.0 | 1 | 3 | DE212 | POLYGON ((4426190.454 2780289.957, 4425325.775... |
3 | DE213 | 3 | DE | Rosenheim, Kreisfreie Stadt | Rosenheim, Kreisfreie Stadt | 4.0 | 2 | 3 | DE213 | POLYGON ((4470814.937 2743662.905, 4477767.129... |
4 | DE214 | 3 | DE | Altötting | Altötting | 4.0 | 2 | 3 | DE214 | POLYGON ((4539906.565 2792493.475, 4525936.167... |
Writing geospatial data to a file#
Writing data to a file is equally straightforward: simply use the to_file() method (https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.to_file.html#geopandas.GeoDataFrame.to_file) of a GeoDataFrame.
If we want to keep a local copy of the NUTS region data set we just opened on-the-fly from an internet address, the following saves the data to a GeoJSON file (the file format is guessed from the file name):
[5]:
nuts_regions.to_file(DATA_DIRECTORY / "europe_nuts_regions.geojson")
Note
Reading and writing geospatial data from or to a file is almost identical for all file formats supported by geopandas, fiona, and GDAL. Check out geopandas’ documentation for hints on how to fine-tune reading or writing a file, and how to apply different filters (e.g., bounding boxes).
Reading and writing from and to databases (RDBMS)#
Geopandas has native support for read/write access to PostgreSQL/PostGIS databases, using its geopandas.read_postgis() function (https://geopandas.org/en/stable/docs/reference/api/geopandas.read_postgis.html) and the GeoDataFrame.to_postgis() method (https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.to_postgis.html). For the database connection, you can use, for instance, the sqlalchemy package.
import sqlalchemy

DB_CONNECTION_URL = "postgresql://myusername:mypassword@myhost:5432/mydatabase"
db_engine = sqlalchemy.create_engine(DB_CONNECTION_URL)

# read_postgis() looks for a geometry column named "geom" unless told otherwise
countries = geopandas.read_postgis(
    "SELECT name, geometry FROM countries",
    db_engine,
    geom_col="geometry",
)

countries.to_postgis(
    "new_table",
    db_engine,
)
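Hard-coding credentials in a notebook is rarely a good idea, and passwords that contain characters such as @ or / must be percent-encoded before they can be used in a connection URL. The following is a minimal sketch using only the standard library; the helper build_postgres_url and the environment variable names DB_USER, DB_PASSWORD, DB_HOST and DB_NAME are assumptions made for this example.

```python
import os
from urllib.parse import quote


def build_postgres_url(user, password, host, database, port=5432):
    """Assemble a PostgreSQL connection URL, percent-encoding user and password."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


# Hypothetical environment variable names; the fallbacks are the
# placeholder values from the example above.
DB_CONNECTION_URL = build_postgres_url(
    user=os.environ.get("DB_USER", "myusername"),
    password=os.environ.get("DB_PASSWORD", "mypassword"),
    host=os.environ.get("DB_HOST", "myhost"),
    database=os.environ.get("DB_NAME", "mydatabase"),
)
```

The resulting string can be passed to sqlalchemy.create_engine() in the same way as the hard-coded one.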
Reading data directly from a WFS (Web Feature Service) endpoint#
Geopandas can also read data directly from a WFS endpoint, for instance, the geodata APIs of Helsinki Region Infoshare. Constructing a valid WFS URI (address) is not part of this course (but check, for instance, the properties of a layer added to QGIS).
The following code loads a population grid of Helsinki from 2022. The parameters encoded into the WFS address specify the layer name, a bounding box, and the requested reference system.
population_grid = geopandas.read_file(
    "https://kartta.hsy.fi/geoserver/wfs"
    "?service=wfs"
    "&version=2.0.0"
    "&request=GetFeature"
    "&typeName=asuminen_ja_maankaytto:Vaestotietoruudukko_2022"
    "&srsName=EPSG:3879"
    "&bbox=25494767,6671328,25497720,6673701,EPSG:3879",
    crs="EPSG:3879"
)
population_grid.head()
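The WFS address used above can also be assembled from a dictionary of parameters, which makes the layer name, bounding box and reference system easier to change. This is a minimal sketch using the standard library's urllib.parse.urlencode(); the variable names WFS_ENDPOINT, wfs_parameters and wfs_url are chosen for this example only.

```python
from urllib.parse import urlencode

WFS_ENDPOINT = "https://kartta.hsy.fi/geoserver/wfs"

wfs_parameters = {
    "service": "wfs",
    "version": "2.0.0",
    "request": "GetFeature",
    "typeName": "asuminen_ja_maankaytto:Vaestotietoruudukko_2022",  # layer name
    "srsName": "EPSG:3879",  # requested reference system
    "bbox": "25494767,6671328,25497720,6673701,EPSG:3879",  # bounding box
}

# safe=":," keeps colons and commas readable instead of percent-encoding them
wfs_url = f"{WFS_ENDPOINT}?{urlencode(wfs_parameters, safe=':,')}"
print(wfs_url)
```

The resulting wfs_url is the same string as the address written out by hand above and can be passed to geopandas.read_file() in its place.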
[6]:
population_grid = geopandas.read_file(
    "https://avoidatastr.blob.core.windows.net/avoindata/AvoinData/"
    "6_Asuminen/Vaestotietoruudukko/Shp/Vaestotietoruudukko_2021_shp.zip"
)
population_grid.head()
[6]:
| | INDEX | ASUKKAITA | ASVALJYYS | IKA0_9 | IKA10_19 | IKA20_29 | IKA30_39 | IKA40_49 | IKA50_59 | IKA60_69 | IKA70_79 | IKA_YLI80 | geometry |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 688 | 5 | 50.60 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | POLYGON ((25472499.995 6689749.005, 25472499.9... |
1 | 703 | 7 | 36.71 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | POLYGON ((25472499.995 6685998.998, 25472499.9... |
2 | 710 | 8 | 44.50 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | POLYGON ((25472499.995 6684249.004, 25472499.9... |
3 | 711 | 7 | 64.14 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | POLYGON ((25472499.995 6683999.005, 25472499.9... |
4 | 715 | 11 | 41.09 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | POLYGON ((25472499.995 6682998.998, 25472499.9... |