{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Vector Data I/O in Python\n", "\n", "Reading data into Python is usually the first step of an analysis workflow. There are many GIS data formats available, such as [Shapefile](https://en.wikipedia.org/wiki/Shapefile), [GeoJSON](https://en.wikipedia.org/wiki/GeoJSON), [KML](https://en.wikipedia.org/wiki/Keyhole_Markup_Language), and [GPKG](https://en.wikipedia.org/wiki/GeoPackage). [Geopandas](http://geopandas.org/io.html) can read data from all of these formats (plus many more).\n", "\n", "This tutorial shows some typical examples of how to read (and write) data from different sources. The main point of this section is to demonstrate the basic syntax for reading and writing data using short code snippets. You can find the example data sets in the data folder. However, most of the example databases do not exist, so you need to adapt the example syntax to your own setup." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## File formats\n", "\n", "In geopandas, we use the generic function [read_file()](http://geopandas.org/io.html) for reading in different data formats and the [to_file()](http://geopandas.org/reference.html#geopandas.GeoDataFrame.to_file) method for writing them. In the background, Geopandas uses [fiona.open()](https://fiona.readthedocs.io/en/latest/fiona.html#fiona.open) when reading in data. When reading, the driver can usually be detected from the file itself, but it can also be given explicitly with the `driver` parameter. When writing, Esri Shapefile is the default format (newer geopandas versions can also infer the driver from the file extension), so for other file formats it is safest to specify the driver explicitly. 
\n", "\n", "You can check the supported drivers through geopandas, or directly from fiona. The access mode of each driver is shown as `r` (read), `a` (append), and `w` (write):" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'AeronavFAA': 'r',\n", " 'ARCGEN': 'r',\n", " 'BNA': 'rw',\n", " 'DXF': 'rw',\n", " 'CSV': 'raw',\n", " 'OpenFileGDB': 'r',\n", " 'FlatGeobuf': 'r',\n", " 'ESRIJSON': 'r',\n", " 'ESRI Shapefile': 'raw',\n", " 'GeoJSON': 'raw',\n", " 'GeoJSONSeq': 'rw',\n", " 'GPKG': 'raw',\n", " 'GML': 'rw',\n", " 'OGR_GMT': 'rw',\n", " 'GPX': 'rw',\n", " 'GPSTrackMaker': 'rw',\n", " 'Idrisi': 'r',\n", " 'MapInfo File': 'raw',\n", " 'DGN': 'raw',\n", " 'PCIDSK': 'rw',\n", " 'OGR_PDS': 'r',\n", " 'S57': 'r',\n", " 'SEGY': 'r',\n", " 'SUA': 'r',\n", " 'TopoJSON': 'r'}" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import geopandas as gpd\n", "import fiona\n", "\n", "# Check supported format drivers (the same dictionary geopandas uses through fiona)\n", "fiona.supported_drivers\n", "\n", "# In older geopandas versions, the same dictionary is also available as:\n", "# gpd.io.file.fiona.drvsupport.supported_drivers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Read / write Shapefile" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import geopandas as gpd\n", "\n", "# Read file from Shapefile\n", "fp = \"data/finland_municipalities.shp\"\n", "data = gpd.read_file(fp)\n", "\n", "# Write to Shapefile (just make a copy; the temp folder must already exist)\n", "outfp = \"temp/finland_municipalities.shp\"\n", "data.to_file(outfp)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Read / write GeoJSON" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Read file from GeoJSON\n", "fp = 
\"data/finland_municipalities.gjson\"\n", "data = gpd.read_file(fp, driver=\"GeoJSON\")\n", "\n", "# Write to GeoJSON (just make a copy)\n", "outfp = \"temp/finland_municipalities.gjson\"\n", "data.to_file(outfp, driver=\"GeoJSON\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 
Read / write KML" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import fiona\n", "\n", "# Enable KML driver (not enabled by default in fiona)\n", "fiona.supported_drivers['KML'] = 'rw'\n", "\n", "# Read file from KML\n", "fp = \"data/finland_municipalities.kml\"\n", "data = gpd.read_file(fp)\n", "\n", "# Write to KML (just make a copy)\n", "outfp = \"temp/finland_municipalities.kml\"\n", "data.to_file(outfp, driver=\"KML\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Read / write Geopackage" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Read file from Geopackage\n", "fp = \"data/finland_municipalities.gpkg\"\n", "data = gpd.read_file(fp)\n", "\n", "# Write to Geopackage (just make a copy)\n", "outfp = \"temp/finland_municipalities.gpkg\"\n", "data.to_file(outfp, driver=\"GPKG\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Read / write GeoDatabase" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Read file from File Geodatabase\n", "fp = \"data/finland.gdb\"\n", "data = gpd.read_file(fp, driver=\"OpenFileGDB\", layer='municipalities')\n", "\n", "# Writing to the same FileGDB (adding a new layer) requires a GDAL/fiona build that includes\n", "# the Esri FileGDB driver, which is usually not part of default installations\n", "#outfp = \"data/finland.gdb\"\n", "#data.to_file(outfp, driver=\"FileGDB\", layer=\"municipalities_copy\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Read / write MapInfo Tab" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Read file from MapInfo Tab\n", "fp = \"data/finland_municipalities.tab\"\n", "data = gpd.read_file(fp, driver=\"MapInfo File\")\n", "\n", "# Write to MapInfo Tab (just make a copy)\n", "outfp = \"temp/finland_municipalities.tab\"\n", "data.to_file(outfp, driver=\"MapInfo File\")" ] }, { 
"cell_type": "markdown", "metadata": {}, "source": [ "## Databases\n", "\n", "Example 
syntax for reading and writing data from/to databases." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Read PostGIS database using psycopg2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import geopandas as gpd\n", "import psycopg2\n", "\n", "# Create connection to database with psycopg2 module (update params according to your db)\n", "# Note: psycopg2.connect() returns a single connection object\n", "conn = psycopg2.connect(dbname='my_postgis_database',\n", "                        user='my_usrname',\n", "                        password='my_pwd',\n", "                        host='123.22.432.16', port=5432)\n", "\n", "# Specify sql query\n", "sql = \"SELECT * FROM MY_TABLE;\"\n", "\n", "# Read data from PostGIS (the geometry column is expected to be named 'geom' by default;\n", "# pass geom_col='...' if your table uses another name)\n", "data = gpd.read_postgis(sql=sql, con=conn)\n", "\n", "# Close the connection\n", "conn.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Read / write PostGIS database using SqlAlchemy + GeoAlchemy" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import geopandas as gpd\n", "from sqlalchemy.engine.url import URL\n", "from sqlalchemy import create_engine\n", "from sqlalchemy import MetaData\n", "from sqlalchemy.orm import sessionmaker\n", "from geoalchemy2 import WKTElement, Geometry\n", "\n", "# Update with your db parameters\n", "HOST = '123.234.345.16'\n", "DB = 'my_database'\n", "USER = 'my_user'\n", "PORT = 5432\n", "PWD = 'my_password'\n", "\n", "# Database info (URL.create() requires SQLAlchemy >= 1.4; older versions call URL(...) 
directly)\n", "db_url = URL.create(drivername='postgresql+psycopg2', host=HOST, database=DB,\n", "                    username=USER, port=PORT, password=PWD)\n", "\n", "# Create engine\n", "engine = create_engine(db_url)\n", "\n", "# Init Metadata\n", "meta = MetaData()\n", "\n", "# Load table definitions from db\n", "meta.reflect(engine)\n", "\n", "# Create session\n", "Session = sessionmaker(bind=engine)\n", "session = Session()\n", "\n", "# ========================\n", "# Read data from PostGIS\n", "# ========================\n", "\n", "# Specify sql query\n", "sql = \"SELECT * FROM finland;\"\n", "\n", "# Pull the data\n", "data = gpd.read_postgis(sql=sql, con=engine)\n", "\n", "# 
Close session\n", "session.close()\n", "\n", "# =========================================\n", "# Write data to PostGIS (make a copy table)\n", "# =========================================\n", "\n", "# Coordinate Reference System (srid); update to match your data, e.g. data.crs.to_epsg()\n", "crs = 4326\n", "\n", "# Target table\n", "target_table = 'finland_copy'\n", "\n", "# Copy the attributes without the active Shapely geometry column\n", "out = data.drop(columns=[data.geometry.name])\n", "\n", "# Convert Shapely geometries to WKTElements into column 'geom' (default in PostGIS)\n", "out['geom'] = [WKTElement(geom.wkt, srid=crs) for geom in data.geometry]\n", "\n", "# Write to PostGIS (overwrite if table exists, be careful with this!)\n", "# Possible behavior: 'replace', 'append', 'fail'\n", "# The dtype argument tells SQLAlchemy to store 'geom' as a PostGIS geometry column\n", "out.to_sql(target_table, engine, if_exists='replace', index=False,\n", "           dtype={'geom': Geometry(geometry_type='GEOMETRY', srid=crs)})\n", "\n", "# Note: geopandas >= 0.8 also offers a shortcut for the write steps above:\n", "# data.to_postgis(target_table, engine, if_exists='replace')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Read / write Spatialite database" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import sqlite3\n", "import shapely.wkb as swkb\n", "\n", "# DB path\n", "dbfp = 'L2_data/Finland.sqlite'\n", "\n", "# Name for the table\n", "tbl_name = 'finland'\n", "\n", "# SRID (crs of your data)\n", "srid = 4326\n", "\n", "# Parse Geometry type of the input Data\n", "gtype = data.geom_type.unique()\n", "assert len(gtype) == 1, \"Mixed Geometries! 
Cannot insert into SQLite table.\"\n", "geom_type = gtype[0].upper()\n", "\n", "# Attribute columns (everything except the geometry)\n", "attrs = data.drop(columns=[data.geometry.name])\n", "\n", "with sqlite3.connect(dbfp) as conn:\n", "    # Enable spatialite extension\n", "    conn.enable_load_extension(True)\n", "    conn.load_extension(\"mod_spatialite\")\n", "    conn.execute(\"SELECT InitSpatialMetaData(1);\")\n", "\n", "    # Create an empty table with the attribute columns (overwrite if it exists)\n", "    attrs.head(0).to_sql(tbl_name, conn, if_exists='replace', index=False)\n", "\n", "    # Add geometry column with the specified SRID and geometry type, having two dimensions\n", "    conn.execute(\"SELECT AddGeometryColumn(?, 'wkb_geometry', ?, ?, 2);\",\n", "                 (tbl_name, srid, geom_type))\n", "\n", "    # Insert attributes + geometries (Shapely -> well-known binary -> SpatiaLite via GeomFromWKB)\n", "    cols = \", \".join('\"{}\"'.format(c) for c in attrs.columns)\n", "    marks = \", \".join(\"?\" for _ in attrs.columns)\n", "    sql = 'INSERT INTO \"{t}\" ({c}, wkb_geometry) VALUES ({m}, GeomFromWKB(?, ?));'.format(\n", "        t=tbl_name, c=cols, m=marks)\n", "    rows = [tuple(row) + (swkb.dumps(geom), srid)\n", "            for row, geom in zip(attrs.itertuples(index=False), data.geometry)]\n", "    conn.executemany(sql, rows)\n", "\n", "# The with-block commits the transaction but does not close the connection\n", "conn.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Read Web Feature Service (WFS)\n", "\n", "This script was used to generate the input data for this tutorial (the FileGDB and MapInfo Tab files were created separately). Source: Statistics Finland WFS." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import geopandas as gpd\n", "import requests\n", "import geojson\n", "\n", "# Specify the url for the backend.\n", "# Here we are using data from Statistics Finland: 
https://www.stat.fi/org/avoindata/paikkatietoaineistot_en.html (CC BY 4.0)\n", "url = 'http://geo.stat.fi/geoserver/tilastointialueet/wfs'\n", "\n", "# Specify parameters (read data in json format). 
\n", "params = dict(service='WFS', version='2.0.0', request='GetFeature',\n", "              typeName='tilastointialueet:kunta4500k', outputFormat='json')\n", "\n", "# Fetch data from WFS using requests\n", "r = requests.get(url, params=params)\n", "\n", "# Create GeoDataFrame from geojson and set coordinate reference system\n", "data = gpd.GeoDataFrame.from_features(geojson.loads(r.content), crs=\"EPSG:3067\")" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " geometry kunta vuosi nimi \\\n", "0 POLYGON ((366787.924 7001300.583, 362458.797 6... 005 2020 Alajärvi \n", "1 POLYGON ((382543.364 7120022.976, 372645.944 7... 009 2020 Alavieska \n", "2 POLYGON ((343298.204 6961570.195, 345569.224 6... 010 2020 Alavus \n", "3 POLYGON ((436139.680 6798279.085, 435912.756 6... 016 2020 Asikkala \n", "4 POLYGON ((426631.036 6720528.076, 432565.266 6... 018 2020 Askola \n", "\n", " namn name bbox \n", "0 Alajärvi Alajärvi [321987.07200161, 6959704.55099558, 366787.924... \n", "1 Alavieska Alavieska [360962.99200022, 7104339.03799839, 382543.364... \n", "2 Alavus Alavus [303353.32000378, 6922242.40698068, 345569.224... \n", "3 Asikkala Asikkala [403543.81899999, 6774122.31100019, 442401.762... \n", "4 Askola Askola [413073.96299999, 6704555.87800016, 435459.201... 
" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.head()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Prepare data for writing to various file formats\n", "data = data.drop(columns=[\"bbox\"])" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "<Projected CRS: EPSG:3067>\n", "Name: ETRS89 / TM35FIN(E,N)\n", "Axis Info [cartesian]:\n", "- E[east]: Easting (metre)\n", "- N[north]: Northing (metre)\n", "Area of Use:\n", "- name: Finland\n", "- bounds: (19.08, 58.84, 31.59, 70.09)\n", "Coordinate Operation:\n", "- name: TM35FIN\n", "- method: Transverse Mercator\n", "Datum: European Terrestrial Reference System 1989\n", "- Ellipsoid: GRS 1980\n", "- Prime Meridian: Greenwich" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Check crs\n", "data.crs" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Created file data/finland_municipalities.shp\n", "Created file data/finland_municipalities.gjson\n", 
"Created file data/finland_municipalities.kml\n", "Created file data/finland_municipalities.gpkg\n" ] } ], "source": [ "import fiona\n", "\n", "# Base name for the output files\n", "layer_name = \"finland_municipalities\"\n", "\n", "# Enable writing KML (not enabled by default in fiona)\n", "fiona.supported_drivers['KML'] = 'rw'\n", "\n", "# Drivers and file extensions for different file formats\n", "drivers = {'ESRI Shapefile': 'shp',\n", "           'GeoJSON': 'gjson',\n", "           'KML': 'kml',\n", "           'GPKG': 'gpkg',\n", "           }\n", "\n", "# Write layer to different file formats\n", "for driver, extension in drivers.items():\n", "\n", "    # Create file path and file name\n", "    file_name = \"data/{0}.{1}\".format(layer_name, extension)\n", "\n", "    # Write data using the correct driver\n", "    data.to_file(file_name, driver=driver)\n", "    print(\"Created file\", file_name)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.6" } }, "nbformat": 4, "nbformat_minor": 4 }