Motivation behind this course#

General overview#

Now that you are familiar with the basics of Python programming, it is time to apply those skills to geographic data analysis. During the ‘Automating GIS processes’ course, you will learn how to handle spatial data and analyse it using Python. In the course, you also will get to know some of the many Python packages that have been written specifically for GIS-related applications.

Learning objectives#

At the end of the course you will be able to:

  • read and write spatial data from and to different file formats,

  • work with coordinate reference systems,

  • use geocoding to convert addresses to coordinates, and vice versa,

  • use overlay functions to combine geometries (intersect, union),

  • reclassify data based on their value,

  • carry out spatial queries,

  • conduct simple spatial analysis, and

  • visualise data and create (interactive) maps, such as following:

Why Python for GIS?#

When it comes to data science, and especially spatial data science and spatial data analysis, Python currently is the most useful programming or scripting language to learn. On the one hand, the Python ecosystem, which allows people to share their libraries easily with others, is very rich, and has packages for almost any task imaginable. Researchers who find that a particular tool does not exist, often go ahead and create it themselves. Python’s packages cover such a broad range of applications, that its import statement has become the trope of many jokes:

I wrote 20 short programs in Python yesterday. It was wonderful. Perl, I'm leaving you.

Learning to do anything is easy in Python: there is a package for everything. Source: xkcd.com/353#

There are more advantages more specifically for geographers and spatial data: most desktop GIS (e.g., QGIS, ESRI ArcGIS) have language bindings or programming interfaces for Python: that means that you can write Python programs that interact with data, libraries, and user interface of these applications, and create custom plugins to extend their functionality. Beyond that, here are powerful Python packages for virtually any task regarding spatial data management and spatial analysis.

During this course, we will focus on the latter: carrying GIS analysis and data management tasks without a desktop GIS, let alone a proprietary third-party software package such as ESRI ArcGIS. Why? There’s more than one reason:

  • Python itself, and most packages for it, is open source. It means it’s free to use (as in free beer), and free to modify for your own purposes (as in free speech). Open source and libre software is inherently democratic: you don’t have to rely on access to a license, and anybody across the globe can use them. Read more about free beer vs. free speech at the Free Software Foundation’s web page

  • Writing a script that solves a problem makes you learn and understand more deeply how geo-processing operations work, and how they work together. In this course, you will gain a deeper understanding of GIS.

  • Python is efficient and scales very well: You can use a Python script to analyse Big Data and other large datasets, and it is likely to outperform any desktop GIS tool.

  • Python is highly flexible: it can read and write almost any data format, and its package ecosystem provides libraries for almost any programming task imaginable.

  • Using Python, or any other open source software, for what it’s worth, supports the ideas of open science and reproducable research. Anyone can take your code and repeat your experiment or analysis to verify your claims and to learn from your example.

  • Combining the functionality of different Python packages you can achieve surprising results with comparably low effort: e.g., building fancy web-GIS applications by chaining GeoDjango with a PostGIS backend.

Which GIS packages are available in Python?#

During the GeoPython course, we have already used a few Python modules for carrying out different tasks, such as numpy for mathematical calculations, and matplotlib for data visualisation. In ’AutoGIS’, we will get to know a range of other Python packages that are useful for data analysis and the handling of geo-spatial data sets.

Below we listed most of the most common Python packages for GIS and data analysis (and links to their documentation). This list helps you get started when you approach a data analysis or GIS problem (but be sure to also use your favourite search engine to see if better alternatives have become available).

Even if you learnt about a package from a blog post, or from this course, always check the package’s own documentation page for its recommended use.

  • Data analysis & visualization:

    • NumPy: Fundamental package for scientific computing with Python

    • Pandas: High-performance, easy-to-use data structures and data analysis tools

    • SciPy: A collection of numerical algorithms and domain-specific toolboxes, including signal processing, optimization and statistics

    • MatplotLib: Basic plotting library for Python

    • Bokeh: Interactive visualizations for the web (also maps)

    • Plotly: Interactive visualizations (also maps) for the web (commercial - free for educational purposes)

  • GIS:

    • GDAL: Fundamental package for processing vector and raster data formats (many modules below depend on this). Used for raster processing.

    • Geopandas: Working with geospatial data in Python made easier, combines the capabilities of pandas and shapely.

    • Shapely: Python package for manipulation and analysis of planar geometric objects (based on widely deployed GEOS

    • Fiona: Reading and writing spatial data (alternative for geopandas).

    • Pyproj: Performs cartographic transformations and geodetic computations (based on PROJ.4).

    • Pysal: Library of spatial analysis functions written in Python.

    • Geopy: Geocoding library: coordinates to address <-> address to coordinates.

    • Contextily: Add background basemaps for your (static) map visualizations

    • GeoViews: Interactive Maps for the web.

    • Geoplot: High-level geospatial data visualization library for Python.

    • Dash: Dash is a Python framework for building analytical web applications.

    • OSMnx: Python for street networks. Retrieve, construct, analyze, and visualize street networks from OpenStreetMap

    • Networkx: Network analysis and routing in Python (e.g. Dijkstra and A* -algorithms), see this post.

    • Cartopy: Make drawing maps for data analysis and visualisation as easy as possible.

    • Scipy.spatial: Spatial algorithms and data structures.

    • Rtree: Spatial indexing for Python for quick spatial lookups.

    • Rasterio: Clean and fast and geospatial raster I/O for Python.

    • RSGISLib: Remote Sensing and GIS Software Library for Python.

Tip

If you look through the list of links above, you will notice that many projects host their documentation on readthedocs.io (RTD), more specifically at the URL https://{PROJECT_NAME}.readthedocs.io/. RTD has become the de-facto standard service for hosting documentation for Python packages; if you are looking for documentation for a Python package, quickly trying its (assumed) readthedocs address often works.

In the Python ecosystem, there are tools and libraries for almost any GIS-related task. Source: Tenkanen, 2022.

Note

If you want to install Python and Python packages on your own computer, see the instructions on the course information pages