Processing and Analysis of Raster Data#
In the first part of our lesson, we reviewed the basics of raster files and explored how to read and work with both single-band and multiband raster files, as well as retrieve raster images via WMS. Now, in the second part of the lesson, we will focus on how to process and analyze raster data. Specifically, we will cover:
Merging multiple raster files to create a
raster mosaic
Clipping rasters using a polygon
Reclassifying raster data
Performing
slope
analysis
For this part of the lesson, we will continue using the elevation model data provided by the National Land Survey of Finland. This time, we will work with all four raster tiles (L4133A, L4133B, L4133C, L4133D), which cover parts of Helsinki’s city center. These tiles have already been downloaded to the data
directory and will be used in this lesson.
Now let’s read all four raster tiles and visualize them using subplots.
[1]:
import rioxarray
import matplotlib.pyplot as plt
# Load the rasters and reproject them to EPSG:3067
raster1 = rioxarray.open_rasterio('data/L4133A.tif').rio.reproject("EPSG:3067")
raster2 = rioxarray.open_rasterio('data/L4133B.tif').rio.reproject("EPSG:3067")
raster3 = rioxarray.open_rasterio('data/L4133C.tif').rio.reproject("EPSG:3067")
raster4 = rioxarray.open_rasterio('data/L4133D.tif').rio.reproject("EPSG:3067")
# Create a subplot with 2x2 grid
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
# Plot each raster in a separate subplot
raster1.plot(ax=axes[0, 0], cmap='viridis')
axes[0, 0].set_title('Raster 1 (L4133A)')
raster2.plot(ax=axes[0, 1], cmap='viridis')
axes[0, 1].set_title('Raster 2 (L4133B)')
raster3.plot(ax=axes[1, 0], cmap='viridis')
axes[1, 0].set_title('Raster 3 (L4133C)')
raster4.plot(ax=axes[1, 1], cmap='viridis')
axes[1, 1].set_title('Raster 4 (L4133D)')
# Adjust layout
plt.tight_layout()
plt.show()

Note: We are again using matplotlib for visualization of our raster datasets. Everything we’ve previously learned about visualization with matplotlib and styling still applies here. We have extensively used matplotlib during the GeoPython Lesson 7 and throughout this course, especially in Lesson 5 where we focused on creating static maps.
As you can see, the four raster tiles cover adjacent areas in the Helsinki city center. However, they are still separate files! In the next step, we will merge these tiles into a single raster mosaic, creating a seamless representation that covers the entire central area of Helsinki.
Create raster mosaic#
Raster files are usualy large in size, therefor it is quite common for providers to publish them in smaller pieces or tiles. While this makes it easier to transfer the data, it may not be so practical when it comes to the actual analysis. For example, as seen above, the elevation model from center of Helsinki is divided into 4 separate raster files.
Now we want to merge multiple raster files together and create a raster mosaic. This can be done with the merge_arrays()
-function in rioxarray
.
Aligning raster tiles In our case, the raster files are coming from the same source, and are actually pieces of a larger data. If that was not the case, it would have been a good idea to start by aligning our raster tiles, ensuring that they can be combined without alignment problems. The
Align
function helps align your raster files in terms of resolution, projection, and spatial extent (docs.xarray).import xarray as xr # Ensure all rasters have the same CRS assert raster1.rio.crs == raster2.rio.crs == raster3.rio.crs == raster4.rio.crs, "Rasters have different CRS" # Align the rasters to the same resolution and spatial extent raster1, raster2 = xr.align(raster1, raster2) raster3, raster4 = xr.align(raster3, raster4)
Now let’s merge our data:
[2]:
from rioxarray.merge import merge_arrays
# Merge the four raster data into one
mosaic_merged = merge_arrays([raster1, raster2, raster3, raster4])
# Save the merged raster (optional)
# mosaic_merged.rio.to_raster('merged_raster.tif')
# Plot the mosaic raster
mosaic_merged.plot()
[2]:
<matplotlib.collections.QuadMesh at 0x2308d67e9d0>

Clipping raster using a polygon#
Clipping a raster using vector data (i.e., a polygon) is another common operation with raster data. In this part of lesson, we use wfs to get the administrative boundaries in Helsinki area.
[3]:
from owslib.wfs import WebFeatureService
import geopandas as gpd
# Define the bounding box for Helsinki (EPSG:3879)
bbox_helsinki = "25400000,6670000,25500000,6680000"
# Fetch the Paavo data limited to the Helsinki area using WFS
paavo_data_helsinki = gpd.read_file(
"https://geo.stat.fi/geoserver/wfs"
"?service=wfs"
"&version=2.0.0"
"&request=GetFeature"
"&typeName=postialue:pno" # Adjust to the correct layer if needed
"&srsName=EPSG:3879"
f"&bbox={bbox_helsinki},EPSG:3879"
)
# Display the first few rows of the data
paavo_data_helsinki.head()
# Save to a file if needed
# paavo_data_helsinki.to_file("paavo_data_helsinki.gpkg", driver="GPKG")
[3]:
gml_id | objectid | posti_alue | vuosi | nimi | namn | kunta | kuntanro | pinta_ala | geometry | |
---|---|---|---|---|---|---|---|---|---|---|
0 | pno.1 | 1 | 00100 | 2024 | Helsinki keskusta - Etu-Töölö | Helsingfors centrum - Främre Tölö | 091 | 91 | 2353278 | POLYGON ((25496638.898 6672477.679, 25496762.1... |
1 | pno.2 | 2 | 00120 | 2024 | Punavuori - Bulevardi | Rödbergen - Bulevarden | 091 | 91 | 414010 | POLYGON ((25496316.740 6671953.498, 25496387.6... |
2 | pno.3 | 3 | 00130 | 2024 | Kaartinkaupunki | Gardesstaden | 091 | 91 | 428960 | POLYGON ((25497213.040 6671964.915, 25497297.9... |
3 | pno.5 | 5 | 00150 | 2024 | Punavuori - Eira - Hernesaari | Rödbergen - Eira - Ärtholmen | 091 | 91 | 1367328 | MULTIPOLYGON (((25495892.616 6670428.430, 2549... |
4 | pno.32 | 32 | 00420 | 2024 | Kannelmäki | Gamlas | 091 | 91 | 2890057 | POLYGON ((25493298.180 6682931.149, 25493149.3... |
Now we can use the contents of the column posti_alue to select a ppostal area of our choice. Let’s try Kamppi; we know that the postal code for Kamppi is 00100
.
[4]:
# Select Kamppi polygon based on postal code
kamppi = paavo_data_helsinki[paavo_data_helsinki['posti_alue'] == '00100']
# Transfrom the polygon to the same CRS as our raster data
kamppi = kamppi.to_crs(mosaic_merged.rio.crs)
kamppi.plot()
[4]:
<Axes: >

Note: Similar to other map overlay analyses, it is essential to ensure that the CRS (Coordinate Reference System) of the raster files and the clipping feature are matching.
Once again, let’s make sure that the CRS are matching and then let’s proceed with the clipping.
[5]:
# Double check that the CRS match
assert raster1.rio.crs == kamppi.crs , "CRS Mismatch"
# Clip the raster using the Kamppi polygon
clipped_mosaic = mosaic_merged.rio.clip(kamppi.geometry, kamppi.crs)
# Save the clipped raster (optional)
# clipped_mosaic.rio.to_raster("clipped_mosaic_kamppi.tif")
clipped_mosaic.plot()
[5]:
<matplotlib.collections.QuadMesh at 0x23096825a90>

[6]:
nodata_value = clipped_mosaic.rio.nodata
# Mask the NoData values
clipped_raster = clipped_mosaic.where(clipped_mosaic != nodata_value)
clipped_raster.plot()
[6]:
<matplotlib.collections.QuadMesh at 0x2309c0cb250>

The function
np.where
in NumPyThe ``np.where`` function in NumPy is a conditional function that allows you to select elements from an array based on a condition. It can be thought of as an element-wise if-else for arrays.
Basic Syntax
np.where(condition, x, y)
``condition``: A boolean array or condition that’s checked for each element.
``x``: The value (or array) to use when the condition is
True
.``y``: The value (or array) to use when the condition is
False
.For each element, if the
condition
isTrue
,np.where
returns the corresponding value fromx
; otherwise, it returns the corresponding value fromy
.Example
Here’s a simple example to clarify how it works:
import numpy as np arr = np.array([1, -2, 3, -4, 5]) result = np.where(arr > 0, arr, 0) print(result)Output:
[1, 0, 3, 0, 5]In this case: - The condition
arr > 0
checks if each element is positive. - Where the condition isTrue
(for positive numbers),np.where
returns the originalarr
value. - Where the condition isFalse
(for negative numbers), it returns0
.Practical Uses
np.where
is often used for: - Masking values in an array (e.g., setting invalid values to NaN). - Conditional selection in data processing and filtering. - Replacing values in an array based on certain conditions.
Raster reclassification#
Raster reclassification is another common procedure with raster dataset. Raster reclassification is the process of assigning new values to the pixels of a raster dataset based on their existing values. This is often used to simplify or categorize continuous data, such as elevation or land cover, into distinct classes. For example, in an elevation map, reclassification can group different elevation ranges into categories like “low”, “medium”, and “high” altitude, making it easier to analyze or visualize the data for specific applications such as hazard assessment or land management.
let’s first have a look at the range of values (altitudes) in our raster data:
[7]:
# Check the data range
print(clipped_raster.min().values, clipped_raster.max().values)
-1.829 31.834
Now we want to reclassify our data using two approaches. First, let’s do it manually using the numpy
library. NumPy is a powerful Python library used for numerical and scientific computing. It provides support for large, multi-dimensional arrays and matrices, along with a wide collection of mathematical functions to operate on these arrays efficiently. So basically here, we are treating our pixel values as an Array
and do the calculations accordingly.
Note: An array (such as those created by NumPy) is a multi-dimensional container for data, but it lacks metadata like coordinate labels or attributes. In contrast, an xarray.DataArray enhances the array by adding labeled dimensions, coordinates, and attributes, making it easier to work with multi-dimensional data, especially in geospatial and time-series contexts.
[8]:
import numpy as np
import xarray as xr
import matplotlib.pyplot as plt
# Define the bins and new class values
bins = [-10, 10, 20, np.inf] # Bins for elevation
new_values = [1, 2, 3] # Class values: 1 for Low, 2 for Mid, 3 for High
# Mask out NaN values before reclassification
masked_raster = np.where(~np.isnan(clipped_raster), clipped_raster, np.nan)
# Apply the reclassification
reclassified_raster = np.digitize(masked_raster, bins, right=True)
# Retain NaN values by ensuring they are not reclassified
reclassified_raster = np.where(~np.isnan(clipped_raster), reclassified_raster, np.nan)
# Convert to an xarray DataArray
reclassified_raster = xr.DataArray(
reclassified_raster,
dims=clipped_raster.dims, # Keep the same dimensions
coords=clipped_raster.coords, # Retain the spatial coordinates
attrs=clipped_raster.attrs # Preserve the original attributes
)
# Plot using xarray's plot method
plot = reclassified_raster.plot(cmap=plt.matplotlib.colors.ListedColormap(['blue', 'green', 'brown']))
# Set only the ticks [1, 2, 3] on the colorbar
colorbar = plot.colorbar
colorbar.set_ticks([1, 2, 3])
colorbar.set_ticklabels(['Low', 'Mid', 'High']) # Rename the labels
colorbar.set_label("Elevation class")
plt.title("Classified elevation")
plt.show()

Note: When using
np.digitize()
, the function returns indices of the bins, starting from1
. This means the output will not directly contain the desired reclassified values. To map the bin indices to the actual values, you need to create a mapping, such as using a list comprehension to convert the indices into the specified new values. Don’t forget to subtract1
from the indices because Python lists are zero-indexed, whilenp.digitize()
starts indexing at1
.
[9]:
import numpy as np
import xarray as xr
import matplotlib.pyplot as plt
# Define the bins and new class values
bins = [-10, 10, 20, np.inf] # Bins for elevation
new_values = [4, 5, 7] # Class values: 1 for Low, 2 for Mid, 3 for High
# Mask out NaN values before reclassification
masked_raster = np.where(~np.isnan(clipped_raster), clipped_raster, np.nan)
# Apply the reclassification
reclassified_raster = np.digitize(masked_raster, bins, right=True)
# Retain NaN values by ensuring they are not reclassified
reclassified_raster = np.where(~np.isnan(clipped_raster), reclassified_raster, np.nan)
# Convert to an xarray DataArray
reclassified_raster = xr.DataArray(
reclassified_raster,
dims=clipped_raster.dims, # Keep the same dimensions
coords=clipped_raster.coords, # Retain the spatial coordinates
attrs=clipped_raster.attrs # Preserve the original attributes
)
# Plot using xarray's plot method
plot = reclassified_raster.plot(cmap=plt.matplotlib.colors.ListedColormap(['blue', 'green', 'brown']))
# Set only the ticks [1, 2, 3] on the colorbar
colorbar = plot.colorbar
colorbar.set_ticks([1, 2, 3])
#colorbar.set_ticklabels(['Low', 'Mid', 'High']) # Rename the labels
colorbar.set_label("Elevation class")
plt.title("Classified elevation")
plt.show()

We can use the mapclassify
library which we learned about in Lesson 4, to reclassify our raster. Let’s Try NaturalBreaks method here.
[10]:
import numpy as np
import xarray as xr
import mapclassify
# Flatten the raster data and store the original shape
raster_data_flattened = clipped_raster.values.flatten()
original_shape = clipped_raster.shape
# Create a mask for NaN values and remove NaNs for classification
nan_mask = np.isnan(raster_data_flattened)
raster_data_no_nan = raster_data_flattened[~nan_mask]
# Apply Natural Breaks classification with 7 classes
natural_breaks = mapclassify.NaturalBreaks(raster_data_no_nan, k=7)
# Get the classified bins
breaks = natural_breaks.bins
# Classify the non-NaN raster values
classified_raster = np.digitize(raster_data_no_nan, bins=breaks)
# Create an array to hold the full reclassified data
reclassified_full_raster = np.full_like(raster_data_flattened, np.nan)
# Insert the classified data back, keeping NaNs in place
reclassified_full_raster[~nan_mask] = classified_raster
# Reshape the reclassified raster to its original shape
reclassified_full_raster = reclassified_full_raster.reshape(original_shape)
# Convert to xarray DataArray
reclassified_raster_nb = xr.DataArray(
reclassified_full_raster,
dims=clipped_raster.dims,
coords=clipped_raster.coords,
attrs=clipped_raster.attrs
)
# Plot the reclassified raster
reclassified_raster_nb.plot()
[10]:
<matplotlib.collections.QuadMesh at 0x230eae83210>

Raster reclassification and NaN values When reclassifying raster data, it’s crucial to handle null values (NaNs) carefully. If not properly masked, NaN values can be incorrectly classified, leading to erroneous results. Always ensure that NaN values are preserved or treated appropriately to avoid classification issues and maintain accurate analysis.
Slope analysis#
Slope analysis is a key terrain analysis technique used in GIS and spatial analysis to measure the steepness or incline of the terrain at any given point. It is calculated by examining the rate of change in elevation between neighboring pixels in a digital elevation model (DEM). Slope analysis helps identify areas with steep gradients, which is useful in applications like land-use planning, erosion risk assessment, hydrological modeling, and infrastructure development. The slope is typically expressed in degrees or as a percentage.
Of course, we can perform slope analysis anywhere, including on the Helsinki city center raster we worked with earlier. However, for this course’s purpose, it will be more interesting to calculate it for an area with more elevation variation. Since Finland is generally quite flat, we need to look further north to find terrain that is more elevation-wise interesting for slope analysis. For this purpose, we are heading to Ylläs in Lapland, located in the northern part of Finland. Ylläs is a fell, or mountainous hill, and offers more diverse elevation changes, making it ideal for our slope analysis!
About Ylläs
Ylläs is one of the highest fells in Finland, standing at 718 meters. It is located in the Pallas-Yllästunturi National Park and is a popular destination for outdoor activities such as hiking and skiing. Ylläs is known for its stunning natural landscapes, including vast forests, pristine lakes, and dramatic fells that rise above the surrounding terrain.
Image Source: Wikipedia - Ylläs - Metsähallitus
The raster we will be working with is from the same source as the previous ones, provided by the National Land Survey of Finland, but from a different tile (U4234
). We will calculate the slope using our clipped elevation raster. The slope at each point is derived by determining the rate of change in elevation between neighboring cells in the raster. This is done
by:
Gradient Calculation: We use the elevation differences between adjacent cells in both the x (horizontal) and y (vertical) directions. The gradient represents how quickly elevation changes in these directions.
Slope Formula: The slope is calculated by combining these gradients using the Pythagorean theorem:
\[\text{slope} = \sqrt{\left(\frac{\Delta z}{\Delta x}\right)^2 + \left(\frac{\Delta z}{\Delta y}\right)^2}\]
where Delta z is the change in elevation, and Delta x and Delta y are the distances between the cells in the x and y directions.
Final Slope Values: The result is often expressed in degrees or as a percentage, with steeper areas showing higher slope values. We use degrees here.
[11]:
yllas_dem = rioxarray.open_rasterio('data/U4234A.tif')
yllas_dem.plot()
[11]:
<matplotlib.collections.QuadMesh at 0x230eac6f590>

[12]:
# Get the pixel resolution
xres = yllas_dem.rio.resolution()[0] # Resolution in x direction (longitude)
yres = yllas_dem.rio.resolution()[1] # Resolution in y direction (latitude)
# Calculate gradients in the x and y directions
dzdx = yllas_dem.differentiate(coord='x') / xres # Gradient in the x direction
dzdy = yllas_dem.differentiate(coord='y') / yres # Gradient in the y direction
# Calculate the slope (in degrees)
slope = np.sqrt(dzdx**2 + dzdy**2)
slope = np.arctan(slope) * (180 / np.pi)
# Update the attributes to reflect that this is a slope raster
slope.attrs['long_name'] = 'Slope'
slope.attrs['units'] = 'degrees'
Now let’s plot our Slope raster.
[13]:
# Plot the slope raster
plt.figure(figsize=(10, 8))
slope.plot(cmap='terrain', add_colorbar=True)
plt.title("Slope Map (Degrees)")
plt.xlabel("X Coordinate")
plt.ylabel("Y Coordinate")
plt.show()

Raster-to-Raster Calculations#
Raster-to-raster calculations involve performing mathematical or logical operations on two or more raster datasets to derive new information. Each raster grid cell is processed based on corresponding cells from the input rasters. This method is commonly used in geospatial analysis to perform tasks like:
Combining terrain attributes: Slope, aspect, and DEM can be used to calculate terrain stability or hydrological indices.
Environmental modeling: Calculating indices such as vegetation health (NDVI) using different raster bands.
Suitability analysis: Combining different factors such as elevation and slope to determine land use or habitat suitability.
Example: Slope Stability Index (SSI) SSI is a basic measure to assess the potential stability of a terrain. It combines both the slope steepness and elevation of the terrain, providing an index that indicates the likelihood of a slope to remain stable. Steeper slopes and higher elevations tend to be more prone to instability, which can lead to phenomena such as landslides, rockfalls, or soil erosion.
The Slope Stability Index can be calculated as:
Slope: The steepness of the terrain, typically measured in degrees or percentage.
Elevation (DEM): The height of the terrain above a reference point (usually sea level).
SSI: The lower the value of the SSI, the higher the instability risk. Conversely, higher values indicate more stable terrain.
Now let’s write this in Python:
[16]:
import numpy as np
import rioxarray as rxr
# Read the first band
slope = slope[0]
dem = yllas_dem[0]
# Avoid division by zero by setting zero or negative DEM values to NaN
dem = yllas_dem.where(dem > 0)
# Calculate the Slope Stability Index (SSI)
ssi = 1 / (slope * dem)
# Handle any infinities resulting from division by zero (set them to NaN)
ssi = ssi.where(np.isfinite(ssi), np.nan)
# Update SSI attributes to match input slope raster
ssi.rio.write_crs(slope.rio.crs)
ssi.rio.write_nodata(np.nan, inplace=True)
# Save the result as a GeoTIFF
ssi.plot()
[16]:
<matplotlib.collections.QuadMesh at 0x23144c43010>

Lazy Loading?If data loads poorly or improves on re-run, it’s likely due to lazy loading. Libraries likexarray
defer loading large datasets until explicitly needed, and slow I/O or caching issues can result in incomplete reads. Re-running ensures full loading and processing. To avoid this, use.load()
to force immediate data loading and minimize reliance on deferred operations.