Exercise 2#

Important

Please complete this exercise by the end of day on Thursday, 13 November, 2025 (the day before the next work session).

To start this assignment, accept the GitHub classroom assignment and clone your own repository, e.g., in a CSC Noppe instance. Make sure you commit and push all changes you make (you can revisit the instructions on how to use git and the JupyterLab git plugin on the website of the Geo-Python course).

To preview the exercise without logging in, see the open copy of the exercise repository at github.com/Automating-GIS-processes-II-2025/Exercise-2. Don’t attempt to commit changes to that repository; instead, work with your personal GitHub classroom copy (see above).

Hints#

Converting a `pandas.DataFrame` into a `geopandas.GeoDataFrame`#

Sometimes, we work with data that are in a non-spatial format (such as Excel or CSV spreadsheets) but contain information on the location of records, for instance, in columns for longitude and latitude values. While geopandas’ read_file() function can read some of these formats, the safest way is often to read the data set with pandas and then convert it to a GeoDataFrame.

Let’s assume we read the following table into a variable df using pandas.read_csv():

[1]:

# sample data
import pandas
df = pandas.DataFrame({
    "longitude": [24.9557, 24.8353, 24.9587],
    "latitude": [60.1555, 60.1878, 60.2029]
})

[2]:

df

[2]:

	longitude	latitude
0	24.9557	60.1555
1	24.8353	60.1878
2	24.9587	60.2029

The geopandas.GeoDataFrame() constructor accepts a pandas.DataFrame as an input, but it does not automatically fill the geometry column. However, the library comes with a handy helper function geopandas.points_from_xy(). As we all know, a spatial data set should always have a coordinate reference system (CRS) defined; we can specify the CRS of the input data, here, too:

[3]:

import geopandas

gdf = geopandas.GeoDataFrame(
    df,
    geometry=geopandas.points_from_xy(df.longitude, df.latitude),
    crs="EPSG:4326"
)

gdf

[3]:

	longitude	latitude	geometry
0	24.9557	60.1555	POINT (24.9557 60.1555)
1	24.8353	60.1878	POINT (24.8353 60.1878)
2	24.9587	60.2029	POINT (24.9587 60.2029)

Now, we have a ‘proper’ GeoDataFrame on which we can carry out any geospatial operation we need.
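The whole workflow, from CSV text to a reprojected data set, can be sketched as follows. This is a minimal illustration, not part of the exercise: the in-memory io.StringIO buffer stands in for a file on disk, the variable names csv_text, places and places_projected are made up for this example, and EPSG:3067 (ETRS89 / TM35FIN) is chosen here only as an example of a projected CRS used in Finland.

```python
import io

import geopandas
import pandas

# Stand-in for a CSV file on disk (same coordinates as the sample data above)
csv_text = """longitude,latitude
24.9557,60.1555
24.8353,60.1878
24.9587,60.2029
"""

df = pandas.read_csv(io.StringIO(csv_text))

# x is the longitude, y is the latitude; the input coordinates are in EPSG:4326
places = geopandas.GeoDataFrame(
    df,
    geometry=geopandas.points_from_xy(df["longitude"], df["latitude"]),
    crs="EPSG:4326",
)

# Reproject to a metric CRS, e.g., before measuring distances
places_projected = places.to_crs("EPSG:3067")
print(places_projected.crs)
```

Note the order of the arguments to points_from_xy(): the x coordinate (longitude) comes first, the y coordinate (latitude) second.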

Creating a new `geopandas.GeoDataFrame`: alternative 1#

Sometimes, it makes sense to start from scratch with an empty data set and gradually add records. Of course, this is also possible with geopandas’ data frames, which can then be saved as a new GeoPackage or shapefile.

First, create a completely empty GeoDataFrame:

[4]:

import geopandas

new_geodataframe = geopandas.GeoDataFrame()

Then, create shapely geometry objects and insert them into the data frame. To insert a geometry object into the geometry column, and a name into the name column, in a newly added row, use:

[5]:

import shapely.geometry
polygon = shapely.geometry.Polygon(
    [
        (24.9510, 60.1690),
        (24.9510, 60.1698),
        (24.9536, 60.1698),
        (24.9536, 60.1690)
    ]
)
name = "Senaatintori"

new_geodataframe.loc[
    len(new_geodataframe),  # in which row,
    ["name", "geometry"]    # in which columns to save values
] = [name, polygon]

new_geodataframe.head()

[5]:

	name	geometry
0	Senaatintori	POLYGON ((24.951 60.169, 24.951 60.1698, 24.95...

Before saving the newly created data set, don’t forget to set the geometry column and define a coordinate reference system (CRS) for it. Otherwise, you will have trouble reusing the file in other programs:

[6]:

new_geodataframe = new_geodataframe.set_geometry('geometry')
new_geodataframe.crs = "EPSG:4326"
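Whether a CRS is defined can also be checked explicitly. The following sketch is an addition to the example above: it assumes the GeoDataFrame.set_crs() method as an alternative to assigning the crs attribute, and the variable name squares is illustrative.

```python
import geopandas
import shapely.geometry

# A small GeoDataFrame created without any CRS information
squares = geopandas.GeoDataFrame(
    {
        "name": ["Senaatintori"],
        "geometry": [
            shapely.geometry.Polygon(
                [
                    (24.9510, 60.1690),
                    (24.9510, 60.1698),
                    (24.9536, 60.1698),
                    (24.9536, 60.1690),
                ]
            )
        ],
    }
)
print(squares.crs)  # None: no CRS has been defined yet

# set_crs() only declares the CRS; it does not change any coordinates
squares = squares.set_crs("EPSG:4326")
print(squares.crs.to_epsg())
```

Declaring a CRS is different from reprojecting: to transform coordinates into another CRS, use to_crs() on a data set whose CRS is already defined.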

Hint

In the example above, we used len(new_geodataframe) as a row index (which, in a newly created data frame, is equivalent to the row number). Since rows are counted from 0, the number of rows (length of the data frame) is one greater than the address of the last row. This expression, thus, always adds a new row, independent of the actual length of the data frame.

Note that, strictly speaking, the index is independent of the row number, but in newly created data frames, they are identical.
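The difference between index labels and row numbers matters as soon as the index is not the default one. The following sketch uses plain pandas and made-up sample values to show how len() can then point at an existing row, and how resetting the index avoids this:

```python
import pandas

# A data frame whose index labels do not start at 0 (e.g., after filtering)
df = pandas.DataFrame({"name": ["a", "b"]}, index=[1, 2])

# len(df) is 2, and a row labelled 2 already exists: it is overwritten
df.loc[len(df), "name"] = "c"
print(df["name"].tolist())  # ['a', 'c']

# After resetting the index, labels and row numbers match again,
# so len(df) refers to a row that does not exist yet and is appended
df = df.reset_index(drop=True)
df.loc[len(df), "name"] = "d"
print(df["name"].tolist())  # ['a', 'c', 'd']
```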

Creating a new `geopandas.GeoDataFrame`: alternative 2#

Often, it is more convenient, and more elegant, to first collect the data in a dictionary, which can then be converted into a data frame all at once.

For this, first define a dict with the column names as keys, and empty lists as values:

[7]:

data = {
    "name": [],
    "geometry": []
}

Then, fill the dict with data:

[8]:

import shapely.geometry

data["name"].append("Senaatintori")
data["geometry"].append(
    shapely.geometry.Polygon(
        [
            (24.9510, 60.1690),
            (24.9510, 60.1698),
            (24.9536, 60.1698),
            (24.9536, 60.1690)
        ]
    )
)

Finally, use this dictionary as input for a new GeoDataFrame. Don’t forget to specify a CRS:

[9]:

new_geodataframe = geopandas.GeoDataFrame(data, crs="EPSG:4326")
new_geodataframe

[9]:

	name	geometry
0	Senaatintori	POLYGON ((24.951 60.169, 24.951 60.1698, 24.95...
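The dictionary approach is particularly handy when several records are collected in a loop. As a minimal sketch, assuming a list of hypothetical records (the names "point A" to "point C" are invented for illustration; the coordinates are those of the sample table further up):

```python
import geopandas
import shapely.geometry

# Illustrative records: hypothetical names, coordinates from the sample table
records = [
    ("point A", 24.9557, 60.1555),
    ("point B", 24.8353, 60.1878),
    ("point C", 24.9587, 60.2029),
]

data = {"name": [], "geometry": []}

# Append one value per column for every record
for name, longitude, latitude in records:
    data["name"].append(name)
    data["geometry"].append(shapely.geometry.Point(longitude, latitude))

# Convert the collected data into a GeoDataFrame in one step
points = geopandas.GeoDataFrame(data, crs="EPSG:4326")
print(points)
```

Because both lists grow in the same loop iteration, the name and the geometry of each record always end up in the same row.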

Note

These two approaches result in identical GeoDataFrames. Sometimes, one technique is more convenient than the other. You should always evaluate different ways of solving a problem and find the most appropriate and efficient solution (there is always more than one possible solution).
