Python matplotlib example shapefile basics#

Description

The following example gives a brief overview of how to easily read in a shapefile using the GeoPandas software package and use it for visualization.

The used shapefile from data.dtu.dk contains the border lines of the European countries.

Learning steps:

  • read the data into a GeoPandas DataFrame

  • have a closer look at the content
    • country (DataFrame row), abbreviation and name (DataFrame columns)

    • country area (DataFrame geometry, polygons)

    • country borderline (DataFrame geometry, polylines)

  • extract the data for a single country

  • extract the data for multiple countries

  • plot geometries

  • combine GeoPandas plotting routine with Matplotlib and Cartopy
    • draw country borderline over data (overlay) on a map

Software requirements

  • Python 3

  • geopandas

  • matplotlib

  • cartopy

  • xarray

Run the script

python shapefile_basics.py

Notebook shapefile_basics.ipynb


shapefile_basics.py:

#!/usr/bin/env python
# coding: utf-8

# DKRZ Tutorial:  Shapefiles
#
# ----
#
# Copyright 2024 Deutsches Klimarechenzentrum GmbH (DKRZ)
# Licensed under CC-BY-NC-SA-4.0
#
# ----
#
# The following example gives a brief overview of how to easily read in a
# **shapefile** using the **GeoPandas** software package and use it for
# visualization.
#
#
# The used shapefile from data.dtu.dk contains the border lines of the
# European countries.
#
# Learning steps:
#
# - read the data into a GeoPandas DataFrame
# - have a closer look at the content
#     - country (DataFrame row), abbreviation and name (DataFrame columns)
#     - country area (DataFrame geometry, polygons)
#     - country borderline (DataFrame geometry, polylines)
# - extract the data for a single country
# - extract the data for multiple countries
# - plot geometries
# - combine GeoPandas plotting routine with Matplotlib and Cartopy
#     - draw country borderline on data (overlay)
#
# **GeoPandas**
#
# URL: https://geopandas.org/en/stable/
#
# **Shapefile description**
#
# From the _ESRI Shapefile Technical Description_ <br>
# https://www.esri.com/content/dam/esrisites/sitecore-archive/Files/Pdfs/library/whitepapers/pdfs/shapefile.pdf
#
#   A shapefile stores nontopological geometry and attribute information for
#   the spatial features in a data set. The geometry for a feature is stored
#   as a shape comprising a set of vector coordinates.
#
#   Because shapefiles do not have the processing overhead of a topological
#   data structure, they have advantages over other data sources such as faster
#   drawing speed and edit ability. Shapefiles handle single features that
#   overlap or that are noncontiguous. They also typically require less disk
#   space and are easier to read and write.
#
#   Shapefiles can support point, line, and area features. Area features are
#   represented as closed loop, double-digitized polygons. Attributes are held
#   in a dBASE®  format file.
#   Each attribute record has a one-to-one relationship with the associated
#   shape record.
#
# **Download shapefile**
#
# Shapefile download page: <br>
# https://data.dtu.dk/articles/dataset/Shapefile_of_European_countries/23686383
#
# Used shapefile data set: <br>
# https://data.dtu.dk/ndownloader/articles/23686383/versions/1
#   Sevdari, Kristian; Marmullaku, Drin (2023). Shapefile of European countries.
#   Technical University of Denmark. Dataset. https://doi.org/10.11583/DTU.23686383.v1
#
# ----

import os
from datetime import datetime
import geopandas as gpd
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
import cartopy.crs as ccrs
import cartopy.feature as cfeature

# Load the shapefile
#
# 1. Download the shapefile from https://data.dtu.dk/ndownloader/articles/23686383/versions/1
#
# 2. Load the shapefile with GeoPandas `read_file()` function.

shapefile_path = os.environ['HOME']+ '/Downloads/23686383/Europe/Europe_merged.shp'

gdf = gpd.read_file(shapefile_path)

# Shapefile content
#
# Print the first 5 lines of the shapefile content using the `.head()` method.

gdf.head()

# Shape, rows, and columns
#
# The readonly property `shape` gives us the (nrows, ncols) size of the
# DataFrame.

print(gdf.shape)

# Number of rows

nrows = gdf.shape[0]
print(nrows)

# same as above
nrows = len(gdf.index)
print(nrows)

# Number of columns

ncols = gdf.shape[1]
print(ncols)

# same as above
ncols = len(gdf.columns)
print(ncols)

# Each row in the GeoPandas DataFrame contains the data of one country.
#
# Select a row by index

print(gdf.iloc[10])
print('----------------------')
print(gdf.iloc[10].GID_0)
print(gdf.iloc[10].COUNTRY)
print(gdf.iloc[10].name)

# To print the column names use the `columns` method.

print(gdf.columns.values)

# Select a row (= country) by its name, therefore, we use the column COUNTRY
# for the condition.

print(gdf.loc[gdf['COUNTRY'] == 'Germany'])

# Bounds
#
# Print the bounds of the first 5 countries.

print(gdf.bounds.head())

# Print bounds range of the shapefile content.

print(f'minx: {min(gdf.bounds.minx):8.3f}  maxx: {max(gdf.bounds.maxx):8.3f}')
print(f'miny: {min(gdf.bounds.miny):8.3f}  maxy: {max(gdf.bounds.maxy):8.3f}')

# Plot shapefile content
#
# The GeoPandas package provides an interface to Matplotlib for drawing.
# Therefore, we use the GeoPandas function `.plot()` to plot the content of
# the GeoPandas data frame. See also https://geopandas.org/en/stable/docs/user_guide/mapping.html.

gdf.plot()
plt.show()

# The plot above shows the drawings of the multi-polygons of the countries
# (default). If you want to draw only the outlines, the border lines of the
# countries, you can use the `.boundary` method with the `.plot()` method.

gdf.boundary.plot(linewidth=0.5)
plt.show()

# Draw each country using the columns 'GID_0' or 'COUNTRY'. To add a legend
# to the plot the parameter column has to be given to the plot call. GID_0 for
# the country abbreviation or COUNTRY for the standard name.

legend_kw = dict(loc='center left', bbox_to_anchor=(1,0.5), fontsize=6, ncols=3)

gdf.plot(column='COUNTRY', cmap='gist_rainbow', figsize=(12, 5),
         legend=True, legend_kwds=legend_kw) # standard names

#gdf.plot(column='GID_0', cmap='gist_rainbow', figsize=(12, 5),
#         legend=True, legend_kwds=legend_kw)   # abbreviations
plt.show()

# Extract countries
#
# We can extract a single country or a set of countries as shown above with
# the `loc` and `iloc` methods, but there is a third method called `where`
# that can do it as well.
#
# Using `where` method

gdf.where(gdf['COUNTRY'] == 'Germany' ).plot()
plt.show()

# Here, we use the `loc` method of the GeoPandas DataFrame to extract the
# geometries of Germany, and assign it to a variable.

D = gdf.loc[gdf['COUNTRY'] == 'Germany']

print(D)

# Extract multiple countries using `query()` method
#
# The `query()` method allows us to pass a list of country names for the
# selection.

country_list = ['Germany', 'France']

countries = gdf.query('COUNTRY in @country_list')

print(countries)

countries.plot()
plt.show()

# Print the index

cindex  = countries.index.values
print(cindex)

# Print the multi-polygon geometries

print(countries.geometry)

# Get the bounds of the two countries, the minimum and maximum of the
# multipolygon coordinate data.

print(countries.bounds)

# Print the polyline data of the multi-polygon.

print(countries.boundary)

countries.boundary.plot(lw=0.5)
plt.show()

# Shapefile projection
#
# In most cases the geometries of a shapefile based on a projection. We can
# get the stored projection with the `crs` property.

print(countries.crs)

# Plotting with Matplotlib
#
# Sometimes you'll want to have more controll on the plot creation and that
# is why we here combine the plot functionalities of GeoPandas, Matplotlib,
# and Cartopy.
#
# First, we select each geometry for the two countries Germany and France.
# We do this in order to have better control over the individual plots of the
# countries, e.g. the choice of colors, hatching or the respective labels for
# the legend.

D = gdf.loc[gdf['COUNTRY'] == 'Germany']
F = gdf.loc[gdf['COUNTRY'] == 'France']

# In the next step, we
#
# - draw the two countries on a map,
# - zoom into the map,
# - add a legend to the right side of the plot

proj = ccrs.PlateCarree()

fig, ax = plt.subplots(figsize=(6,4), subplot_kw={'projection': proj},
                       layout='tight')

ax.set_extent([-6., 16., 40., 57.])
ax.gridlines(draw_labels=True, color='gray', alpha=0.8, linestyle='--')

style_kwds = dict(color='gold', edgecolor='red', hatch='///', linewidth=0.6,)

D.plot(ax=ax, **style_kwds, transform=ccrs.PlateCarree())

style_kwds = dict(color='blue', edgecolor='red', linewidth=0.6,)

F.plot(ax=ax, **style_kwds, transform=ccrs.PlateCarree())

legend_labels = ['Germany (D)', 'France (F)']

legend_item1 = Line2D([], [], linestyle='none', color='gold', marker='o', markerfacecolor="gold")
legend_item2 = Line2D([], [], linestyle='none', color='blue', marker='o', markerfacecolor="blue")
legend_marker = (legend_item1, legend_item2)

plt.legend(handles=legend_marker, labels=legend_labels, loc=(1.1, 0.92), fontsize=8)

ax.add_feature(cfeature.LAND.with_scale('50m'), color='whitesmoke', zorder=0)
ax.add_feature(cfeature.OCEAN.with_scale('50m'), color='lightgray', zorder=0);

plt.show()

# Re-project to Orthographic projection
#
# To plot the pre-projected shapefile content correctly, we have to re-project
# it to the target projection, here Orthographic. Therefore, we can use the
# `to_crs()` method of the GeoPandas GeoDataFrame with the PROJ string from
# Cartopy's CRS of the target projection.
#
# Target projection

proj = ccrs.Orthographic(central_longitude=0., central_latitude=52.)

# Generate PROJ string from Cartopy CRS

proj_string = proj.proj4_init

print(proj_string)

# Re-project geometries of the GeoPandas GeoDataFrames D and F.

D_ortho = D.to_crs(proj_string)
F_ortho = F.to_crs(proj_string)

# **Note:**
# Due to the fact that we re-project the shapefile geometries for the selected
# countries, we do not need to use the transform parameter in the plot call.
#
# In the next step, we
#
# - draw the two countries on a map with Orthographic projection,
# - zoom into the map,
# - choose different colors and hatching,
# - add a legend to the right side of the plot

start_time = datetime.now()

fig, ax = plt.subplots(figsize=(6,4), subplot_kw={'projection': proj},
                       layout='tight')

ax.set_extent([-30., 40., 20., 75.])

ax.gridlines(draw_labels=True, color='gray', alpha=0.8, linestyle='--')
ax.coastlines(resolution='50m', linewidth=0.6, edgecolor='black')

style_kw = dict(color='gold', edgecolor='red', hatch='///', linewidth=0.6,)

D_ortho.plot(ax=ax, **style_kw)

style_kw = dict(color='blue', edgecolor='red', linewidth=0.6,)

F_ortho.plot(ax=ax, **style_kw)

legend_labels = ['Germany (D)', 'France (F)']
legend_item1 = Line2D([], [], linestyle='none', color='gold', marker='o', markerfacecolor='gold')
legend_item2 = Line2D([], [], linestyle='none', color='blue', marker='o', markerfacecolor='blue')
legend_marker = (legend_item1, legend_item2)
plt.legend(handles=legend_marker, labels=legend_labels, loc=(1.1, 0.92), fontsize=8)

print(f'run time: {datetime.now() - start_time}\n')

plt.show()

# Make it faster with Cartopy's `ax.add_geometries()`
#
# A faster way to generate the plot of the map with the shapefile geometries
# is to use Cartopy's `ax.add_geometries()` method instead of geoDataFrame.plot().

start_time = datetime.now()

fig, ax = plt.subplots(figsize=(6,4), subplot_kw={'projection': proj},
                       layout='tight')

ax.set_extent([-30., 40., 20., 75.])

ax.gridlines(draw_labels=True, color='gray', alpha=0.8, linestyle='--')
ax.coastlines(resolution='50m', linewidth=0.6, edgecolor='black')

style_kw = dict(facecolor='gold', edgecolor='red', hatch='///', linewidth=0.6,)

ax.add_geometries(D_ortho["geometry"], crs=proj, **style_kw)

style_kw = dict(facecolor='blue', edgecolor='red', linewidth=0.6,)

ax.add_geometries(F_ortho["geometry"], crs=proj, **style_kw,)

legend_labels = ['Germany (D)', 'France (F)']
legend_item1 = Line2D([], [], linestyle='none', color='gold', marker='o', markerfacecolor='gold')
legend_item2 = Line2D([], [], linestyle='none', color='blue', marker='o', markerfacecolor='blue')
legend_marker = (legend_item1, legend_item2)
plt.legend(handles=legend_marker, labels=legend_labels, loc=(1.1, 0.92), fontsize=8)

print(f'run time: {datetime.now() - start_time}\n')

plt.show()

# Overlay, draw borderlines over data
#
# For the sake of simplicity, we use an example data set provided by the
# **Xarray** package. See also https://docs.xarray.dev/en/stable/generated/xarray.tutorial.open_dataset.html
#
# The data set we use here contains the variable **rasm** which is the output
# of the Regional Arctic System Model (RASM).
#
# **Note:**
# The variable **Tair** has the coordinates (time,y,x) where x and y are the
# logical coordinates that point to the physical coordinate variables xc and yc.
# The physical coordinates xc,yc are 2-dimensional arrays.

import xarray as xr

ds = xr.tutorial.open_dataset('rasm')

print(ds)

# Select the first time step

var = ds.Tair.isel(time=0)

# Let's have a quick look at the data. Using the plot() method without further
# parameter settings, the plot routine will use the logical coordinates.

var.plot()
plt.show()

# Now, we can use the already known plotting content from above and add the
# data variable using the physical coordinates.
#
# You can use Xarray's or Matplotlib's plot methods. Under the hood, Xarray
# uses the plot functionality of Matplotlib.
#
# In the following we only need the plot commands from above plus a plot
# command for the variable Tair. As there are two ways of plotting the data,
# we create two plots next to each other so that we can compare them more easily.

start_time = datetime.now()

fig, (ax1,ax2) = plt.subplots(nrows=1, ncols=2, figsize=(8,8),
                              subplot_kw={'projection': proj},
                              layout='tight')

#-- use Xarray's plot method (uses Matplotlib under the hood)
ax1.set_extent([-30., 40., 20., 75.])
ax1.gridlines(draw_labels=True, color='gray', alpha=0.8, linestyle='--')
ax1.coastlines(resolution='50m', linewidth=0.6, color='blue')

cn1 = var.plot.pcolormesh(ax=ax1, x='xc', y='yc',
                          cmap='RdBu_r',
                          add_colorbar=False,
                          transform=ccrs.PlateCarree())
ax1.set_title('')

#-- use Matplotlib's plot method
ax2.set_extent([-30., 40., 20., 75.])
ax2.gridlines(draw_labels=True, color='gray', alpha=0.8, linestyle='--')
ax2.coastlines(resolution='50m', linewidth=0.6, color='blue')

cn2 = ax2.pcolormesh(ds.xc, ds.yc, var,
                     cmap='RdBu_r',
                     transform=ccrs.PlateCarree())

#-- add geometries to both plots
style_kw = dict(facecolor='None', edgecolor='black', linewidth=0.5)

ax1.add_geometries(D_ortho["geometry"], crs=proj, **style_kw)
ax1.add_geometries(F_ortho["geometry"], crs=proj, **style_kw,)

ax2.add_geometries(D_ortho["geometry"], crs=proj, **style_kw)
ax2.add_geometries(F_ortho["geometry"], crs=proj, **style_kw,)

print(f'run time: {datetime.now() - start_time}\n')

plt.show()

# **Conclusion:** with GeoPandas, shapefiles can be used in a very simple way.

Example plot of European country border lines

image0