Python matplotlib example shapefile basics#
Description
The following example gives a brief overview of how to easily read in a shapefile using the GeoPandas software package and use it for visualization.
The used shapefile from data.dtu.dk contains the border lines of the European countries.
Learning steps:
read the data into a GeoPandas DataFrame
- have a closer look at the content
country (DataFrame row), abbreviation and name (DataFrame columns)
country area (DataFrame geometry, polygons)
country borderline (DataFrame geometry, polylines)
extract the data for a single country
extract the data for multiple countries
plot geometries
- combine GeoPandas plotting routine with Matplotlib and Cartopy
draw country borderline over data (overlay) on a map
Software requirements
Python 3
geopandas
matplotlib
cartopy
xarray
Run the script
python shapefile_basics.py
Notebook shapefile_basics.ipynb
shapefile_basics.py:
#!/usr/bin/env python
# coding: utf-8
# DKRZ Tutorial: Shapefiles
#
# ----
#
# Copyright 2024 Deutsches Klimarechenzentrum GmbH (DKRZ)
# Licensed under CC-BY-NC-SA-4.0
#
# ----
#
# The following example gives a brief overview of how to easily read in a
# **shapefile** using the **GeoPandas** software package and use it for
# visualization.
#
#
# The used shapefile from data.dtu.dk contains the border lines of the
# European countries.
#
# Learning steps:
#
# - read the data into a GeoPandas DataFrame
# - have a closer look at the content
# - country (DataFrame row), abbreviation and name (DataFrame columns)
# - country area (DataFrame geometry, polygons)
# - country borderline (DataFrame geometry, polylines)
# - extract the data for a single country
# - extract the data for multiple countries
# - plot geometries
# - combine GeoPandas plotting routine with Matplotlib and Cartopy
# - draw country borderline on data (overlay)
#
# **GeoPandas**
#
# URL: https://geopandas.org/en/stable/
#
# **Shapefile description**
#
# From the _ESRI Shapefile Technical Description_ <br>
# https://www.esri.com/content/dam/esrisites/sitecore-archive/Files/Pdfs/library/whitepapers/pdfs/shapefile.pdf
#
# A shapefile stores nontopological geometry and attribute information for
# the spatial features in a data set. The geometry for a feature is stored
# as a shape comprising a set of vector coordinates.
#
# Because shapefiles do not have the processing overhead of a topological
# data structure, they have advantages over other data sources such as faster
# drawing speed and edit ability. Shapefiles handle single features that
# overlap or that are noncontiguous. They also typically require less disk
# space and are easier to read and write.
#
# Shapefiles can support point, line, and area features. Area features are
# represented as closed loop, double-digitized polygons. Attributes are held
# in a dBASE® format file.
# Each attribute record has a one-to-one relationship with the associated
# shape record.
#
# **Download shapefile**
#
# Shapefile download page: <br>
# https://data.dtu.dk/articles/dataset/Shapefile_of_European_countries/23686383
#
# Used shapefile data set: <br>
# https://data.dtu.dk/ndownloader/articles/23686383/versions/1
# Sevdari, Kristian; Marmullaku, Drin (2023). Shapefile of European countries.
# Technical University of Denmark. Dataset. https://doi.org/10.11583/DTU.23686383.v1
#
# ----
import os
from datetime import datetime
import geopandas as gpd
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
import cartopy.crs as ccrs
import cartopy.feature as cfeature
# Load the shapefile
#
# 1. Download the shapefile from https://data.dtu.dk/ndownloader/articles/23686383/versions/1
#
# 2. Load the shapefile with GeoPandas `read_file()` function.
shapefile_path = os.environ['HOME']+ '/Downloads/23686383/Europe/Europe_merged.shp'
gdf = gpd.read_file(shapefile_path)
# Shapefile content
#
# Print the first 5 lines of the shapefile content using the `.head()` method.
gdf.head()
# Shape, rows, and columns
#
# The readonly property `shape` gives us the (nrows, ncols) size of the
# DataFrame.
print(gdf.shape)
# Number of rows
nrows = gdf.shape[0]
print(nrows)
# same as above
nrows = len(gdf.index)
print(nrows)
# Number of columns
ncols = gdf.shape[1]
print(ncols)
# same as above
ncols = len(gdf.columns)
print(ncols)
# Each row in the GeoPandas DataFrame contains the data of one country.
#
# Select a row by index
print(gdf.iloc[10])
print('----------------------')
print(gdf.iloc[10].GID_0)
print(gdf.iloc[10].COUNTRY)
print(gdf.iloc[10].name)
# To print the column names use the `columns` method.
print(gdf.columns.values)
# Select a row (= country) by its name, therefore, we use the column COUNTRY
# for the condition.
print(gdf.loc[gdf['COUNTRY'] == 'Germany'])
# Bounds
#
# Print the bounds of the first 5 countries.
print(gdf.bounds.head())
# Print bounds range of the shapefile content.
print(f'minx: {min(gdf.bounds.minx):8.3f} maxx: {max(gdf.bounds.maxx):8.3f}')
print(f'miny: {min(gdf.bounds.miny):8.3f} maxy: {max(gdf.bounds.maxy):8.3f}')
# Plot shapefile content
#
# The GeoPandas package provides an interface to Matplotlib for drawing.
# Therefore, we use the GeoPandas function `.plot()` to plot the content of
# the GeoPandas data frame. See also https://geopandas.org/en/stable/docs/user_guide/mapping.html.
gdf.plot()
plt.show()
# The plot above shows the drawings of the multi-polygons of the countries
# (default). If you want to draw only the outlines, the border lines of the
# countries, you can use the `.boundary` method with the `.plot()` method.
gdf.boundary.plot(linewidth=0.5)
plt.show()
# Draw each country using the columns 'GID_0' or 'COUNTRY'. To add a legend
# to the plot the parameter column has to be given to the plot call. GID_0 for
# the country abbreviation or COUNTRY for the standard name.
legend_kw = dict(loc='center left', bbox_to_anchor=(1,0.5), fontsize=6, ncols=3)
gdf.plot(column='COUNTRY', cmap='gist_rainbow', figsize=(12, 5),
legend=True, legend_kwds=legend_kw) # standard names
#gdf.plot(column='GID_0', cmap='gist_rainbow', figsize=(12, 5),
# legend=True, legend_kwds=legend_kw) # abbreviations
plt.show()
# Extract countries
#
# We can extract a single country or a set of countries as shown above with
# the `loc` and `iloc` methods, but there is a third method called `where`
# that can do it as well.
#
# Using `where` method
gdf.where(gdf['COUNTRY'] == 'Germany' ).plot()
plt.show()
# Here, we use the `loc` method of the GeoPandas DataFrame to extract the
# geometries of Germany, and assign it to a variable.
D = gdf.loc[gdf['COUNTRY'] == 'Germany']
print(D)
# Extract multiple countries using `query()` method
#
# The `query()` method allows us to pass a list of country names for the
# selection.
country_list = ['Germany', 'France']
countries = gdf.query('COUNTRY in @country_list')
print(countries)
countries.plot()
plt.show()
# Print the index
cindex = countries.index.values
print(cindex)
# Print the multi-polygon geometries
print(countries.geometry)
# Get the bounds of the two countries, the minimum and maximum of the
# multipolygon coordinate data.
print(countries.bounds)
# Print the polyline data of the multi-polygon.
print(countries.boundary)
countries.boundary.plot(lw=0.5)
plt.show()
# Shapefile projection
#
# In most cases the geometries of a shapefile based on a projection. We can
# get the stored projection with the `crs` property.
print(countries.crs)
# Plotting with Matplotlib
#
# Sometimes you'll want to have more controll on the plot creation and that
# is why we here combine the plot functionalities of GeoPandas, Matplotlib,
# and Cartopy.
#
# First, we select each geometry for the two countries Germany and France.
# We do this in order to have better control over the individual plots of the
# countries, e.g. the choice of colors, hatching or the respective labels for
# the legend.
D = gdf.loc[gdf['COUNTRY'] == 'Germany']
F = gdf.loc[gdf['COUNTRY'] == 'France']
# In the next step, we
#
# - draw the two countries on a map,
# - zoom into the map,
# - add a legend to the right side of the plot
proj = ccrs.PlateCarree()
fig, ax = plt.subplots(figsize=(6,4), subplot_kw={'projection': proj},
layout='tight')
ax.set_extent([-6., 16., 40., 57.])
ax.gridlines(draw_labels=True, color='gray', alpha=0.8, linestyle='--')
style_kwds = dict(color='gold', edgecolor='red', hatch='///', linewidth=0.6,)
D.plot(ax=ax, **style_kwds, transform=ccrs.PlateCarree())
style_kwds = dict(color='blue', edgecolor='red', linewidth=0.6,)
F.plot(ax=ax, **style_kwds, transform=ccrs.PlateCarree())
legend_labels = ['Germany (D)', 'France (F)']
legend_item1 = Line2D([], [], linestyle='none', color='gold', marker='o', markerfacecolor="gold")
legend_item2 = Line2D([], [], linestyle='none', color='blue', marker='o', markerfacecolor="blue")
legend_marker = (legend_item1, legend_item2)
plt.legend(handles=legend_marker, labels=legend_labels, loc=(1.1, 0.92), fontsize=8)
ax.add_feature(cfeature.LAND.with_scale('50m'), color='whitesmoke', zorder=0)
ax.add_feature(cfeature.OCEAN.with_scale('50m'), color='lightgray', zorder=0);
plt.show()
# Re-project to Orthographic projection
#
# To plot the pre-projected shapefile content correctly, we have to re-project
# it to the target projection, here Orthographic. Therefore, we can use the
# `to_crs()` method of the GeoPandas GeoDataFrame with the PROJ string from
# Cartopy's CRS of the target projection.
#
# Target projection
proj = ccrs.Orthographic(central_longitude=0., central_latitude=52.)
# Generate PROJ string from Cartopy CRS
proj_string = proj.proj4_init
print(proj_string)
# Re-project geometries of the GeoPandas GeoDataFrames D and F.
D_ortho = D.to_crs(proj_string)
F_ortho = F.to_crs(proj_string)
# **Note:**
# Due to the fact that we re-project the shapefile geometries for the selected
# countries, we do not need to use the transform parameter in the plot call.
#
# In the next step, we
#
# - draw the two countries on a map with Orthographic projection,
# - zoom into the map,
# - choose different colors and hatching,
# - add a legend to the right side of the plot
start_time = datetime.now()
fig, ax = plt.subplots(figsize=(6,4), subplot_kw={'projection': proj},
layout='tight')
ax.set_extent([-30., 40., 20., 75.])
ax.gridlines(draw_labels=True, color='gray', alpha=0.8, linestyle='--')
ax.coastlines(resolution='50m', linewidth=0.6, edgecolor='black')
style_kw = dict(color='gold', edgecolor='red', hatch='///', linewidth=0.6,)
D_ortho.plot(ax=ax, **style_kw)
style_kw = dict(color='blue', edgecolor='red', linewidth=0.6,)
F_ortho.plot(ax=ax, **style_kw)
legend_labels = ['Germany (D)', 'France (F)']
legend_item1 = Line2D([], [], linestyle='none', color='gold', marker='o', markerfacecolor='gold')
legend_item2 = Line2D([], [], linestyle='none', color='blue', marker='o', markerfacecolor='blue')
legend_marker = (legend_item1, legend_item2)
plt.legend(handles=legend_marker, labels=legend_labels, loc=(1.1, 0.92), fontsize=8)
print(f'run time: {datetime.now() - start_time}\n')
plt.show()
# Make it faster with Cartopy's `ax.add_geometries()`
#
# A faster way to generate the plot of the map with the shapefile geometries
# is to use Cartopy's `ax.add_geometries()` method instead of geoDataFrame.plot().
start_time = datetime.now()
fig, ax = plt.subplots(figsize=(6,4), subplot_kw={'projection': proj},
layout='tight')
ax.set_extent([-30., 40., 20., 75.])
ax.gridlines(draw_labels=True, color='gray', alpha=0.8, linestyle='--')
ax.coastlines(resolution='50m', linewidth=0.6, edgecolor='black')
style_kw = dict(facecolor='gold', edgecolor='red', hatch='///', linewidth=0.6,)
ax.add_geometries(D_ortho["geometry"], crs=proj, **style_kw)
style_kw = dict(facecolor='blue', edgecolor='red', linewidth=0.6,)
ax.add_geometries(F_ortho["geometry"], crs=proj, **style_kw,)
legend_labels = ['Germany (D)', 'France (F)']
legend_item1 = Line2D([], [], linestyle='none', color='gold', marker='o', markerfacecolor='gold')
legend_item2 = Line2D([], [], linestyle='none', color='blue', marker='o', markerfacecolor='blue')
legend_marker = (legend_item1, legend_item2)
plt.legend(handles=legend_marker, labels=legend_labels, loc=(1.1, 0.92), fontsize=8)
print(f'run time: {datetime.now() - start_time}\n')
plt.show()
# Overlay, draw borderlines over data
#
# For the sake of simplicity, we use an example data set provided by the
# **Xarray** package. See also https://docs.xarray.dev/en/stable/generated/xarray.tutorial.open_dataset.html
#
# The data set we use here contains the variable **rasm** which is the output
# of the Regional Arctic System Model (RASM).
#
# **Note:**
# The variable **Tair** has the coordinates (time,y,x) where x and y are the
# logical coordinates that point to the physical coordinate variables xc and yc.
# The physical coordinates xc,yc are 2-dimensional arrays.
import xarray as xr
ds = xr.tutorial.open_dataset('rasm')
print(ds)
# Select the first time step
var = ds.Tair.isel(time=0)
# Let's have a quick look at the data. Using the plot() method without further
# parameter settings, the plot routine will use the logical coordinates.
var.plot()
plt.show()
# Now, we can use the already known plotting content from above and add the
# data variable using the physical coordinates.
#
# You can use Xarray's or Matplotlib's plot methods. Under the hood, Xarray
# uses the plot functionality of Matplotlib.
#
# In the following we only need the plot commands from above plus a plot
# command for the variable Tair. As there are two ways of plotting the data,
# we create two plots next to each other so that we can compare them more easily.
start_time = datetime.now()
fig, (ax1,ax2) = plt.subplots(nrows=1, ncols=2, figsize=(8,8),
subplot_kw={'projection': proj},
layout='tight')
#-- use Xarray's plot method (uses Matplotlib under the hood)
ax1.set_extent([-30., 40., 20., 75.])
ax1.gridlines(draw_labels=True, color='gray', alpha=0.8, linestyle='--')
ax1.coastlines(resolution='50m', linewidth=0.6, color='blue')
cn1 = var.plot.pcolormesh(ax=ax1, x='xc', y='yc',
cmap='RdBu_r',
add_colorbar=False,
transform=ccrs.PlateCarree())
ax1.set_title('')
#-- use Matplotlib's plot method
ax2.set_extent([-30., 40., 20., 75.])
ax2.gridlines(draw_labels=True, color='gray', alpha=0.8, linestyle='--')
ax2.coastlines(resolution='50m', linewidth=0.6, color='blue')
cn2 = ax2.pcolormesh(ds.xc, ds.yc, var,
cmap='RdBu_r',
transform=ccrs.PlateCarree())
#-- add geometries to both plots
style_kw = dict(facecolor='None', edgecolor='black', linewidth=0.5)
ax1.add_geometries(D_ortho["geometry"], crs=proj, **style_kw)
ax1.add_geometries(F_ortho["geometry"], crs=proj, **style_kw,)
ax2.add_geometries(D_ortho["geometry"], crs=proj, **style_kw)
ax2.add_geometries(F_ortho["geometry"], crs=proj, **style_kw,)
print(f'run time: {datetime.now() - start_time}\n')
plt.show()
# **Conclusion:** with GeoPandas, shapefiles can be used in a very simple way.
Example plot of European country border lines