Python shapefile basics#
Description
The following example gives a brief overview of how to easily read in a shapefile using the GeoPandas software package and use it for visualization.
The used shapefile from data.dtu.dk contains the border lines of the European countries.
Learning steps:
read the data into a GeoPandas DataFrame
- have a closer look at the content
country (DataFrame row), abbreviation and name (DataFrame columns)
country area (DataFrame geometry, polygons)
country borderline (DataFrame geometry, polylines)
extract the data for a single country
extract the data for multiple countries
plot geometries
- combine GeoPandas plotting routine with Matplotlib and Cartopy
draw country borderline over data (overlay) on a map
Software requirements
Python 3
geopandas
matplotlib
cartopy
xarray
Notebook shapefile_basics.ipynb
Example script#
shapefile_basics.py
#!/usr/bin/env python
# coding: utf-8
'''
DKRZ Tutorial
Shapefiles
The following example gives a brief overview of how to easily read in a
shapefile using the GeoPandas software package and use it for visualization.
The used shapefile from data.dtu.dk contains the border lines of the
European countries.
Learning steps
- read the data into a GeoPandas DataFrame
- have a closer look at the content
- country (DataFrame row), abbreviation and name (DataFrame columns)
- country area (DataFrame geometry, polygons)
- country borderline (DataFrame geometry, polylines)
- extract the data for a single country
- extract the data for multiple countries
- plot geometries
- combine GeoPandas plotting routine with Matplotlib and Cartopy
- draw country borderline on data (overlay)
GeoPandas
URL: https://geopandas.org/en/stable/
Shapefile description
From the ESRI Shapefile Technical Description
https://www.esri.com/content/dam/esrisites/sitecore-archive/Files/Pdfs/library/whitepapers/pdfs/shapefile.pdf
A shapefile stores nontopological geometry and attribute information for
the spatial features in a data set. The geometry for a feature is stored
as a shape comprising a set of vector coordinates.
Because shapefiles do not have the processing overhead of a topological
data structure, they have advantages over other data sources such as faster
drawing speed and edit ability. Shapefiles handle single features that
overlap or that are noncontiguous. They also typically require less disk
space and are easier to read and write.
Shapefiles can support point, line, and area features. Area features are
represented as closed loop, double-digitized polygons. Attributes are held
in a dBASE® format file.
Each attribute record has a one-to-one relationship with the associated
shape record.
Download shapefile
Shapefile download page:
https://data.dtu.dk/articles/dataset/Shapefile_of_European_countries/23686383
Used shapefile data set:
https://data.dtu.dk/ndownloader/articles/23686383/versions/1
Sevdari, Kristian; Marmullaku, Drin (2023). Shapefile of European countries.
Technical University of Denmark. Dataset. https://doi.org/10.11583/DTU.23686383.v1
-------------------------------------------------------------------------------
2024 copyright DKRZ licensed under CC BY-NC-SA 4.0
(https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en)
-------------------------------------------------------------------------------
'''
from datetime import datetime
import geopandas as gpd
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
import cartopy.crs as ccrs
import cartopy.feature as cfeature
def main():
plt.switch_backend('agg')
# Load the shapefile
#
# 1. Download the shapefile from https://data.dtu.dk/ndownloader/articles/23686383/versions/1
#
# 2. Load the shapefile with GeoPandas `read_file()` function.
shapefile_path = '../../data/Europe_merged.shp'
gdf = gpd.read_file(shapefile_path)
# Shapefile content
#
# Print the first 5 lines of the shapefile content using the `.head()` method.
print(gdf.head())
# Shape, rows, and columns
#
# The readonly property `shape` gives us the (nrows, ncols) size of the
# DataFrame.
print(gdf.shape)
# Number of rows
nrows = gdf.shape[0]
print(nrows)
# same as above
nrows = len(gdf.index)
print(nrows)
# Number of columns
ncols = gdf.shape[1]
print(ncols)
# same as above
ncols = len(gdf.columns)
print(ncols)
# Each row in the GeoPandas DataFrame contains the data of one country.
#
# Select a row by index
print(gdf.iloc[10])
print('----------------------')
print(gdf.iloc[10].GID_0)
print(gdf.iloc[10].COUNTRY)
print(gdf.iloc[10].name)
# To print the column names use the `columns` method.
print(gdf.columns.values)
# Select a row (= country) by its name, therefore, we use the column COUNTRY
# for the condition.
print(gdf.loc[gdf['COUNTRY'] == 'Germany'])
# Bounds
#
# Print the bounds of the first 5 countries.
print(gdf.bounds.head())
# Print bounds range of the shapefile content.
print(f'minx: {min(gdf.bounds.minx):8.3f} maxx: {max(gdf.bounds.maxx):8.3f}')
print(f'miny: {min(gdf.bounds.miny):8.3f} maxy: {max(gdf.bounds.maxy):8.3f}')
# Plot shapefile content
#
# The GeoPandas package provides an interface to Matplotlib for drawing.
# Therefore, we use the GeoPandas function `.plot()` to plot the content of
# the GeoPandas data frame. See also https://geopandas.org/en/stable/docs/user_guide/mapping.html.
gdf.plot()
plt.savefig('plot_shp_1.png', bbox_inches='tight', facecolor='white')
# The plot above shows the drawings of the multi-polygons of the countries
# (default). If you want to draw only the outlines, the border lines of the
# countries, you can use the `.boundary` method with the `.plot()` method.
gdf.boundary.plot(linewidth=0.5)
plt.savefig('plot_shp_2.png', bbox_inches='tight', facecolor='white')
# Draw each country using the columns 'GID_0' or 'COUNTRY'. To add a legend
# to the plot the parameter column has to be given to the plot call. GID_0 for
# the country abbreviation or COUNTRY for the standard name.
legend_kw = dict(loc='center left', bbox_to_anchor=(1,0.5), fontsize=6, ncols=3)
gdf.plot(column='COUNTRY', cmap='gist_rainbow', figsize=(12, 5),
legend=True, legend_kwds=legend_kw) # standard names
#gdf.plot(column='GID_0', cmap='gist_rainbow', figsize=(12, 5),
# legend=True, legend_kwds=legend_kw) # abbreviations
plt.savefig('plot_shp_3.png', bbox_inches='tight', facecolor='white')
# Extract countries
#
# We can extract a single country or a set of countries as shown above with
# the `loc` and `iloc` methods, but there is a third method called `where`
# that can do it as well.
#
# Using `where` method
gdf.where(gdf['COUNTRY'] == 'Germany' ).plot()
plt.savefig('plot_shp_4.png', bbox_inches='tight', facecolor='white')
# Here, we use the `loc` method of the GeoPandas DataFrame to extract the
# geometries of Germany, and assign it to a variable.
D = gdf.loc[gdf['COUNTRY'] == 'Germany']
print(D)
# Extract multiple countries using `query()` method
#
# The `query()` method allows us to pass a list of country names for the
# selection.
country_list = ['Germany', 'France']
countries = gdf.query('COUNTRY in @country_list')
print(countries)
countries.plot()
plt.savefig('plot_shp_5.png', bbox_inches='tight', facecolor='white')
# Print the index
cindex = countries.index.values
print(cindex)
# Print the multi-polygon geometries
print(countries.geometry)
# Get the bounds of the two countries, the minimum and maximum of the
# multipolygon coordinate data.
print(countries.bounds)
# Print the polyline data of the multi-polygon.
print(countries.boundary)
countries.boundary.plot(lw=0.5)
plt.savefig('plot_shp_6.png', bbox_inches='tight', facecolor='white')
# Shapefile projection
#
# In most cases the geometries of a shapefile based on a projection. We can
# get the stored projection with the `crs` property.
print(countries.crs)
# Plotting with Matplotlib
#
# Sometimes you'll want to have more controll on the plot creation and that
# is why we here combine the plot functionalities of GeoPandas, Matplotlib,
# and Cartopy.
#
# First, we select each geometry for the two countries Germany and France.
# We do this in order to have better control over the individual plots of the
# countries, e.g. the choice of colors, hatching or the respective labels for
# the legend.
D = gdf.loc[gdf['COUNTRY'] == 'Germany']
F = gdf.loc[gdf['COUNTRY'] == 'France']
# In the next step, we
#
# - draw the two countries on a map,
# - zoom into the map,
# - add a legend to the right side of the plot
proj = ccrs.PlateCarree()
fig, ax = plt.subplots(figsize=(6,4), subplot_kw={'projection': proj},
layout='tight')
ax.set_extent([-6., 16., 40., 57.])
ax.gridlines(draw_labels=True, color='gray', alpha=0.8, linestyle='--')
style_kwds = dict(color='gold', edgecolor='red', hatch='///', linewidth=0.6,)
D.plot(ax=ax, **style_kwds, transform=ccrs.PlateCarree())
style_kwds = dict(color='blue', edgecolor='red', linewidth=0.6,)
F.plot(ax=ax, **style_kwds, transform=ccrs.PlateCarree())
legend_labels = ['Germany (D)', 'France (F)']
legend_item1 = Line2D([], [], linestyle='none', color='gold', marker='o', markerfacecolor="gold")
legend_item2 = Line2D([], [], linestyle='none', color='blue', marker='o', markerfacecolor="blue")
legend_marker = (legend_item1, legend_item2)
plt.legend(handles=legend_marker, labels=legend_labels, loc=(1.1, 0.92), fontsize=8)
ax.add_feature(cfeature.LAND.with_scale('50m'), color='whitesmoke', zorder=0)
ax.add_feature(cfeature.OCEAN.with_scale('50m'), color='lightgray', zorder=0);
plt.savefig('plot_shp_7.png', bbox_inches='tight', facecolor='white')
# Re-project to Orthographic projection
#
# To plot the pre-projected shapefile content correctly, we have to re-project
# it to the target projection, here Orthographic. Therefore, we can use the
# `to_crs()` method of the GeoPandas GeoDataFrame with the PROJ string from
# Cartopy's CRS of the target projection.
#
# Target projection
proj = ccrs.Orthographic(central_longitude=0., central_latitude=52.)
# Generate PROJ string from Cartopy CRS
proj_string = proj.proj4_init
print(proj_string)
# Re-project geometries of the GeoPandas GeoDataFrames D and F.
D_ortho = D.to_crs(proj_string)
F_ortho = F.to_crs(proj_string)
# **Note:**
# Due to the fact that we re-project the shapefile geometries for the selected
# countries, we do not need to use the transform parameter in the plot call.
#
# In the next step, we
#
# - draw the two countries on a map with Orthographic projection,
# - zoom into the map,
# - choose different colors and hatching,
# - add a legend to the right side of the plot
start_time = datetime.now()
fig, ax = plt.subplots(figsize=(6,4), subplot_kw={'projection': proj},
layout='tight')
ax.set_extent([-30., 40., 20., 75.])
ax.gridlines(draw_labels=True, color='gray', alpha=0.8, linestyle='--')
ax.coastlines(resolution='50m', linewidth=0.6, edgecolor='black')
style_kw = dict(color='gold', edgecolor='red', hatch='///', linewidth=0.6,)
D_ortho.plot(ax=ax, **style_kw)
style_kw = dict(color='blue', edgecolor='red', linewidth=0.6,)
F_ortho.plot(ax=ax, **style_kw)
legend_labels = ['Germany (D)', 'France (F)']
legend_item1 = Line2D([], [], linestyle='none', color='gold', marker='o', markerfacecolor='gold')
legend_item2 = Line2D([], [], linestyle='none', color='blue', marker='o', markerfacecolor='blue')
legend_marker = (legend_item1, legend_item2)
plt.legend(handles=legend_marker, labels=legend_labels, loc=(1.1, 0.92), fontsize=8)
print(f'run time: {datetime.now() - start_time}\n')
plt.savefig('plot_shp_8.png', bbox_inches='tight', facecolor='white')
# Make it faster with Cartopy's `ax.add_geometries()`
#
# A faster way to generate the plot of the map with the shapefile geometries
# is to use Cartopy's `ax.add_geometries()` method instead of geoDataFrame.plot().
start_time = datetime.now()
fig, ax = plt.subplots(figsize=(6,4), subplot_kw={'projection': proj},
layout='tight')
ax.set_extent([-30., 40., 20., 75.])
ax.gridlines(draw_labels=True, color='gray', alpha=0.8, linestyle='--')
ax.coastlines(resolution='50m', linewidth=0.6, edgecolor='black')
style_kw = dict(facecolor='gold', edgecolor='red', hatch='///', linewidth=0.6,)
ax.add_geometries(D_ortho["geometry"], crs=proj, **style_kw)
style_kw = dict(facecolor='blue', edgecolor='red', linewidth=0.6,)
ax.add_geometries(F_ortho["geometry"], crs=proj, **style_kw,)
legend_labels = ['Germany (D)', 'France (F)']
legend_item1 = Line2D([], [], linestyle='none', color='gold', marker='o', markerfacecolor='gold')
legend_item2 = Line2D([], [], linestyle='none', color='blue', marker='o', markerfacecolor='blue')
legend_marker = (legend_item1, legend_item2)
plt.legend(handles=legend_marker, labels=legend_labels, loc=(1.1, 0.92), fontsize=8)
print(f'run time: {datetime.now() - start_time}\n')
plt.savefig('plot_shp_9.png', bbox_inches='tight', facecolor='white')
# Overlay, draw borderlines over data
#
# For the sake of simplicity, we use an example data set provided by the
# **Xarray** package. See also https://docs.xarray.dev/en/stable/generated/xarray.tutorial.open_dataset.html
#
# The data set we use here contains the variable **rasm** which is the output
# of the Regional Arctic System Model (RASM).
#
# **Note:**
# The variable **Tair** has the coordinates (time,y,x) where x and y are the
# logical coordinates that point to the physical coordinate variables xc and yc.
# The physical coordinates xc,yc are 2-dimensional arrays.
import xarray as xr
ds = xr.tutorial.open_dataset('rasm')
print(ds)
# Select the first time step
var = ds.Tair.isel(time=0)
# Let's have a quick look at the data. Using the plot() method without further
# parameter settings, the plot routine will use the logical coordinates.
var.plot()
plt.savefig('plot_shp_10.png', bbox_inches='tight', facecolor='white')
# Now, we can use the already known plotting content from above and add the
# data variable using the physical coordinates.
#
# You can use Xarray's or Matplotlib's plot methods. Under the hood, Xarray
# uses the plot functionality of Matplotlib.
#
# In the following we only need the plot commands from above plus a plot
# command for the variable Tair. As there are two ways of plotting the data,
# we create two plots next to each other so that we can compare them more easily.
start_time = datetime.now()
fig, (ax1,ax2) = plt.subplots(nrows=1, ncols=2, figsize=(8,8),
subplot_kw={'projection': proj},
layout='tight')
#-- use Xarray's plot method (uses Matplotlib under the hood)
ax1.set_extent([-30., 40., 20., 75.])
ax1.gridlines(draw_labels=True, color='gray', alpha=0.8, linestyle='--')
ax1.coastlines(resolution='50m', linewidth=0.6, color='blue')
cn1 = var.plot.pcolormesh(ax=ax1, x='xc', y='yc',
cmap='RdBu_r',
add_colorbar=False,
transform=ccrs.PlateCarree())
ax1.set_title('')
#-- use Matplotlib's plot method
ax2.set_extent([-30., 40., 20., 75.])
ax2.gridlines(draw_labels=True, color='gray', alpha=0.8, linestyle='--')
ax2.coastlines(resolution='50m', linewidth=0.6, color='blue')
cn2 = ax2.pcolormesh(ds.xc, ds.yc, var,
cmap='RdBu_r',
transform=ccrs.PlateCarree())
#-- add geometries to both plots
style_kw = dict(facecolor='None', edgecolor='black', linewidth=0.5)
ax1.add_geometries(D_ortho["geometry"], crs=proj, **style_kw)
ax1.add_geometries(F_ortho["geometry"], crs=proj, **style_kw,)
ax2.add_geometries(D_ortho["geometry"], crs=proj, **style_kw)
ax2.add_geometries(F_ortho["geometry"], crs=proj, **style_kw,)
print(f'run time: {datetime.now() - start_time}\n')
plt.savefig('plot_shp_11.png', bbox_inches='tight', facecolor='white')
# **Conclusion:** with GeoPandas, shapefiles can be used in a very simple way.
if __name__ == '__main__':
main()