Stacked area chart with inline labels and arrows


Stacked area charts are an excellent way to display the evolution of a variable across different categories.

This post explains how to combine this type of chart with a custom color palette, detailed annotations, inline labels, and arrows with an inflection point. The process is described step-by-step, starting from a basic example to the final chart with reproducible code.

About

A stacked area chart is a graphical representation of data that shows how the composition of a variable changes over time. The area between the x-axis and each line is filled with a color that represents one category of data.

The following example shows the evolution of natural disasters over the years by type of disaster.

This chart was created by Joseph Barbier. Thanks to him for agreeing to share his work here!

As a teaser, here is the plot we’re going to build:

stacked area chart

Libraries

First, we need to load the following libraries (install them beforehand if they are not already available):

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.font_manager import FontProperties
from pypalettes import get_hex
from highlight_text import fig_text, ax_text

Dataset

The type of data needed when creating a stacked area chart is a time series.

Specifically, our dataset needs a column for the time variable (usually the x-axis) and a column for each category we want to represent (usually the y-axis). In this case, we have one column per disaster type.
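To make the expected layout concrete, here is a minimal sketch on toy data. The values are made up for illustration and are not taken from the real dataset; only the column names Entity, Year and Disasters mirror the ones used in this post. Unlike the code in this post, the sketch pivots directly with Year as the index.

```python
import pandas as pd

# Toy long-format data (hypothetical values): one row per (Entity, Year) pair
long_df = pd.DataFrame({
    'Entity': ['Flood', 'Flood', 'Wildfire', 'Wildfire'],
    'Year': [2000, 2001, 2000, 2001],
    'Disasters': [10, 12, 3, 5],
})

# Reshape to wide format: one row per year, one column per category
wide_df = long_df.pivot_table(index='Year', columns='Entity', values='Disasters')
print(wide_df)
```

The resulting table has the years as its index and one column per category, which is the shape a stacked area chart expects.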

url = 'https://raw.githubusercontent.com/holtzy/The-Python-Graph-Gallery/master/static/data/disaster-events.csv'
df = pd.read_csv(url)

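# return False for aggregate rows such as 'All disasters', True otherwise
def remove_agg_rows(entity: str) -> bool:
   return not entity.lower().startswith('all disasters')

# relabel 'Dry mass movement' as 'Drought'
df = df.replace('Dry mass movement', 'Drought')
# keep only the individual disaster types
df = df[df['Entity'].apply(remove_agg_rows)]
# exclude the 'Fog' and 'Glacial lake outburst flood' categories
df = df[~df['Entity'].isin(['Fog', 'Glacial lake outburst flood'])]
# reshape to one row per year and one column per disaster type
df = df.pivot_table(index='Entity', columns='Year', values='Disasters').T
# replace missing values in the year 1900 with 0
df.loc[1900, :] = df.loc[1900, :].fillna(0)
# keep the years 1960 to 2023
df = df[df.index >= 1960]
df = df[df.index <= 2023]
# fill remaining gaps by linear interpolation
# (note: axis=1 interpolates across the columns of each row)
df = df.interpolate(axis=1)
df.head()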
Entity  Drought  Earthquake  Extreme temperature  Extreme weather  Flood  Volcanic activity  Wet mass movement  Wildfire
Year
1960        1.0         8.0                 14.0             20.0    8.0                1.0                2.0       2.0
1961        1.0         3.0                  1.0             14.0    9.0                5.5                2.0       2.0
1962        1.0         4.0                  1.0             13.0    8.0                5.0                2.0       2.0
1963        1.0         3.0                  2.0             21.0    8.0                3.0                2.0       2.0
1964        8.0         7.0                 14.5             22.0   22.0                1.0                1.0       1.0
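The interpolate() call in the preparation code takes an axis argument that decides the direction in which gaps are filled. The sketch below uses a small toy table (hypothetical values, not the disaster data) to show the difference between the two directions for a table with years as rows and categories as columns.

```python
import numpy as np
import pandas as pd

# Toy table (hypothetical values): years as rows, categories as columns,
# with one missing value in column 'B' for the year 2001
toy = pd.DataFrame(
    {'A': [1.0, 2.0, 3.0], 'B': [10.0, np.nan, 30.0], 'C': [5.0, 6.0, 7.0]},
    index=[2000, 2001, 2002],
)

# axis=0: fill the gap from the neighbouring years of the same category
along_years = toy.interpolate(axis=0)

# axis=1: fill the gap from the neighbouring categories of the same year
across_columns = toy.interpolate(axis=1)

print(along_years.loc[2001, 'B'])     # midpoint of 10.0 and 30.0
print(across_columns.loc[2001, 'B'])  # midpoint of 2.0 and 6.0
```

With axis=0 the missing value becomes 20.0, while with axis=1 it becomes 4.0, so it is worth checking which direction matches the meaning of your data.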

Simple stacked area

This first version of the plot uses the ax.stackplot() function from matplotlib, which is the simplest way to create a stacked area chart.

# initialize the figure
fig, ax = plt.subplots(figsize=(14,7), dpi=300)

# define the x-axis variable and order the columns
columns = df.sum().sort_values().index.to_list()
x = df.index

# create the stacked area plot
areas = np.stack(df[columns].values, axis=-1)
ax.stackplot(x, areas)

# display the plot
plt.show()
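The np.stack() call above can look a little cryptic. The following sketch, on a tiny made-up array, suggests what it does: it turns a (years, categories) array into a (categories, years) array, which is the layout ax.stackplot() expects. It also shows the cumulative sums that correspond to the upper boundary of each layer when the default zero baseline is used.

```python
import numpy as np

# Toy values (hypothetical): 3 years as rows, 2 categories as columns
values = np.array([
    [1, 4],
    [2, 5],
    [3, 6],
])

# Stacking the rows along the last axis yields one row per category
areas = np.stack(values, axis=-1)
print(areas.shape)  # (2, 3)

# For a 2D array this is the same as taking the transpose
print(np.array_equal(areas, values.T))

# Upper boundary of each layer: cumulative sum over the categories
boundaries = np.cumsum(areas, axis=0)
print(boundaries)
```

In other words, the top of the last layer is the total across all categories for each year.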

Custom axes

Since default axes are not very attractive, we start by removing them with the ax.set_axis_off() function.

The x and y labels will be displayed using the highlight_text package, which simplifies the process of adding text annotations to a plot.

In practice, we use for loops to add the labels to the plot with the desired values.

# initialize the figure
fig, ax = plt.subplots(figsize=(14,7), dpi=300)
ax.set_axis_off()

# define the x-axis variable and order the columns
columns = df.sum().sort_values().index.to_list()
x = df.index

# create the stacked area plot
areas = np.stack(df[columns].values, axis=-1)
ax.stackplot(x, areas)

# add label for the x-axis
for year in range(1960, 2030, 10):
   ax_text(
      x=year, y=-10, s=f'{year}',
      va='top', ha='left',
      fontsize=13,
      color='grey'
   )

# add label for the y-axis
for value in range(100, 400, 100):
   ax_text(
      x=1960, y=value, s=f'{value}',
      va='center', ha='left',
      fontsize=13,
      color='grey'
   )

# display the plot
plt.show()
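If you plan to reuse this kind of labelling, the two loops above could be folded into a small helper. The sketch below is only one possible way to do it: add_tick_labels is a hypothetical name, and it relies on matplotlib's plain ax.text() instead of ax_text() from highlight_text, so that it runs with matplotlib alone.

```python
import matplotlib.pyplot as plt

def add_tick_labels(ax, x_ticks, y_ticks, x_pos, y_pos, **text_kwargs):
    """Hypothetical helper: write x labels at height y_pos and y labels at x_pos."""
    # labels along the x-axis
    for tick in x_ticks:
        ax.text(tick, y_pos, f'{tick}', va='top', ha='left', **text_kwargs)
    # labels along the y-axis
    for tick in y_ticks:
        ax.text(x_pos, tick, f'{tick}', va='center', ha='left', **text_kwargs)

# initialize the figure without the default axes
fig, ax = plt.subplots(figsize=(14, 7))
ax.set_axis_off()

# same tick values, positions and style as in the loops above
add_tick_labels(
    ax,
    x_ticks=range(1960, 2030, 10),
    y_ticks=range(100, 400, 100),
    x_pos=1960, y_pos=-10,
    fontsize=13, color='grey',
)

# number of text labels added to the axes
n_labels = len(ax.texts)
```

Here the helper adds seven year labels and three value labels, matching the loops in the example above.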