Scatter plot with custom labels and annotations

logo of a chart:ScatterPlot

In this post, we'll learn how to create a scatter plot with a custom colormap, specific annotations, and highlighted countries. We'll use the Global Ecological Footprint dataset to visualize the ecological footprint and biocapacity of countries.

This step-by-step guide starts with the most simple version of the scatter plot and iteratively adds features until it results in a real-life, beautiful example of data visualization made with matplotlib.

About

This plot is a scatter plot showing the relationship between the footprint and the biocapacity of different countries.

It was made by Joseph Barbier. Thanks to him for accepting sharing his work here!

Let's see what the final picture will look like:

scatter plot with annotations

Libraries

First, we need to install the following librairies:

# Libraries
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
from highlight_text import fig_text
from adjustText import adjust_text
from matplotlib.font_manager import FontProperties

Dataset

For this reproduction, we're going to retrieve the data directly from the gallery's Github repo. This means we just need to give the right url as an argument to pandas' read_csv() function to retrieve the data.

url = 'https://raw.githubusercontent.com/holtzy/the-python-graph-gallery/master/static/data/footprint.csv'
df = pd.read_csv(url)
df = df[df['biocapacity']< 60]
df.head()
lifexp country region gdpCapita populationMillions footprint biocapacity ecoReserve earthsRequired ratio
0 76.0 Albania Other Europe 14889.0 2.9 2.1 1.176752 -0.894486 1.371485 1.784573
1 62.0 Angola Africa 6304.0 35.0 0.9 1.588191 0.730346 0.568029 0.566682
2 75.0 Argentina South America 22117.0 46.0 3.2 5.231663 2.011045 2.132556 0.611660
3 83.0 Australia Asia-Pacific 53053.0 26.1 5.8 11.021401 5.244362 3.825307 0.526249
4 81.0 Austria EU-27 55460.0 9.1 5.6 2.893775 -2.732866 3.725721 1.935189

Basic scatter plot

The core of the chart is simply based on the scatter() funtion from matplotlib. You can learn more about it in the dedicated section of the gallery.

# parameters
x_col = 'biocapacity'
y_col = 'footprint'
c = 'earthsRequired'

fig, ax = plt.subplots(figsize=(10, 8), dpi=500)

ax.scatter(
   df[x_col],
   df[y_col],
   c=df[c],
   s=150,
   edgecolor='white',
   alpha=0.7,
   marker='s',
   linewidth=0.4
)

plt.show()

Custom colors

We first want to create a cmap that will be used with the earthsRequired column. This column will be used to color the points of the scatter plot. Thanks to its LinearSegmentedColormap class, the matplotlib.colors module allows us to create a custom colormap. We can then use it with the scatter() function by setting the c argument to the column we want to use for the colors.

def create_gradient_colormap(colors):
   cmap = LinearSegmentedColormap.from_list("custom_gradient", colors, N=256)
   return cmap

colors = ['#136f63', '#c1121f']
cmap = create_gradient_colormap(colors)
cmap

Once we have our colormap, we define other colors that will be used for the annotations and the spines of the plot.

# parameters
background_color = '#333533'
text_color = 'white'
light_text_color = '#adb5bd'
x_col = 'biocapacity'
y_col = 'footprint'
c = 'earthsRequired'

fig, ax = plt.subplots(figsize=(10, 8), dpi=500)
fig.set_facecolor(background_color)
ax.set_facecolor(background_color)
ax.spines[['top', 'right']].set_visible(False)
ax.spines['bottom'].set_color(light_text_color)
ax.spines['left'].set_color(light_text_color)
ax.tick_params(axis='x', colors=light_text_color)
ax.tick_params(axis='y', colors=light_text_color)

ax.scatter(
   df[x_col],
   df[y_col],
   c=df[c],
   s=150,
   cmap=cmap,
   edgecolor='white',
   alpha=0.7,
   marker='s',
   linewidth=0.4
)

plt.show()

Title and labels

Now that we have the main components of the plot, we can add simple annotations. This mainly relies on the fig_text() from the highlight_text package. You can learn more about it in the gallery.

def create_gradient_colormap(colors):
   cmap = LinearSegmentedColormap.from_list("custom_gradient", colors, N=256)
   return cmap

# font
personal_path = '/Users/josephbarbier/Library/Fonts/'
font_path = personal_path + 'FiraSans-Light.ttf'
font = FontProperties(fname=font_path)
bold_font = FontProperties(fname=personal_path + 'FiraSans-Medium.ttf')

# parameters
colors = ['#136f63', '#c1121f']
cmap = create_gradient_colormap(colors)
background_color = '#333533'
text_color = 'white'
light_text_color = '#adb5bd'
x_col = 'biocapacity'
y_col = 'footprint'
c = 'earthsRequired'

fig, ax = plt.subplots(figsize=(10, 8), dpi=500)
fig.set_facecolor(background_color)
ax.set_facecolor(background_color)
ax.spines[['top', 'right']].set_visible(False)
ax.spines['bottom'].set_color(light_text_color)
ax.spines['left'].set_color(light_text_color)
ax.tick_params(axis='x', colors=light_text_color)
ax.tick_params(axis='y', colors=light_text_color)

ax.scatter(
   df[x_col],
   df[y_col],
   c=df[c],
   s=150,
   cmap=cmap,
   edgecolor='white',
   alpha=0.7,
   marker='s',
   linewidth=0.4
)

# title
qatar_color = cmap(df.loc[df['country'] == 'Qatar', c].values[0]/df[c].max())
luxembourg_color = cmap(df.loc[df['country'] == 'Luxembourg', c].values[0]/df[c].max())
qatar_earths = df.loc[df['country'] == 'Qatar', 'earthsRequired'].values[0]
luxembourg_earths = df.loc[df['country'] == 'Luxembourg', 'earthsRequired'].values[0]
text = f"""
<Qatar> and <Luxembourg> have the highest ecological\n<footprint>, in spite of having one the lowest <biocapacity>
"""
fig_text(
   0.26,
   0.9,
   text,
   fontsize=18,
   color=text_color,
   font=font,
   ha='left',
   va='top',
   highlight_textprops=[
      {'font':bold_font},
      {'font':bold_font},
      {'font':bold_font, 'color':colors[1]},
      {'font':bold_font, 'color':colors[0]}
   ],
   ax=ax
)
text = f"""
If everyone lived like Qatar, we would need <{qatar_earths:.1f} Earths> to sustain the world.
"""
fig_text(
   0.26,
   0.82,
   text,
   fontsize=12,
   color=text_color,
   font=font,
   ha='left',
   va='top',
   highlight_textprops=[
      {'font':bold_font}
   ],
   ax=ax
)

# credit and about
text = """
Global Ecological Footprint 2023. Footprint and Biocapacity are in global hectares per person.
<Visualization:> barbierjoseph.com
"""
fig_text(
   0.12,
   0.05,
   text,
   fontsize=7,
   font=font,
   color=text_color,
   ha='left',
   va='top',
   highlight_textprops=[
      {'font':bold_font}
   ],
   ax=ax
)

# label for axis
fig_text(
   0.85, 0.08,
   'Biocapacity'.upper(),
   fontsize=12,
   font=bold_font,
   color=colors[0],
   ha='center',
   va='top',
   ax=ax
)
fig_text(
   0.1, 0.9,
   'Footprint'.upper(),
   fontsize=12,
   font=bold_font,
   color=colors[1],
   ha='left',
   va='center',
   ax=ax
)

plt.show()