Libraries
For creating this chart, we will need to load the following libraries:
- pandas for data manipulation
- matplotlib for creatin the chart
numpy
for smoothing the chart
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
Dataset
The dataset can be accessed using the url below.
It contains data about x-men and the number of times they appeared in the comics between the 60's and the 90's.
url = 'https://raw.githubusercontent.com/holtzy/The-Python-Graph-Gallery/master/static/data/mutant_moneyball.csv'
df = pd.read_csv(url)
Data cleaning
In order to make the data ready for the chart, we will need to clean it a bit.
Name of the x-men
Since the name of the x-men is written in camelCase, we will need to split it into two words. For example, scottSummers will become Scott Summers.
def format_name(s):
if " " in s:
return s
formatted_string = ""
for i, char in enumerate(s):
if char.isupper() and i != 0:
formatted_string += " " + char
else:
formatted_string += char
if formatted_string:
formatted_string = formatted_string[0].upper() + formatted_string[1:]
return formatted_string
df['Member'] = df['Member'].apply(format_name)
df = df[['Member', 'TotalIssues60s', 'TotalIssues70s',
'TotalIssues80s', 'TotalIssues90s']]
df.set_index('Member', inplace=True)
Transpose the data
The data is currently in a wide format, which means that one row represent one x-men and the columns represent the value for eachd decade.
We will need to transpose it to a long format, where each row represents a decade and the columns represent the x-men.
# transpose the dataframe
df_transposed = df.T
decades = ['1960s', '1970s', '1980s', '1990s'] # values of the x-axis
members = df_transposed.columns # name of the x-mens for the legend
issues_list = df_transposed.T.values.tolist() # values of the x-men
df_transposed
Member | Warren Worthington | Hank Mc Coy | Scott Summers | Bobby Drake | Jean Grey | Alex Summers | Lorna Dane | Ororo Munroe | Kurt Wagner | Logan Howlett | ... | Rachel Summers | Eric Magnus | Alison Blaire | Longshot | Jonathan Silvercloud | Remy Le Beau | Jubilation Lee | Lucas Bishop | Betsy Braddock | Charles Xavier |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
TotalIssues60s | 61 | 62 | 63 | 62 | 63 | 8 | 9 | 0 | 0 | 0 | ... | 0 | 13 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 39 |
TotalIssues70s | 35 | 38 | 69 | 35 | 58 | 13 | 13 | 36 | 36 | 36 | ... | 0 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 46 |
TotalIssues80s | 20 | 9 | 56 | 6 | 14 | 43 | 19 | 121 | 84 | 115 | ... | 23 | 18 | 43 | 35 | 11 | 0 | 6 | 0 | 45 | 61 |
TotalIssues90s | 23 | 10 | 9 | 20 | 29 | 4 | 7 | 33 | 0 | 16 | ... | 1 | 3 | 2 | 0 | 28 | 17 | 17 | 16 | 14 | 23 |
4 rows × 26 columns
Simple streamgraph
Thanks to the stackplot()
function from matplotlib, it is possible to create a simple streamgraph.
decades = ['1960s', '1970s', '1980s', '1990s'] # values of the x-axis
members = df_transposed.columns # name of the x-mens for the legend
issues_list = df_transposed.T.values.tolist() # values of the x-men
fig, ax = plt.subplots(figsize=(8, 6))
ax.stackplot(decades, issues_list, labels=members)
ax.set_title('Evolution of Total Issues per X-Men Member per Decade (60s-90s)')
ax.set_ylabel('Total Issues')
ax.set_xlabel('Decade')
ax.legend(loc='upper left', bbox_to_anchor=(1, 1))
fig.tight_layout()
plt.show()
Custom colors
The color we will use is based on the value of the total number of appearances, which means that we have to compute it first.
Then, we create a list of colors using the cm
module from matplotlib
.
decades = ['1960s', '1970s', '1980s', '1990s'] # values of the x-axis
members = df_transposed.columns # name of the x-mens for the legend
issues_list = df_transposed.T.values.tolist() # values of the x-men
# calculate the normalized totals to generate the colors
total_issues_per_member = np.sum(issues_list, axis=1)
normalized_totals = total_issues_per_member / np.max(total_issues_per_member)
cmap = plt.cm.Reds
colors = cmap(normalized_totals)
fig, ax = plt.subplots(figsize=(8, 6))
ax.stackplot(decades, issues_list, labels=members, colors=colors)
ax.set_title('Evolution of Total Issues per X-Men Member per Decade (60s-90s)')
ax.set_ylabel('Total Issues')
ax.set_xlabel('Decade')
ax.legend(loc='upper left', bbox_to_anchor=(1, 1))
fig.tight_layout()
plt.show()
Custom order
If we want to change the order in which x-men are displayed, we can use the argsort()
function from numpy
.
It will gives us the index of the x-men, sorted by the total number of appearances. Then, we can use this index to reorder the list of values, name and colors.
# calculate the normalized totals
total_issues_per_member = np.sum(issues_list, axis=1)
normalized_totals = total_issues_per_member / np.max(total_issues_per_member)
cmap = plt.cm.Reds
colors = cmap(normalized_totals)
# sort the members by total issues
sorted_indices = np.argsort(total_issues_per_member)
sorted_issues_list = np.array(issues_list)[sorted_indices]
sorted_members = np.array(members)[sorted_indices]
sorted_colors = colors[sorted_indices]
# plotting
fig, ax = plt.subplots(figsize=(8, 6))
ax.stackplot(
decades,
sorted_issues_list,
labels=sorted_members,
colors=sorted_colors,
edgecolor='black',
linewidth=0.3
)
# setting the title and labels
ax.set_title('Evolution of Total Issues per X-Men Member per Decade (60s-90s)')
ax.set_ylabel('Total Issues')
ax.set_xlabel('Decade')
ax.legend(loc='upper left', bbox_to_anchor=(1, 1))
# plotting
fig.tight_layout()
plt.show()
Change stream style
If you want to smooth the stream, we need to use the interp1d
function from scipy
.
This function is a bit particular since it is used to create a function that can be used to interpolate the data. We give it a list of x and y values, and it returns a function that can be used to get the y value for any x value.
Then, we can use this function to create a new list of y values that will be used to create the streamgraph.
from scipy.interpolate import interp1d
# instead of 4 date points, we will use 40
decadesforsmooth = [1960, 1970, 1980, 1990]
new_decades = np.linspace(min(decadesforsmooth), max(
decadesforsmooth), len(decadesforsmooth) * 10)
# interpolating each member's issues list for the new_decades
smoothed_issues_list = []
for issues in sorted_issues_list:
interp_func = interp1d(
decadesforsmooth,
issues,
kind='quadratic'
)
smoothed_issues = interp_func(new_decades)
smoothed_issues_list.append(smoothed_issues)
Then, the rest of the code mainly stays the same. We just add a baseline='wiggle'
argument to the stackplot()
function to make the streamgraph look better.
# calculate the normalized totals
total_issues_per_member = np.sum(issues_list, axis=1)
normalized_totals = total_issues_per_member / np.max(total_issues_per_member)
cmap = plt.cm.Reds
colors = cmap(normalized_totals)
# sort the members by total issues
sorted_indices = np.argsort(total_issues_per_member)
sorted_issues_list = np.array(issues_list)[sorted_indices]
sorted_members = np.array(members)[sorted_indices]
sorted_colors = colors[sorted_indices]
sorted_issues_list = [sublist[:-1] for sublist in sorted_issues_list]
# create the chart
fig, ax = plt.subplots(figsize=(8, 6))
ax.stackplot(
new_decades,
smoothed_issues_list,
labels=sorted_members,
colors=sorted_colors,
edgecolor='black',
linewidth=0.2,
baseline='wiggle'
)
# setting the title and labels
ax.set_title(
'Evolution of Total Issues per X-Men Member per Decade (60s-90s), Sorted by Total Issues')
ax.set_ylabel('Total Issues')
ax.set_xlabel('Decade')
ax.legend(loc='upper left', bbox_to_anchor=(1, 1))
# plotting
fig.tight_layout()
plt.show()
Going further
This article explains how to create a streamgraph and how to customize it.
You might want to check this beautiful streamgraph entirely built with matplotlib.