Density Chart with Multiple Groups

logo of a chart:Density

This post shows how to compare the distribution of several variables with density charts using the kdeplot() function of seaborn library. Here are a few examples with descriptions.

Density Chart with Multiple Groups

A multi density chart allows to compare the distribution of several groups. Unfortunatelly, this type of charts tend to get cluttered: groups overlap each other and the figure gets unreadable. An easy workaround is to use transparency. However, it won’t solve the issue completely and it is often better to consider other options suggested further in this post.

# libraries
import seaborn as sns
import matplotlib.pyplot as plt
from plotnine.data import diamonds # dataset

# Set figure size for the notebook
plt.rcParams["figure.figsize"]=12,8

# set seaborn whitegrid theme
sns.set_theme(style="whitegrid")

# Without transparency
sns.kdeplot(data=diamonds, x="price", hue="cut", cut=0, fill=True, common_norm=False, alpha=1)
plt.show()

Note you can easily produce pretty much the same figure with some more transparency in order to see all groups

# With transparency
sns.kdeplot(data=diamonds, x="price", hue="cut", fill=True, common_norm=False, alpha=0.4)
plt.show()

Here is an example with another dataset where it works much better. In this dataset, groups have very distinct distribution, and it is easy to spot them even if on the same chart. Note that it is recommended to add group name next to their distribution instead of having a legend beside the chart.

# libraries
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# set seaborn whitegrid theme
sns.set_theme(style="whitegrid")

# load dataset from github and convert it to a long format
data = pd.read_csv("https://raw.githubusercontent.com/zonination/perceptions/master/probly.csv")
data = pd.melt(data, var_name='text', value_name='value')

# take only "Almost No Chance", "About Even", "Probable", "Almost Certainly"
data = data.loc[data.text.isin(["Almost No Chance","About Even","Probable","Almost Certainly"])]

# density plot
p = sns.kdeplot(data=data, x="value", hue="text", fill=True, common_norm=False, alpha=0.6, palette="viridis", legend=False)
# control x limit
plt.xlim(0, 100)

# dataframe for annotations
annot = pd.DataFrame({
'x': [5, 53, 65, 79],
'y': [0.15, 0.4, 0.06, 0.1],
'text': ["Almost No Chance", "About Even", "Probable", "Almost Certainly"]
})

# add annotations one by one with a loop
for point in range(0,len(annot)):
     p.text(annot.x[point], annot.y[point], annot.text[point], horizontalalignment='left', size='large')

# add axis names        
plt.xlabel("Assigned Probability (%)")
plt.ylabel("")
        
# show the graph
plt.show()