Display the number of observations in a violinplot

It is a good practice to specify the number of observations for each group within a violinplot. Indeed, a group with 10 observations can have the exact same shape as a group with 10,000 observations, though they will be considered quite differently in statistical analysis.
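To make this concrete, here is a minimal sketch using only the Python standard library. It draws two samples from the same normal distribution (the mean of 5.8 and standard deviation of 0.8 are arbitrary values chosen for illustration, not taken from any dataset) and compares the standard error of their means. The two samples would produce similar-looking violins, but the estimate from the larger one is far more precise.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible


def standard_error(sample):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(sample) / len(sample) ** 0.5


# Same underlying distribution, very different sample sizes (illustrative values)
small = [random.gauss(5.8, 0.8) for _ in range(10)]
large = [random.gauss(5.8, 0.8) for _ in range(10_000)]

se_small = standard_error(small)
se_large = standard_error(large)

print(f"n = {len(small):>6}: standard error of the mean = {se_small:.3f}")
print(f"n = {len(large):>6}: standard error of the mean = {se_large:.3f}")
```

Displaying n next to each violin lets the reader judge this difference at a glance.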

Libraries

First, we need to load a few libraries:

  • matplotlib: for displaying the chart
  • seaborn: for creating the chart
  • numpy: for some calculations

import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

Dataset

Since a violin plot displays the distribution of a numerical variable, we need a dataset that contains at least one numerical variable.

In this example, we will use the iris dataset that we can easily load with seaborn.

df = sns.load_dataset('iris')

Violin plot

In the following example, we start from a simple violinplot and add annotations to it.

To do so we:

  • calculate the median sepal_length of each group and store the values in a variable named medians
  • create a nobs list that stores the number of observations in each group
  • finally, add the labels to the figure.
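The first two steps can be sketched with a single pandas groupby, which guarantees that the medians and the counts come out in the same group order. The toy DataFrame below is hypothetical and only stands in for the iris columns used in this post; the variable name summary is also an illustrative choice.

```python
import pandas as pd

# Hypothetical toy data standing in for the iris columns used in this post
toy = pd.DataFrame({
    "species": ["b", "a", "a", "b", "b"],
    "sepal_length": [5.0, 4.0, 6.0, 7.0, 6.0],
})

# One groupby gives both statistics, indexed by group in sorted order
summary = toy.groupby("species")["sepal_length"].agg(["median", "count"])

medians = summary["median"].values
nobs = ["n: " + str(n) for n in summary["count"]]

print(summary)
print(nobs)
```

Note that "count" only counts non-missing values, which are also the values that contribute to the shape of the violin.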

To add labels, keep in mind that seaborn is built on top of matplotlib: seaborn plotting functions return a matplotlib axes object (here we store the violinplot in an axes object named ax). This lets us use the axes methods .get_xticklabels() and .text(), along with the parameters of .text() (horizontalalignment, size, color, weight), to add text to our figure.

# calculate medians and number of observations
# (both come from the same groupby, so they share the same group order;
# value_counts() sorts by frequency and could misalign the labels)
grouped = df.groupby('species')['sepal_length']
medians = grouped.median().values
nobs = ["n: " + str(n) for n in grouped.size().values]

sns.set_theme(style="darkgrid")
ax = sns.violinplot(x="species", y="sepal_length", data=df)

# Add text to the figure: one label per x tick, placed above the group median
pos = range(len(nobs))
for tick, label in zip(pos, ax.get_xticklabels()):
    ax.text(tick, medians[tick] + 2, nobs[tick],
            horizontalalignment='center',
            size='small',
            color='black')
plt.show()
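An alternative to placing text inside the plot is to write the counts into the x tick labels. The sketch below is one possible way to do this, not the method used above: the helper name add_counts_to_ticks and the toy data are hypothetical, and the helper assumes that the categories are drawn at x = 0, 1, 2, ... in sorted group order (which holds for the iris example). The "Agg" backend is selected only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def add_counts_to_ticks(ax, data, group_col):
    """Relabel the x ticks of `ax` as '<group>' plus '(n=<count>)' on a second line.

    Assumes the categories sit at x = 0, 1, 2, ... in sorted group order.
    """
    counts = data.groupby(group_col).size()
    labels = [f"{name}\n(n={n})" for name, n in counts.items()]
    ax.set_xticks(range(len(labels)))
    ax.set_xticklabels(labels)
    return labels


# Hypothetical toy data; with the plot above, the call would be
# add_counts_to_ticks(ax, df, "species") right after sns.violinplot(...)
toy = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b"],
    "value": [1.0, 2.0, 3.0, 4.0, 6.0],
})

fig, ax = plt.subplots()
tick_labels = add_counts_to_ticks(ax, toy, "group")
print(tick_labels)
```

This keeps the counts away from the violins, so no vertical offset has to be tuned.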

Going further

This post explains how to display the number of observations of each group in a violin plot built with the seaborn library.

You might also be interested in how to add individual data points to a violin plot and how to create a raincloud plot with the ptitprince library.

This document is a work by Yan Holtz.