Libraries

First, we need to load a few libraries:

import seaborn as sns
import matplotlib.pyplot as plt

Dataset

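Since a boxplot displays the distribution of a numerical variable, we need a dataset that contains at least one numerical variable.

In this example, we will use the iris dataset, which seaborn can load by name (it is downloaded from the online seaborn-data repository, so an internet connection is needed the first time):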

df = sns.load_dataset('iris')
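Before plotting, it can help to check which columns are numerical (candidates for the boxplot's value axis) and which are categorical (candidates for grouping). The sketch below uses a small hand-built DataFrame with illustrative values in the same shape as iris, so it runs offline; pandas' `select_dtypes` does the split.

```python
import pandas as pd

# Hypothetical stand-in for the iris DataFrame: a few illustrative rows, same column names
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
})

# Numerical columns: usable as the variable whose distribution the boxplot shows
numeric_cols = df.select_dtypes(include="number").columns.tolist()
# Non-numerical columns: usable to split the data into groups
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)      # ['sepal_length', 'sepal_width']
print(categorical_cols)  # ['species']
```

On the real iris data, the same two calls should list the four measurement columns as numerical and `species` as the only categorical column.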

One single numerical variable

The simplest form of boxplot shows the overall distribution of a single numerical variable. Pass that variable to the y parameter of the boxplot() function:

sns.set_theme(style="darkgrid")
sns.boxplot(y=df["sepal_length"])
plt.show()
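Each box summarises the data with a few statistics: the quartiles (box edges and median line), the whiskers, and any outliers drawn as individual points. The sketch below computes them by hand with NumPy on a small hypothetical sample, following the usual Tukey convention that seaborn's boxplot uses with its default `whis=1.5` (whiskers reach the most extreme points within 1.5 × IQR of the box). Exact quartile interpolation may differ slightly between library versions.

```python
import numpy as np

# Hypothetical sample (not the iris data), already sorted for readability
values = np.array([4.3, 4.9, 5.0, 5.1, 5.4, 5.8, 6.0, 6.3, 6.7, 7.9, 9.5])

# Box: first quartile, median, third quartile (linear interpolation)
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1  # interquartile range = height of the box

# Fences at 1.5 * IQR beyond the box edges
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme observations still inside the fences
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Anything beyond the fences is drawn as an individual outlier point
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(f"Q1={q1:.2f} median={median:.2f} Q3={q3:.2f} IQR={iqr:.2f}")
# Q1=5.05 median=5.80 Q3=6.50 IQR=1.45
print(whisker_low, whisker_high, outliers)
```

Here the upper fence is about 8.68, so 7.9 becomes the end of the upper whisker and 9.5 is shown as an outlier.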

One numerical variable and several groups

Depending on your data, you may want to compare the distribution of a variable across two or more groups.

To do so, pass the grouping variable to the x parameter of the boxplot() function: one box is drawn per group.

sns.set_theme(style="darkgrid")
sns.boxplot(x=df["species"], y=df["sepal_length"])
plt.show()
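seaborn also accepts a DataFrame through the `data` parameter, with `x` and `y` given as column names; the `order` parameter then fixes the left-to-right order of the groups. The sketch below uses a small hypothetical DataFrame in place of iris, prints the per-group quartiles that each box summarises, and selects the non-interactive Agg backend so it runs without a display (the file name is arbitrary).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to show plots in a window
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical long-form data: one row per observation
df = pd.DataFrame({
    "species": ["setosa"] * 4 + ["versicolor"] * 4 + ["virginica"] * 4,
    "sepal_length": [5.1, 4.9, 4.7, 5.0, 7.0, 6.4, 6.9, 5.5, 6.3, 5.8, 7.1, 6.5],
})

# Per-group quartiles: the numbers behind each box
summary = df.groupby("species")["sepal_length"].quantile([0.25, 0.5, 0.75]).unstack()
print(summary)

# Column-name form of the grouped boxplot, with an explicit group order
ax = sns.boxplot(data=df, x="species", y="sepal_length",
                 order=["virginica", "versicolor", "setosa"])
ax.set_title("Sepal length by species")
plt.savefig("boxplot_by_species.png")
```

Referring to columns by name also lets seaborn label the axes with those names automatically.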

Going further

This post explains how to create and customize a boxplot with the seaborn library.

You might also be interested in how to add individual data points to a boxplot, and how to create a raincloud plot with the ptitprince library.
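One common way to show individual data points is to draw a strip plot on the same axes as the boxplot. The sketch below uses simulated data (the group names and values are hypothetical). `showfliers=False` is forwarded to matplotlib to hide the boxplot's own outlier markers, since the strip plot already draws every observation, outliers included.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to show plots in a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
# Hypothetical data: 30 simulated measurements for each of three groups
df = pd.DataFrame({
    "group": np.repeat(["A", "B", "C"], 30),
    "value": np.concatenate([rng.normal(loc, 0.5, 30) for loc in (5.0, 6.0, 6.5)]),
})

fig, ax = plt.subplots()
# Boxes first, without outlier markers (each point is drawn by the strip plot)
sns.boxplot(data=df, x="group", y="value", showfliers=False, ax=ax)
# Individual observations on top, jittered horizontally to reduce overlap
sns.stripplot(data=df, x="group", y="value", color="black", size=3,
              jitter=0.2, ax=ax)
fig.savefig("boxplot_with_points.png")
```

With many observations per group, lowering the marker size or adding transparency (for example `alpha=0.5`) keeps the boxes readable.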

Contact & Edit


👋 This document is a work by Yan Holtz. You can contribute on GitHub, send me feedback on Twitter, or subscribe to the newsletter to be notified when new examples are published! 🔥

This page is a Jupyter notebook. Please help me make this website better 🙏!