Boxplot with both groups and subgroups


Drawing a grouped boxplot with Seaborn is a common way to show the distribution of a variable across multiple groups and subgroups.
In this post, we'll detail how to create these boxplots and how to customize them.

Libraries & Dataset

First, you need to install the following libraries:

  • seaborn is used for creating the plot and loading the dataset
  • matplotlib is used for customization purposes

We'll use the tips dataset, which records bills and tips for restaurant customers. You can easily load it with the code below.

If you've never worked with seaborn, remember to run pip install seaborn in your terminal/command prompt first.

import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset('tips')

Simple boxplot with groups and subgroups

With seaborn, this chart is very easy to make. We start by adding a dark grid to the background with the set_theme() function, then we call the boxplot() function with the following arguments:

  • x: the variable on the x-axis (qualitative, the day of the week)
  • y: the variable on the y-axis (quantitative, the total bill)
  • hue: the variable used to split each box into subgroups (smoker or not) within each value of the other qualitative variable (day of the week)
  • data: the dataset where our variables are stored (df)

Optional:

  • palette: the set of colors we want to use
  • width: the width of each box

# Add a dark grid
sns.set_theme(style="darkgrid")

# Create and display the plot
sns.boxplot(x="day",
            y="total_bill",
            hue="smoker",
            data=df,
            palette="Set1",
            width=0.8)
plt.show()
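
To read the chart, it can help to look at the numbers each box is drawn from. The sketch below is not part of the original example: it assumes a tiny hand-made table (the values are invented for illustration and only reuse the column names of the tips dataset) and a hypothetical helper named box_stats, which uses a pandas groupby to compute the quartiles for each day/smoker pair.

```python
import pandas as pd

# Tiny made-up table reusing the column names of the tips dataset.
# The values are illustrative only, not real data.
toy = pd.DataFrame({
    "day": ["Sat", "Sat", "Sat", "Sat", "Sun", "Sun", "Sun", "Sun"],
    "smoker": ["Yes", "Yes", "No", "No", "Yes", "Yes", "No", "No"],
    "total_bill": [10.0, 20.0, 12.0, 18.0, 15.0, 25.0, 14.0, 16.0],
})


def box_stats(data):
    """Quartiles of total_bill for each (day, smoker) pair (hypothetical helper)."""
    return (
        data.groupby(["day", "smoker"], observed=True)["total_bill"]
        .quantile([0.25, 0.5, 0.75])  # lower quartile, median, upper quartile
        .unstack()                    # one column per quantile
    )


stats = box_stats(toy)
print(stats)
```

Passing the df loaded earlier instead of toy should give one row per box in the chart, with the 0.5 column corresponding to the line drawn inside each box.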

Customize boxplots with groups and subgroups

To make the previous chart more polished and customized, we'll make the following changes:

  • use a custom palette: red for customers who smoke and green for those who don't
  • add axis labels and a title
  • change the width of the lines around the boxes with the linewidth argument
  • add the mean of each distribution with the showmeans=True argument
  • change the size of the outlier markers with the fliersize argument

# Customization
sns.set_theme(style="darkgrid")
plt.figure(figsize=(8, 6))

# Define a custom color palette
custom_palette = {"Yes": "red", "No": "green"}

# Create and display the plot
sns.boxplot(x="day",
            y="total_bill",
            hue="smoker",
            data=df,
            palette=custom_palette,  # Use the custom palette
            width=0.6,
            linewidth=0.6,
            showmeans=True,
            fliersize=1,
            )  

# Add a title
plt.title("Box Plot of Total Bill by Day and Smoker Status")

# Add labels to the axes
plt.xlabel("Day of the week")
plt.ylabel("Total Bill")

# Show the plot
plt.show()
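
As a possible variation on the customization above, the same settings can be applied through the Axes object that boxplot() draws on. This is only a sketch under a few assumptions: the small table is made up for illustration (it reuses the tips column names), and the meanprops values are arbitrary styling choices passed through to matplotlib, not something taken from the original example.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up table reusing the tips column names (illustrative values only)
toy = pd.DataFrame({
    "day": ["Sat", "Sat", "Sat", "Sat", "Sun", "Sun", "Sun", "Sun"],
    "smoker": ["Yes", "Yes", "No", "No", "Yes", "Yes", "No", "No"],
    "total_bill": [10.0, 20.0, 12.0, 18.0, 15.0, 25.0, 14.0, 16.0],
})

sns.set_theme(style="darkgrid")
fig, ax = plt.subplots(figsize=(8, 6))

# Draw on an explicit Axes so it can be customized afterwards
sns.boxplot(x="day",
            y="total_bill",
            hue="smoker",
            data=toy,
            palette={"Yes": "red", "No": "green"},
            showmeans=True,
            # Arbitrary marker styling for the means (assumed values)
            meanprops={"marker": "o",
                       "markerfacecolor": "white",
                       "markeredgecolor": "black"},
            ax=ax)

# Set the title and both axis labels in a single call
ax.set(title="Box Plot of Total Bill by Day and Smoker Status",
       xlabel="Day of the week",
       ylabel="Total Bill")
```

Call plt.show() afterwards to display the figure, as in the examples above. Working with the Axes object is mostly a matter of taste here, but it tends to be convenient once several charts share one figure.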

Going further

This post explains how to create a grouped boxplot in seaborn.

For more examples of how to create or customize your boxplots, see the boxplot section. You may also be interested in how to add individual observations to a boxplot.

Contact & Edit

This document is a work by Yan Holtz.