Multiple histograms with plotnine

In a previous post, we saw how to create a simple histogram with plotnine.

In this step-by-step post, we will see how to create multiple histograms using plotnine. We will see how to split a histogram according to a variable and how to create a mirror histogram.

Libraries

For creating this chart, we will need to load the following libraries:

import pandas as pd
from plotnine import *

Dataset

Since a histogram displays the distribution of a numerical variable, we need a dataset that contains numerical values.

For this, we will create a simple dataset with 2 columns: numerical and categorical. The numerical column contains the values that we want to plot, and the categorical column contains the group labels (Group1 and Group2) that we want to use to split the values.

# Generate data
import numpy as np
group1_num = np.random.normal(loc=0, scale=1, size=300) 
group2_num = np.random.normal(loc=8, scale=2, size=300)
group1_cat = np.repeat('Group1', 300)
group2_cat = np.repeat('Group2', 300)

df = pd.DataFrame({
    'numerical': np.concatenate([group1_num, group2_num]),
    'categorical': np.concatenate([group1_cat, group2_cat])
})
df.head()
   numerical categorical
0   0.316677      Group1
1  -0.577093      Group1
2   0.716630      Group1
3  -1.756055      Group1
4   0.895396      Group1
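
The values shown above change on every run, because np.random.normal() draws new random numbers each time. If you want a reproducible dataset, one option is to use a seeded NumPy generator. The snippet below is a minimal sketch of that idea: the seed value 42 and the names rng, df_seeded and summary are arbitrary choices for illustration, not part of the original example. It also prints a per-group summary to check that each group has the expected size and location.

```python
import numpy as np
import pandas as pd

# Seeded generator: the seed (42) is an arbitrary, illustrative choice
rng = np.random.default_rng(42)

df_seeded = pd.DataFrame({
    # 300 values centered on 0, followed by 300 values centered on 8
    'numerical': np.concatenate([
        rng.normal(loc=0, scale=1, size=300),
        rng.normal(loc=8, scale=2, size=300),
    ]),
    # 300 'Group1' labels followed by 300 'Group2' labels
    'categorical': np.repeat(['Group1', 'Group2'], 300),
})

# Per-group sanity check: row count, mean and standard deviation
summary = df_seeded.groupby('categorical')['numerical'].agg(['count', 'mean', 'std'])
print(summary)
```

The means should land close to 0 and 8, which is why the two histograms barely overlap in the charts that follow.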

Double histogram

We use the geom_histogram() function to create a histogram, and in order to display two histograms on the same chart, we need to use the fill argument to differentiate them.

This argument must be the name of the variable that we want to use to group the data. In this case, we will use the categorical column.

(
ggplot(df, aes(x='numerical', fill='categorical')) +
    geom_histogram(bins=20) +
    theme_minimal()
)

Mirror histogram

To create a mirror histogram, we need 2 numerical variables to display.

For this, we add 2 new columns to our dataset: num_top and num_bottom.

Then we call the geom_histogram() function twice, once for each variable. We map the y aesthetic to the computed density: as is for the top histogram, and negated for the bottom one so that it is drawn below the x-axis.

df['num_top'] = np.random.normal(loc=5, scale=2, size=600)
df['num_bottom'] = np.random.normal(loc=0, scale=2, size=600)

(
ggplot(df) +
    # after_stat('density') is the newer equivalent of the '..density..' notation
    geom_histogram(aes(x='num_top', y=after_stat('density')), bins=20, fill='lightblue') +
    geom_histogram(aes(x='num_bottom', y=after_stat('-density')), bins=20, fill='darkred') +
    xlab('Value') +
    theme_minimal()
)
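
To see why the density is a convenient quantity for a mirror chart, it can help to compute it by hand. The sketch below uses np.histogram() with density=True as a stand-in for what a density statistic represents: bar heights scaled so that the total bar area is 1. This is only an illustration with NumPy. plotnine does its own binning, so the exact bin edges and heights in the chart may differ. The seed and variable names are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for illustration only
num_top = rng.normal(loc=5, scale=2, size=600)
num_bottom = rng.normal(loc=0, scale=2, size=600)

# density=True scales the bar heights so that the bars' total area equals 1
top_density, top_edges = np.histogram(num_top, bins=20, density=True)
bottom_density, bottom_edges = np.histogram(num_bottom, bins=20, density=True)

# Area = sum of (bar height * bar width)
top_area = np.sum(top_density * np.diff(top_edges))
bottom_area = np.sum(bottom_density * np.diff(bottom_edges))
print(top_area, bottom_area)  # both should be very close to 1

# Negating the heights flips the bars below the axis, as in the mirror chart
mirrored_bottom = -bottom_density
print(mirrored_bottom.min(), mirrored_bottom.max())
```

Because both histograms are normalized to the same total area, the top and bottom halves stay comparable even if the two variables had different numbers of observations. The negation also explains the negative tick labels on the y-axis of the mirror chart.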

Going further

This article explains how to create multiple histograms with plotnine.

If you want to go further, you can check how to customize colors in a histogram and have a look at the histogram section of the gallery.

Contact & Edit


👋 This document is a work by Yan Holtz. You can contribute on GitHub, send me feedback on Twitter, or subscribe to the newsletter to know when new examples are published! 🔥
