import pandas as pd
from plotnine import *
Dataset
Since a histogram displays the distribution of a numerical variable, we need a dataset that contains this type of value.
For this, we will create a simple dataset with 2 variables: numerical and categorical. The numerical variable contains the values that we want to plot, and the categorical variable contains the group labels that we use to split the values.
# Generate data
import numpy as np
group1_num = np.random.normal(loc=0, scale=1, size=300)
group2_num = np.random.normal(loc=8, scale=2, size=300)
group1_cat = np.repeat('Group1', 300)
group2_cat = np.repeat('Group2', 300)
df = pd.DataFrame({
'numerical': np.concatenate([group1_num, group2_num]),
'categorical': np.concatenate([group1_cat, group2_cat])
})
df.head()
| | numerical | categorical |
|---|---|---|
| 0 | 0.316677 | Group1 |
| 1 | -0.577093 | Group1 |
| 2 | 0.716630 | Group1 |
| 3 | -1.756055 | Group1 |
| 4 | 0.895396 | Group1 |
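Because the data is random, the values shown above will differ on every run. As a minimal sketch, assuming we want reproducible numbers, we can draw the same two groups from a seeded NumPy generator and check the size, mean and standard deviation of each group. The name df_seeded and the seed value 42 are our own choices for illustration and are not part of the original example.

```python
import numpy as np
import pandas as pd

# Seeded generator: the seed value (42) is arbitrary, any integer works
rng = np.random.default_rng(42)

n = 300  # number of observations per group, as in the dataset above

# Same structure as df, but reproducible from one run to the next
df_seeded = pd.DataFrame({
    'numerical': np.concatenate([
        rng.normal(loc=0, scale=1, size=n),   # Group1: centered on 0
        rng.normal(loc=8, scale=2, size=n),   # Group2: centered on 8
    ]),
    'categorical': np.repeat(['Group1', 'Group2'], n),
})

# Quick sanity check of each group before plotting
summary = df_seeded.groupby('categorical')['numerical'].agg(['count', 'mean', 'std'])
print(summary)
```

The group means should land close to 0 and 8, which is why the two histograms barely overlap in the charts that follow.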
Double histogram
We use the geom_histogram() function to create a histogram. To display two histograms on the same chart, we map the fill aesthetic to a grouping variable inside aes(), so that each group gets its own color.
The value of fill must be the name of the column that we want to use to group the data. In this case, we use the categorical column.
(
ggplot(df, aes(x='numerical', fill='categorical')) +
geom_histogram(bins=20) +
theme_minimal()
)
Mirror histogram
To create a mirror histogram, we need 2 numerical variables to display.
For this, we add 2 new columns to our dataset: num_top and num_bottom.
Then we call the geom_histogram() function twice, once for each variable, and we use the y aesthetic to set the direction of each histogram: the density for the top one, and the negated density for the bottom one.
df['num_top'] = np.random.normal(loc=5, scale=2, size=600)
df['num_bottom'] = np.random.normal(loc=0, scale=2, size=600)
(
ggplot(df) +
geom_histogram(aes(x='num_top', y=after_stat('density')), bins=20, fill='lightblue') +
geom_histogram(aes(x='num_bottom', y=after_stat('-density')), bins=20, fill='darkred') +
xlab('Value') +
theme_minimal()
)
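To see why negating the density flips the second histogram below the axis, we can reproduce the idea with NumPy alone. This is a rough sketch and not what plotnine runs internally: np.histogram() picks its own bin edges, so the numbers will not match the chart exactly, and the seed is arbitrary. It only shows that density heights integrate to 1 and that putting a minus sign in front of them yields bars that all point downward.

```python
import numpy as np

# Same distributions as num_top and num_bottom (the seed is arbitrary)
rng = np.random.default_rng(1)
top = rng.normal(loc=5, scale=2, size=600)
bottom = rng.normal(loc=0, scale=2, size=600)

# density=True rescales the counts so that the bar areas sum to 1
top_density, top_edges = np.histogram(top, bins=20, density=True)
bottom_density, bottom_edges = np.histogram(bottom, bins=20, density=True)

# Negating the heights mirrors the bottom histogram under the x axis
mirrored = -bottom_density

# Area of each histogram: bar height times bar width, summed over the bins
top_area = np.sum(top_density * np.diff(top_edges))
bottom_area = np.sum(mirrored * np.diff(bottom_edges))
print(top_area, bottom_area)
```

Using the density rather than the raw count also keeps the two halves comparable if the variables do not have the same number of observations.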
Going further
This article explains how to create double and mirror histograms with plotnine.
If you want to go further, you can check how to customize colors in a histogram and have a look at the histogram section of the gallery.