Libraries

To create this chart, we need to load the following libraries:

import pandas as pd
from plotnine import *

Dataset

A bar plot displays the counts or values of the different categories in a dataset, so we need a dataset that contains the categories we want to compare.

For instance, let's consider a dataset that contains the sales data for three different products: Product A, Product B, and Product C. In our case, we can plot the names of the products on the x-axis and their corresponding sales figures on the y-axis. You can learn more about bar plots in the bar plot section of the Python Graph Gallery.

sales_data = {
    'Product': ['Product A', 'Product B', 'Product C'],
    'Sales': [150, 220, 180]
}

# Convert the dictionary to a pandas DataFrame
df = pd.DataFrame(sales_data)
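
The dictionary above already holds one row per product. If your own data has one row per individual sale instead, it can be collapsed into the same shape first. The following is a minimal sketch under that assumption: the `raw` table and its `Amount` column are hypothetical and only serve to illustrate the aggregation step.

```python
import pandas as pd

# Hypothetical raw data: one row per individual sale (not part of the original dataset)
raw = pd.DataFrame({
    'Product': ['Product A', 'Product B', 'Product A',
                'Product C', 'Product B', 'Product C'],
    'Amount': [100, 120, 50, 80, 100, 100],
})

# Sum the amounts per product so that each product appears exactly once
aggregated = (
    raw.groupby('Product', as_index=False)['Amount']
       .sum()
       .rename(columns={'Amount': 'Sales'})
)

print(aggregated)
```

The resulting `aggregated` frame has the same `Product` and `Sales` columns as `df`, so it can be passed to the plotting code in the rest of this post.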

Simplest barplot

The ggplot() function works the following way: you start by initializing a plot with ggplot() and then you add layers to it using the + operator.

In this case, we use the geom_bar() function to create the barplot. Because our dataframe is already aggregated (one row per product), we need to pass the stat='identity' argument so that the bar heights are taken directly from the Sales column. By default, geom_bar() counts the rows in each category instead.

(
ggplot(df, aes(x='Product', y='Sales')) +
    geom_bar(stat='identity')
)

Change color

There are two colors we can change: the border color and the fill color.

Inside the geom_bar() function, the color argument sets the border and the fill argument sets the inside of the bars.

(
ggplot(df, aes(x='Product', y='Sales')) +
    geom_bar(stat='identity', fill='lightblue', color='red')
)

Change width

You can use the width argument of the geom_bar() function to change the width of the bars.

By default, the width is 0.9, meaning each bar takes up 90% of the space available for its category. You can increase or decrease this value to make the bars wider or narrower.

(
ggplot(df, aes(x='Product', y='Sales')) +
    geom_bar(stat='identity', width=0.5)
)

Change bar order

To change the order of the bars, we need to update our dataset before plotting it.

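With pandas, we can use the sort_values() method and specify the column we want to sort by. In this case, it will be the Sales column. Sorting alone is not enough, because plotnine orders text categories alphabetically, so we also convert the Product column to a pandas Categorical that keeps the sorted order.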

# Order dataframe according to `Sales`
df = df.sort_values(
    by='Sales',
    ascending=False # sort in descending order
)

# Then, set the order of the categories in the 'Product' column
df['Product'] = pd.Categorical(df['Product'], categories=df['Product'])

# Now, you can plot the ordered bar chart
(
ggplot(df, aes(x='Product', y='Sales')) +
    geom_bar(stat='identity')
)
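
If you reorder bars often, the two steps above can be wrapped in a small helper. The function below is a hypothetical convenience (`order_categories` is not part of pandas or plotnine) and it assumes every category label appears only once, since pd.Categorical rejects duplicate categories.

```python
import pandas as pd

def order_categories(data, category_col, value_col, ascending=False):
    """Return a sorted copy of `data` whose `category_col` keeps the sorted order."""
    ordered = data.sort_values(by=value_col, ascending=ascending).copy()
    # The category list fixes the order plotnine will use on the axis
    ordered[category_col] = pd.Categorical(
        ordered[category_col],
        categories=ordered[category_col].tolist(),
    )
    return ordered

df = pd.DataFrame({
    'Product': ['Product A', 'Product B', 'Product C'],
    'Sales': [150, 220, 180],
})

largest_first = order_categories(df, 'Product', 'Sales')
smallest_first = order_categories(df, 'Product', 'Sales', ascending=True)

print(list(largest_first['Product'].cat.categories))
print(list(smallest_first['Product'].cat.categories))
```

Either result can be passed to ggplot() exactly like the sorted dataframe above.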

Going further

This article explains how to create a bar plot with plotnine.

If you want to go further, you can also learn how to customize a barplot with plotnine and have a look at the bar plot section of the gallery.

Contact & Edit


👋 This document is a work by Yan Holtz.