Libraries
Pandas is a popular open-source Python library used for data manipulation and analysis. It provides data structures and functions that make working with structured data, such as tabular data (like Excel spreadsheets or SQL tables), easy and intuitive.
To install Pandas, you can use the following command in your command-line interface (such as Terminal or Command Prompt):
pip install pandas
Matplotlib functionality is integrated into the pandas library, which makes it easy to plot DataFrames and Series directly. For this reason, you might also need to import the matplotlib library when building charts with Pandas.
This also means that pandas plots accept most of the same arguments and return Matplotlib objects, so if you already know Matplotlib, you'll have no trouble learning plots with Pandas.
import pandas as pd
import matplotlib.pyplot as plt
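To illustrate the link between the two libraries, here is a minimal sketch showing that a pandas plotting call returns a Matplotlib Axes object, which can then be customized with the usual Matplotlib methods. The toy Series and its values are assumptions made up for this example, and the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import pandas as pd
from matplotlib.axes import Axes

# Hypothetical toy data, used only for this illustration
toy = pd.Series([3, 1, 2], index=["a", "b", "c"])

# The pandas plotting call returns a Matplotlib Axes object
ax = toy.plot.bar()

# Regular Matplotlib methods work on the returned object
ax.set_title("Toy example")
ax.set_ylabel("Value")

print(isinstance(ax, Axes))
```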
Dataset
In order to create graphics with Pandas, we need to use pandas objects: DataFrames and Series. A DataFrame can be seen as an Excel table, and a Series as a column in that table. This means that we must systematically convert our data into a format used by pandas.
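As a minimal sketch of that relationship, selecting one column of a DataFrame gives back a Series. The names table and column and the toy values are assumptions for illustration only; they are not the dataset used in the rest of this post.

```python
import pandas as pd

# Hypothetical toy table, used only to illustrate the two object types
table = pd.DataFrame({
    "Product": ["Product A", "Product B"],
    "Amount_sold": [100, 80],
})

# Selecting a single column of a DataFrame returns a Series
column = table["Amount_sold"]

print(type(table).__name__)
print(type(column).__name__)
```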
We create a small dataset with 3 variables: 2 qualitative (Product and Segment) and one quantitative (Amount_sold), which gives the amount sold for each combination of product and segment.
data = {
    "Product": ["Product A", "Product A", "Product A", "Product B", "Product B", "Product B"],
    "Segment": ["Segment 1", "Segment 2", "Segment 3", "Segment 1", "Segment 2", "Segment 3"],
    "Amount_sold": [100, 120, 120, 80, 160, 150]
}
df = pd.DataFrame(data)
Basic grouped barplot
Now that our dataset is ready, we can create the graph.
This dataset represents sales data for different products (Product A and Product B) across various segments (Segment 1, Segment 2, and Segment 3). The "Amount_sold" column represents the quantity of each product sold within each segment.
The pivot() method is used in this context to reshape the original DataFrame into a format suitable for creating a grouped barplot. When plotting a DataFrame as a barplot, pandas draws one group of bars per row of the index (in this case, the segments) and one bar per column within each group (in this case, the products).
# Pivot the data to have 'Product' as columns and 'Segment' as the index
pivot_df = df.pivot(index='Segment',
                    columns='Product',
                    values='Amount_sold')
# Create a grouped barplot
pivot_df.plot.bar(grid=True)
plt.show()
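To make the reshaping step more concrete, the sketch below rebuilds the same dataset and inspects the pivoted table instead of plotting it. It only uses the data and the pivot() call shown above; the printed layout may vary slightly between pandas versions.

```python
import pandas as pd

data = {
    "Product": ["Product A", "Product A", "Product A", "Product B", "Product B", "Product B"],
    "Segment": ["Segment 1", "Segment 2", "Segment 3", "Segment 1", "Segment 2", "Segment 3"],
    "Amount_sold": [100, 120, 120, 80, 160, 150],
}
df = pd.DataFrame(data)

# Long format: one row per (Product, Segment) pair
print(df.shape)

# Wide format: one row per segment, one column per product
pivot_df = df.pivot(index="Segment", columns="Product", values="Amount_sold")
print(pivot_df)
```

Each row of the pivoted table becomes one group on the chart, and each column becomes one bar in that group. Note that pivot() raises a ValueError if the same (Segment, Product) pair appears more than once; in that situation, pivot_table() with an aggregation function is one possible alternative.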
Custom grouped barplot
In this customized version, we will change:
- the colors
- the axes (horizontal bars)
- the axis labels and title
# Pivot the data to have 'Product' as columns and 'Segment' as the index
pivot_df = df.pivot(index='Segment',
                    columns='Product',
                    values='Amount_sold')
# Create a horizontal grouped barplot with custom colors
colors = ['purple', 'orange']
ax = pivot_df.plot.barh(grid=True,
                        color=colors,
                        figsize=(6,6))
# Add legend
plt.legend(loc='lower right')
# Add title and labels
# With barh, the amounts are on the x-axis and the segments on the y-axis
ax.set_xlabel('Amount Sold')
ax.set_ylabel('Segment')
ax.set_title('Sales by Segment and Product')
plt.show()
Going further
This post explains how to create a grouped barplot with pandas.
For more examples of how to create or customize your plots with Pandas, see the pandas section. You may also be interested in how to customize your barplot.