import pandas as pd
from plotnine import *
Dataset
A scatter plot displays the values of two numerical variables for a set of observations, so we will load the iris dataset. The iris dataset is a classic dataset that contains the sepal and petal length and width of 150 iris flowers of three different species: setosa, versicolor, and virginica. In our case, we want to plot the sepal length on the x-axis and the sepal width on the y-axis. You can learn more about scatter plots by reading the scatter plot section of the Python Graph Gallery.
url = 'https://raw.githubusercontent.com/holtzy/The-Python-Graph-Gallery/master/static/data/iris.csv'
df = pd.read_csv(url)
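Before plotting, it can help to confirm that the columns used later are present and numeric. The following is a minimal sketch, not part of the original tutorial: `sample` is a three-row stand-in for the CSV (one row per species) so that it runs offline, and `missing_columns` is a hypothetical helper introduced only for illustration.

```python
import pandas as pd

# Stand-in for the downloaded CSV (assumption): one row per species,
# restricted to the columns this tutorial plots.
sample = pd.DataFrame({
    "sepal_length": [5.1, 7.0, 6.3],
    "sepal_width": [3.5, 3.2, 3.3],
    "species": ["setosa", "versicolor", "virginica"],
})


def missing_columns(frame, required):
    """Return the required column names that are absent from the frame."""
    return [name for name in required if name not in frame.columns]


required = ["sepal_length", "sepal_width", "species"]
print(missing_columns(sample, required))  # an empty list means nothing is missing
print(sample.dtypes)                      # the two sepal columns should be numeric
```

With the real data, the same check could be run as `missing_columns(df, required)` right after `pd.read_csv(url)`.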
Simplest scatter plot
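The most visible difference between plotnine and ggplot2 is that the plot code is typically wrapped in parentheses so that the expression can span several lines, and that column names are passed to aes() as strings. Here is the simplest scatter plot you can make with plotnine: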
(
ggplot(df, aes(x='sepal_length', y='sepal_width')) +
geom_point()
)
Custom colors
If you add the color argument inside the geom_point() function, you can change the default color of the points:
(
ggplot(df, aes(x='sepal_length', y='sepal_width')) +
geom_point(color='blue')
)
Color by group
If you want to color the points by a specific group, you just have to add color='species' inside the aes() function:
(
ggplot(df, aes(x='sepal_length', y='sepal_width', color='species')) +
geom_point()
)
Going further
This article explains how to create a scatter plot with plotnine.
If you want to go further, you can also learn how to customize markers in a scatter plot or how to customize the theme.