Libraries
First, we need to load a few libraries:
- seaborn: for creating the scatterplot
- matplotlib: for displaying the plot
- pandas: for data manipulation
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
plt.rcParams["figure.dpi"] = 300
Dataset
Since scatter plots are made for visualizing the relationship between two numerical variables, we need a dataset that contains at least two numerical columns.
Here, we will use the iris dataset, which we load directly from the gallery:
path = 'https://raw.githubusercontent.com/holtzy/The-Python-Graph-Gallery/master/static/data/iris.csv'
df = pd.read_csv(path)
df.head()
| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
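A quick way to confirm that a dataset qualifies is to list its numerical columns. The sketch below is an illustration rather than part of the original post: it assumes a small stand-in DataFrame named `sample` (a hypothetical name), built from the five rows displayed above so that it runs without a network connection. With the full data loaded, `df` can be used in its place.

```python
import pandas as pd

# Stand-in data (assumption): the five rows shown by df.head() above
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.6],
    "petal_length": [1.4, 1.4, 1.3, 1.5, 1.4],
    "petal_width": [0.2, 0.2, 0.2, 0.2, 0.2],
    "species": ["setosa"] * 5,
})

# Keep only the numerical columns; a scatter plot needs at least two of them
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```

Any two of the listed columns can then serve as the x and y variables of the scatter plot.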
Control Marker Shape
To change the shape of the markers, you need to provide:
- marker: the shape of the marker
There are actually lots of different shapes available. Here are a few examples:
- 'o': circle
- 's': square
- '+': plus
- 'x': cross
- 'D': diamond
- 'h': hexagon
- 'p': pentagon
- 'v': triangle_down
- '^': triangle_up
- '<': triangle_left
- '>': triangle_right
- all numbers from 1 to 4
- and many more...
You can find them by running the following code:
from matplotlib import markers
print(markers.MarkerStyle.markers.keys())
fig, ax = plt.subplots(figsize=(8, 6))
sns.regplot(
    x=df["sepal_length"],
    y=df["sepal_width"],
    marker="+",
    fit_reg=False,
    ax=ax
)
plt.show()
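To compare several shapes at a glance, one option is to draw the same points once per marker. This is a minimal sketch, not part of the original post: the `sample` DataFrame is a hypothetical stand-in built from the five rows shown by df.head(), and the choice of four shapes is arbitrary.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Stand-in data (assumption): the first five iris rows shown earlier
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.6],
})

# Arbitrary selection of marker shapes to compare
shapes = ["o", "s", "D", "^"]

# One panel per shape, with shared axes so the panels are comparable
fig, axes = plt.subplots(1, len(shapes), figsize=(12, 3), sharex=True, sharey=True)
for ax, shape in zip(axes, shapes):
    sns.regplot(
        x=sample["sepal_length"],
        y=sample["sepal_width"],
        marker=shape,
        fit_reg=False,  # draw the points only, without a regression line
        ax=ax,
    )
    ax.set_title(f"marker={shape!r}")
plt.show()
```

Replacing `sample` with the full `df` gives the same comparison on all 150 observations.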
Changing Color, Transparency and Size of Markers
With sns.regplot(), you can also change other features of the markers through scatter_kws.
This argument accepts a dictionary of argument names and their values, which are used exclusively to style the dots. For example, scatter_kws={"color": "red", "alpha": 0.3} renders the markers in red with an opacity of 0.3. With sns.scatterplot(), as in the example below, the same properties are passed directly as keyword arguments:
- color: color of the markers
- alpha: opacity of the markers
- s: size of the markers
fig, ax = plt.subplots(figsize=(8, 6))
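sns.scatterplot(
    x=df["sepal_length"],
    y=df["sepal_width"],
    color="skyblue",      # fill color of the markers
    alpha=1,              # opacity (0 = transparent, 1 = opaque)
    s=100,                # marker size
    edgecolor="black",    # color of the marker outline
    linewidth=3,          # width of the marker outline
    ax=ax
)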
plt.show()
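For comparison, the scatter_kws route with sns.regplot() can be sketched as follows. This is an illustration under stated assumptions: the `sample` DataFrame is a hypothetical stand-in built from the five rows shown by df.head(), and the styling values are chosen only for demonstration.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Stand-in data (assumption): the first five iris rows shown earlier
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.6],
})

fig, ax = plt.subplots(figsize=(8, 6))
sns.regplot(
    x=sample["sepal_length"],
    y=sample["sepal_width"],
    fit_reg=False,  # points only, no regression line
    # Illustrative styling, forwarded to the underlying scatter call
    scatter_kws={"color": "red", "alpha": 0.3, "s": 100},
    ax=ax,
)
plt.show()
```

Keeping the marker styling in a dictionary separates it from the styling of the regression line, which sns.regplot() takes through line_kws.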
Going further
This post explains how to customize the appearance of the markers in a scatter plot with seaborn.
You might be interested in
- how to visualize linear regression
- how to create a bubble plot, a kind of scatter plot where the size of the marker is proportional to a third variable
- how to color dots according to a variable.