Scatterplot

A scatter plot displays the relationship between 2 numeric variables, one being displayed on the X axis (horizontal) and the other on the Y axis (vertical). Each data point is represented as a circle.

Several tools let you build scatterplots in Python. Seaborn is probably the most straightforward library for the job, but matplotlib allows a greater level of customization. If you are looking for an interactive chart, plotly is definitely the way to go.

This page provides many examples of scatterplots made with those Python tools, going from simple examples to highly customized versions.

⚠️ Scatterplot and overplotting

The main danger with scatterplots is overplotting. When the sample size gets big, circles tend to overlap, making the figure unreadable.

Several workarounds exist to fix the issue, like using opacity or switching to another chart type:
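As a minimal sketch of those two workarounds, the snippet below draws the same data twice: once with small, semi-transparent markers, and once as a hexbin density chart. The dataset is synthetic and made up for illustration, and the marker size, opacity and grid size are arbitrary choices rather than recommended values.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data (an assumption for illustration): 20,000 correlated points
rng = np.random.default_rng(seed=0)
x = rng.normal(size=20_000)
y = x + rng.normal(size=20_000)

fig, (ax_alpha, ax_hex) = plt.subplots(1, 2, figsize=(10, 4))

# Workaround 1: small, semi-transparent markers let dense regions stand out
ax_alpha.scatter(x, y, s=5, alpha=0.05)
ax_alpha.set_title("Scatterplot with opacity")

# Workaround 2: another chart type, here a hexbin that counts points per hexagon
hexes = ax_hex.hexbin(x, y, gridsize=40, cmap="viridis")
fig.colorbar(hexes, ax=ax_hex, label="Number of points")
ax_hex.set_title("Hexbin density")

# Call plt.show() to display the figure when running as a script
```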

Scatterplots with Seaborn

Seaborn is a Python library that makes it easy to build good-looking charts. The regplot() function should get you started in minutes. The first example below explains how to build the most basic scatterplot with Python. Then, several types of customization are described: adding a regression line, tweaking markers and axes, adding labels and more.


A nice way to add information and highlight a trend in a scatter plot is to add a regression line on top of the dots. Seaborn's regplot() and lmplot() functions make this quite easy!

The main difference between those 2 functions is that:

  • regplot() is used for a simple scatter plot with a trend line
  • lmplot() can also be used for faceting, e.g. creating multiple scatter plots
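The difference between the two functions can be sketched as follows. The data is synthetic, and the column names (x, y, group) are hypothetical names chosen for this illustration.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (an assumption for illustration): two groups sharing a linear trend
rng = np.random.default_rng(seed=1)
df = pd.DataFrame({
    "x": rng.uniform(0, 10, size=200),
    "group": np.repeat(["A", "B"], 100),
})
df["y"] = 2 * df["x"] + rng.normal(scale=3, size=200)

# regplot(): draws one scatterplot with its fitted line and returns a matplotlib Axes
ax = sns.regplot(data=df, x="x", y="y")

# lmplot(): creates its own figure and returns a FacetGrid, here one panel per group
grid = sns.lmplot(data=df, x="x", y="y", col="group")

# Call plt.show() (from matplotlib.pyplot) to display both figures in a script
```

Since regplot() returns a plain Axes, it fits naturally inside an existing matplotlib layout, whereas lmplot() manages the whole figure itself.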

🔎 scatterplot() function parameters

→ Description

The scatterplot() function of seaborn creates a scatter plot to visualize the relationship between two continuous variables. It displays each observation as a point on a two-dimensional plane.

→ Arguments

data

Description

DataFrame-like object (pandas, numpy, polars...) holding the columns we want to plot.

Possible values → dataframe

It can be a pandas.DataFrame (columns are variables), a numpy.ndarray (rows/columns are variables), or any mapping/sequence (dictionaries/lists).

Supports both long-form data (each variable in its own column, each observation in its own row) and wide-form data (each column holds one series of values; reshaped internally).

Code Example

# Libraries & Dataset
import matplotlib.pyplot as plt
import seaborn as sns
df = sns.load_dataset('iris')

# Plot
sns.scatterplot(
  data=df,
  x='sepal_length',
  y='sepal_width'
)
plt.show()
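The example above uses long-form data. For comparison, here is a small sketch of the wide-form case, where no x or y is given and each column is plotted against the index. The sensor_a and sensor_b columns are made-up names with synthetic values.

```python
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(seed=2)

# Wide-form data (synthetic): each column is one series, the index supplies the x values
wide_df = pd.DataFrame({
    "sensor_a": rng.normal(loc=0, size=50),
    "sensor_b": rng.normal(loc=3, size=50),
})

# Without x and y, seaborn reshapes the data internally:
# one color and marker style per column, listed in the legend
ax = sns.scatterplot(data=wide_df)
```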

If you are interested in scatterplots, some other chart types may be useful to you.

A scatterplot with marginal distributions lets you check the distribution of both the x and y variables. A correlogram lets you check the relationship between each pair of numeric variables in a dataset.

⏱ Quick start (Matplotlib)

# libraries
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Create a dataset: y follows x with some Gaussian noise
x = np.arange(1, 101)
df = pd.DataFrame({'x_values': x, 'y_values': x + np.random.randn(100) * 15})

# Plot: no connecting line, one circular marker per observation
plt.plot('x_values', 'y_values', data=df, linestyle='none', marker='o')
plt.show()

Scatterplots with Matplotlib

Matplotlib is another great option for building scatterplots with Python. It usually takes a few more lines of code to get a decent chart, but it allows more customization.

The examples below should get you covered for all the most common problems: adding markers, adding labels, changing shapes and more.
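As a quick illustration of that kind of customization, here is a minimal sketch that sets the marker shape and colors and labels a single point. The data is random, and the styling values are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data (an assumption for illustration)
rng = np.random.default_rng(seed=4)
x = rng.uniform(0, 10, size=30)
y = rng.uniform(0, 10, size=30)

fig, ax = plt.subplots(figsize=(6, 4))

# Marker shape, size, fill color and edge color are all explicit arguments
points = ax.scatter(x, y, marker="D", s=60, c="skyblue", edgecolors="black")

# Add a text label next to the point with the largest y value
top = int(np.argmax(y))
label = ax.annotate(
    "max y", xy=(x[top], y[top]), xytext=(5, 5), textcoords="offset points"
)

ax.set_xlabel("x values")
ax.set_ylabel("y values")
ax.set_title("Customized scatterplot")

# Call plt.show() to display the figure when running as a script
```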

Scatterplots with Plotly

If you are looking for an interactive scatterplot, plotly is definitely the way to go.

Interactivity is a real plus for scatterplots. A tooltip attached to each marker gives additional information about that point, and zooming in on a specific area of the scatterplot can also be very valuable.

The examples below should help you get started quickly with the plotly API:

Scatterplots with Pandas

Pandas, a data analysis library, also offers functions to build scatterplots. It uses matplotlib under the hood, but the syntax is more concise.

The main difference is that we have to work with Pandas objects such as Series and DataFrame.

The examples below should help you get started quickly with the pandas API:
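For instance, a scatterplot can be drawn directly from a DataFrame with its plot.scatter() method. This is a small sketch on synthetic data, and the opacity and title are arbitrary.

```python
import numpy as np
import pandas as pd

# Synthetic data (an assumption for illustration)
rng = np.random.default_rng(seed=6)
df = pd.DataFrame({"x": rng.normal(size=80)})
df["y"] = 0.5 * df["x"] + rng.normal(scale=0.5, size=80)

# plot.scatter() takes column names and returns a matplotlib Axes,
# so any further matplotlib customization can be applied to `ax`
ax = df.plot.scatter(x="x", y="y", alpha=0.6, title="Scatterplot with pandas")
```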

Scatterplots with Plotnine

Plotnine is a Python library for building charts with the grammar of graphics principles. The geom_point() function should get you started in minutes.

The examples below should help you get started quickly with the plotnine API:

Best Python scatterplot examples

The web is full of astonishing charts made by awesome bloggers (often using R). The Python Graph Gallery tries to display (or translate from R) some of the best creations and explain how their source code works.

The first example below demos how to add clean labels on a scatterplot, automatically avoiding overlapping. It also explains how to control background, fonts, titles and more.

If you want to display your work here, please drop me a word or even better, submit a Pull Request!

Contact


👋 This document is a work by Yan Holtz. You can contribute on GitHub, send me feedback on Twitter, or subscribe to the newsletter to know when new examples are published! 🔥