Scatterplot with regression line in Matplotlib


This guide shows how to plot a scatterplot with an overlaid regression line in Matplotlib. The linear regression fit is obtained with numpy.polyfit(x, y, deg=1), where x and y are two one-dimensional NumPy arrays containing the data shown in the scatterplot. With deg=1, this function returns the slope and the intercept of the fitted line (in that order), which are then used to draw the regression line.
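To see the return order of numpy.polyfit in isolation, here is a minimal sketch on made-up points that lie exactly on the line y = 2x + 1 (the data is illustrative only). Because the coefficients come back highest power first, the slope is unpacked before the intercept.

```python
import numpy as np

# Illustrative points lying exactly on the line y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1

# deg=1 requests a straight-line fit; coefficients are returned
# highest power first, so the slope precedes the intercept
slope, intercept = np.polyfit(x, y, deg=1)
print(round(slope, 6), round(intercept, 6))  # → 2.0 1.0
```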

Let's get started by importing Matplotlib and NumPy:

import matplotlib.pyplot as plt
import numpy as np

Now let's create a random number generator with a fixed seed. Seeding makes the example reproducible: the same random data is generated every time the code runs.

rng = np.random.default_rng(1234)
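A quick way to convince yourself that seeding gives reproducible data is to build two generators with the same seed and compare their draws. This is a small sketch; the second seed (5678) is an arbitrary value chosen for illustration.

```python
import numpy as np

# Two generators created with the same seed...
rng_a = np.random.default_rng(1234)
rng_b = np.random.default_rng(1234)

# ...produce identical sequences of draws
draws_a = rng_a.uniform(0, 10, size=5)
draws_b = rng_b.uniform(0, 10, size=5)
print(np.array_equal(draws_a, draws_b))  # → True

# A generator with a different (arbitrary) seed gives a different sequence
rng_c = np.random.default_rng(5678)
draws_c = rng_c.uniform(0, 10, size=5)
print(np.array_equal(draws_a, draws_c))
```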

And finally, let's generate some random data, make the scatterplot, and add the regression line:

# Generate data
x = rng.uniform(0, 10, size=100)
y = x + rng.normal(size=100)

# Initialize layout
fig, ax = plt.subplots(figsize=(9, 9))

# Add scatterplot
ax.scatter(x, y, s=60, alpha=0.7, edgecolors="k")

# Fit linear regression via least squares with numpy.polyfit
# It returns the slope (b) first and then the intercept (a)
# deg=1 means linear fit (i.e. polynomial of degree 1)
b, a = np.polyfit(x, y, deg=1)

# Create a sequence of 100 evenly spaced numbers from 0 to 10
xseq = np.linspace(0, 10, num=100)

# Plot regression line
ax.plot(xseq, a + b * xseq, color="k", lw=2.5)
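For a single predictor, the least squares line also has a closed form: the slope is the sum of (x - mean(x)) * (y - mean(y)) divided by the sum of (x - mean(x))^2, and the intercept is mean(y) - slope * mean(x). The sketch below recomputes the fit this way on the same data as the example above, as a sanity check on what numpy.polyfit returns. Since the data was generated as y = x plus standard normal noise, the estimates should land near a slope of 1 and an intercept of 0, though not exactly on them.

```python
import numpy as np

# Recreate the same data as in the example above
rng = np.random.default_rng(1234)
x = rng.uniform(0, 10, size=100)
y = x + rng.normal(size=100)

# Fit returned by numpy.polyfit (slope first, then intercept)
b, a = np.polyfit(x, y, deg=1)

# Closed-form ordinary least squares for one predictor
x_mean, y_mean = x.mean(), y.mean()
b_manual = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
a_manual = y_mean - b_manual * x_mean

# Both approaches agree up to floating-point rounding
print(np.isclose(b, b_manual), np.isclose(a, a_manual))  # → True True
```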

Going further

This post explains how to add a simple linear regression fit to a scatterplot.

You might also be interested in how to add the estimated coefficients to the plot and how to display a regression fit with seaborn.
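As a rough idea of what adding the coefficients to the plot can look like, here is one possible sketch using Matplotlib's ax.text with transform=ax.transAxes, so the position (0.05, 0.95) is a fraction of the axes rather than data units. The label format, position, and font size are arbitrary choices for illustration, and the dedicated post may do it differently. The Agg backend line is only there so the snippet runs without a display; it is not needed in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Same data and fit as in the main example
rng = np.random.default_rng(1234)
x = rng.uniform(0, 10, size=100)
y = x + rng.normal(size=100)
b, a = np.polyfit(x, y, deg=1)

fig, ax = plt.subplots(figsize=(9, 9))
ax.scatter(x, y, s=60, alpha=0.7, edgecolors="k")
xseq = np.linspace(0, 10, num=100)
ax.plot(xseq, a + b * xseq, color="k", lw=2.5)

# Write the fitted equation in the upper-left corner of the axes
label = f"y = {a:.2f} + {b:.2f}x"
ax.text(0.05, 0.95, label, transform=ax.transAxes, fontsize=14, va="top")
```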

Contact & Edit


👋 This document is a work by Yan Holtz. You can contribute on GitHub, send me feedback on Twitter, or subscribe to the newsletter to know when new examples are published! 🔥

This page is just a Jupyter notebook. Please help me make this website better 🙏!