# Scatterplot

A scatter plot displays the relationship between 2 numeric variables, one being displayed on the X axis (horizontal) and the other on the Y axis (vertical). Each data point is represented as a **circle**.

Several tools let you build scatterplots in Python. Seaborn is probably the most straightforward library for the job, but matplotlib allows a greater level of customization. If you are looking for an interactive chart, plotly is definitely the way to go.

This page provides many examples of scatterplots made with those Python tools, going from **simple examples** to **highly customized** versions.

## ⏱ Quick start (Seaborn)

The `scatterplot()` function of the `Seaborn` library is definitely the best way to build a scatterplot in seconds. 🔥

Simply pass a numeric column of a data frame to both the `x` and `y` parameters and the function will handle the rest.

```
# library & dataset
import seaborn as sns
df = sns.load_dataset('iris')
# use the function scatterplot() to make a scatterplot
sns.scatterplot(x=df["sepal_length"], y=df["sepal_width"])
```

## ⚠️ Scatterplot and overplotting

The main danger with scatterplots is overplotting. When the sample size gets big, circles tend to overlap, making the figure unreadable.

Several workarounds exist to fix the issue, like lowering the marker opacity or switching to another chart type.
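Both workarounds can be sketched with matplotlib. The snippet below is a minimal illustration on synthetic data (20,000 random correlated points, an assumption made only to force overplotting): the left panel uses small, semi-transparent markers via `alpha`, the right panel switches to a hexbin chart, a 2D histogram with hexagonal bins.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data (assumption): enough correlated points to overplot badly
rng = np.random.default_rng(0)
x = rng.normal(size=20_000)
y = x + rng.normal(size=20_000)

fig, (ax_alpha, ax_hex) = plt.subplots(1, 2, figsize=(10, 4))

# Workaround 1: small, semi-transparent markers, so dense areas look darker
ax_alpha.scatter(x, y, s=2, alpha=0.1)
ax_alpha.set_title("Low opacity")

# Workaround 2: another chart type, counting points per hexagonal bin
hb = ax_hex.hexbin(x, y, gridsize=40, cmap="Blues")
fig.colorbar(hb, ax=ax_hex, label="count")
ax_hex.set_title("Hexbin")

plt.show()
```

Opacity keeps every individual point visible, while binning scales to much larger samples because the number of drawn shapes depends on `gridsize`, not on the number of points.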

## Scatterplots with `Seaborn`

Seaborn is a Python library that makes it easy to build good-looking charts. Its `regplot()` function should get you started in minutes. The first example below explains how to build the most basic scatterplot with Python. Then, several types of customization are described: adding a regression line, tweaking markers and axes, adding labels and more.
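As a rough sketch of the marker and axis tweaks mentioned above, extra keyword arguments of `sns.scatterplot()` (here `marker`, `s`, `color`, `edgecolor`) are forwarded to matplotlib, and the returned `Axes` is used to set labels and limits. The `height`/`weight` data is synthetic and purely illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical dataset standing in for real measurements
rng = np.random.default_rng(1)
df = pd.DataFrame({"height": rng.normal(170, 10, 60)})
df["weight"] = df["height"] * 0.9 - 90 + rng.normal(0, 5, 60)

ax = sns.scatterplot(
    data=df, x="height", y="weight",
    marker="D",           # diamond markers instead of circles
    s=60,                 # marker area, forwarded to matplotlib
    color="tab:orange",
    edgecolor="black",
)
# Axis labels, limits and title are set on the returned matplotlib Axes
ax.set(xlabel="Height (cm)", ylabel="Weight (kg)", xlim=(130, 210), title="Custom markers")
plt.show()
```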

A nice way to add information and highlight a trend in a scatterplot is to draw a **regression line** on top of the dots. Thanks to its `regplot()` and `lmplot()` functions, Seaborn makes it quite easy!
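A minimal `regplot()` sketch, assuming a small synthetic, roughly linear dataset: the function draws the points, a linear regression fit and a shaded confidence band (`ci`). The `scatter_kws` and `line_kws` dictionaries style the two layers separately.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic, roughly linear data (assumption for illustration)
rng = np.random.default_rng(2)
df = pd.DataFrame({"x": np.linspace(0, 10, 50)})
df["y"] = 2 * df["x"] + rng.normal(0, 2, 50)

ax = sns.regplot(
    data=df, x="x", y="y",
    ci=95,                          # size of the shaded confidence interval
    scatter_kws={"alpha": 0.6},     # styling of the dots
    line_kws={"color": "red"},      # styling of the regression line
)
plt.show()
```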

The main difference between those two functions is that:

- `regplot()` is used for a single scatterplot with a trend line;
- `lmplot()` can also be used for **faceting**, e.g. creating a grid of scatterplots, one per level of a categorical variable.
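Faceting with `lmplot()` can be sketched as follows. The `group` column and its data are hypothetical; passing it to `col` produces one scatterplot with its own regression line per group, laid out side by side on a `FacetGrid`.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: two groups with opposite trends
rng = np.random.default_rng(3)
n = 40
df = pd.DataFrame({
    "x": np.tile(np.arange(n), 2),
    "group": np.repeat(["A", "B"], n),
})
df["y"] = np.where(df["group"] == "A", 1.0, -0.5) * df["x"] + rng.normal(0, 3, 2 * n)

# col="group" creates one facet (scatterplot + fit) per category
g = sns.lmplot(data=df, x="x", y="y", col="group", height=3.5)
plt.show()
```

Because `lmplot()` builds its own figure, it takes `height` (and `aspect`) to size each facet rather than an `ax` argument.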

## 🔎 `scatterplot()` function parameters

### → Description

The `scatterplot()` function of Seaborn creates a scatter plot to visualize the relationship between two continuous variables. It displays each observation as a point on a two-dimensional plane.

### → Arguments

**`data`**

- **Description**: a dataframe-like object (pandas, numpy, polars...) holding the columns we want to plot.
- **Possible values**: a `pandas.DataFrame` (columns are variables), a `numpy.ndarray` (rows/columns are variables), or any mapping/sequence (dictionaries/lists). Both long-form data (one column per variable, one row per observation) and wide-form data (reshaped internally by Seaborn) are supported.

**Code example**

```
# Library & Dataset
import matplotlib.pyplot as plt
import seaborn as sns
df = sns.load_dataset('iris')
# Plot
sns.scatterplot(
    data=df,
    x='sepal_length',
    y='sepal_width'
)
plt.show()
```
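The other input types listed for `data` can be sketched like this, using made-up values. A plain dictionary works as long-form data, its keys acting as column names for `x` and `y`. A 2D numpy array given without `x`/`y` is treated as wide-form: each column becomes a separate series plotted against the row index.

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

fig, (ax_long, ax_wide) = plt.subplots(1, 2, figsize=(9, 4))

# Long-form input as a dictionary (hypothetical values): keys are variable names
measurements = {"length": [5.1, 4.9, 6.3, 5.8], "width": [3.5, 3.0, 3.3, 2.7]}
sns.scatterplot(data=measurements, x="length", y="width", ax=ax_long)

# Wide-form input as a 2D array: 10 rows x 3 columns, reshaped internally,
# one color/marker per column, row index on the x axis
wide = np.random.default_rng(4).normal(size=(10, 3))
sns.scatterplot(data=wide, ax=ax_wide)

plt.show()
```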

## 💡 Other charts involving scatterplots and `Seaborn`

If you are interested in scatterplots, some other charts could be useful to you.

A scatterplot with marginal distributions lets you check the distribution of both the `x` and `y` variables. A correlogram lets you check the relationship between each pair of numeric variables in a dataset.

## ⏱ Quick start (Matplotlib)

Matplotlib also requires only a few lines of code to draw a scatterplot thanks to its `plot()` function. The resulting chart is not as good-looking, but the function probably offers more flexibility in terms of customization.

```
# libraries
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# Create a dataset:
df=pd.DataFrame({'x_values': range(1,101), 'y_values': np.random.randn(100)*15+range(1,101) })
# plot
plt.plot( 'x_values', 'y_values', data=df, linestyle='none', marker='o')
plt.show()
```
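Matplotlib also has a dedicated `scatter()` function. Unlike `plot()`, it accepts one size (`s`) and one color value (`c`) per point, which makes it handy for encoding extra variables. A minimal sketch on data similar to the example above, with random marker sizes used only for illustration:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(6)
x = np.arange(1, 101)
y = x + rng.normal(0, 15, 100)
sizes = rng.uniform(10, 200, 100)  # hypothetical third variable mapped to size

fig, ax = plt.subplots()
# c=y colors each point by its y value through the colormap
points = ax.scatter(x, y, s=sizes, c=y, cmap="viridis", alpha=0.7, edgecolors="grey")
fig.colorbar(points, ax=ax, label="y value")
plt.show()
```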

## Scatterplots with `Matplotlib`

`Matplotlib` is another great alternative to build scatterplots with Python. As usual, it takes a few more lines of code to get a decent chart, but it allows more customization.

The examples below should get you covered for the most common problems: adding markers, adding labels, changing shapes and more.
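Changing marker shapes and adding labels can be sketched in a few lines. The names and coordinates below are hypothetical; `marker="^"` draws triangles, and `annotate()` places each label slightly offset (in points) from its marker so the text does not cover it.

```python
import matplotlib.pyplot as plt

# Tiny hypothetical dataset
names = ["A", "B", "C", "D"]
x = [1, 2, 3, 4]
y = [3, 1, 4, 2]

fig, ax = plt.subplots()
ax.scatter(x, y, marker="^", s=120, color="tab:red")  # triangle markers

# One text label per point, shifted 5 points right and up
for name, xi, yi in zip(names, x, y):
    ax.annotate(name, (xi, yi), xytext=(5, 5), textcoords="offset points")

ax.set_xlabel("x")
ax.set_ylabel("y")
plt.show()
```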

## Scatterplots with `Plotly`

If you are looking for an **interactive** scatterplot, plotly is definitely the way to go.

Interactivity is a real plus for scatterplots. It is very useful to have a **tooltip** attached to every marker to get some additional information about it. **Zooming** on a specific area of the scatterplot is also very valuable sometimes.

The examples below should help you get started quickly with the plotly API:

## Scatterplots with `Pandas`

Pandas, a data analysis library, also offers functions to build scatterplots. It uses matplotlib under the hood, but the **syntax is more concise**.

The main difference is that we have to work with **Pandas objects** such as `Series` and `DataFrame`.

The examples below should help you get started quickly with the pandas API:
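A minimal sketch of the pandas syntax on a synthetic DataFrame: `DataFrame.plot.scatter()` takes column names for `x` and `y`, forwards styling keywords such as `s` and `alpha` to matplotlib, and returns a matplotlib `Axes`.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data (assumption for illustration)
rng = np.random.default_rng(8)
df = pd.DataFrame({"x": rng.uniform(0, 10, 80)})
df["y"] = 3 * df["x"] + rng.normal(0, 4, 80)

# One call on the DataFrame, matplotlib does the drawing
ax = df.plot.scatter(x="x", y="y", s=20, alpha=0.6, title="pandas scatter")
plt.show()
```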

## Scatterplots with `Plotnine`

Plotnine is a Python library for building charts following the grammar of graphics principles. Its `geom_point()` function should get you started in minutes.

The examples below should help you get started quickly with the plotnine API:

## Best python scatterplot examples

The web is full of astonishing charts made by awesome bloggers (often using R). The Python graph gallery tries to display (or translate from R) some of the best creations and explain how their source code works.

The first example below shows how to add clean labels to a scatterplot while automatically avoiding overlaps. It also explains how to control the background, fonts, titles and more.

If you want to display your work here, please drop me a line or, even better, submit a Pull Request!