#86 Avoid overlapping in scatterplot with 2D density


Consider the scatterplot on the left-hand side of this figure. Many dots overlap, which makes the figure hard to read. Even worse, it is impossible to tell how many data points sit at each position. One solution is to cut the plotting window into several bins and represent the number of data points in each bin by a color. Depending on the bin shape, this gives a hexbin plot (hexagonal bins) or a 2D histogram (rectangular bins).
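To make the binning idea concrete, here is a small sketch using NumPy's `np.histogram2d`, which computes the per-bin counts that a 2D histogram then maps to colors (matplotlib's `hist2d` computes the same kind of array). The seed, the distribution parameters and `nbins = 20` simply mirror the gallery code and are otherwise arbitrary.

```python
import numpy as np

# 200 points from a correlated 2D normal distribution, as in the gallery code
rng = np.random.default_rng(0)
data = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 3]], 200)
x, y = data.T

nbins = 20
# counts[i, j] is the number of points whose x falls in the i-th x bin and
# whose y falls in the j-th y bin; by default the bins span the data range
counts, xedges, yedges = np.histogram2d(x, y, bins=nbins)

print(counts.shape)       # → (20, 20): one count per rectangular bin
print(int(counts.sum()))  # → 200: every point lands in exactly one bin

# Locate the most crowded bin, i.e. the darkest cell of the 2D histogram
i, j = np.unravel_index(counts.argmax(), counts.shape)
print(f"densest bin: x in [{xedges[i]:.2f}, {xedges[i + 1]:.2f}], "
      f"y in [{yedges[j]:.2f}, {yedges[j + 1]:.2f}], {int(counts[i, j])} points")
```

The count per bin is exactly the information the raw scatterplot hides when dots stack on top of each other.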

A smoother result can then be obtained with a Gaussian KDE (kernel density estimate). Its representation is called a 2D density plot, and you can add contour lines to mark successive density levels. See more about these types of graphics in the 2D density section of the python graph gallery. This plot was inspired by this Stack Overflow question.
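A Gaussian KDE places a small 2D Gaussian "bump" on every data point and averages them. As an illustration of what `scipy.stats.gaussian_kde` computes with its default settings, the sketch below re-implements the evaluation with plain NumPy and compares the two on a 20 x 20 grid. It assumes scipy's default bandwidth (Scott's rule) and unweighted data; `kde_by_hand` is a hypothetical helper name, not part of any library.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
data = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 3]], 200).T  # shape (2, n)
d, n = data.shape

# Bandwidth factor from Scott's rule (scipy's default): n ** (-1 / (d + 4))
factor = n ** (-1.0 / (d + 4))
# Kernel covariance: sample covariance of the data scaled by factor**2
cov = np.cov(data) * factor**2
inv_cov = np.linalg.inv(cov)
norm = np.sqrt(np.linalg.det(2 * np.pi * cov))


def kde_by_hand(points):
    """Average of one Gaussian bump per data point, evaluated at points of shape (2, m)."""
    diff = points[:, :, None] - data[:, None, :]            # (2, m, n)
    maha = np.einsum('imn,ij,jmn->mn', diff, inv_cov, diff)  # squared Mahalanobis distances
    return np.exp(-0.5 * maha).sum(axis=1) / (n * norm)      # (m,)


# Regular 20 x 20 evaluation grid over the data extents, as in the gallery code
x, y = data
xi, yi = np.mgrid[x.min():x.max():20j, y.min():y.max():20j]
grid = np.vstack([xi.ravel(), yi.ravel()])

zi_hand = kde_by_hand(grid)
zi_scipy = gaussian_kde(data)(grid)
print(np.allclose(zi_hand, zi_scipy))
```

The bandwidth controls the smoothness: a larger `factor` gives wider bumps and a blurrier density, a smaller one gives a bumpier estimate.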

```python
# Libraries
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

# Create data: 200 points from a correlated 2D normal distribution
data = np.random.multivariate_normal([0, 0], [[1, 0.5], [0.5, 3]], 200)
x, y = data.T

# Create a figure with 6 plot areas side by side
fig, axes = plt.subplots(ncols=6, nrows=1, figsize=(21, 5))

# Everything starts with a scatterplot
axes[0].set_title('Scatterplot')
axes[0].plot(x, y, 'ko')
# As you can see, there is a lot of overplotting here!

# Thus we can cut the plotting window into several hexagonal bins
nbins = 20
axes[1].set_title('Hexbin')
axes[1].hexbin(x, y, gridsize=nbins, cmap=plt.cm.BuGn_r)

# 2D histogram: the same idea with rectangular bins
axes[2].set_title('2D Histogram')
axes[2].hist2d(x, y, bins=nbins, cmap=plt.cm.BuGn_r)

# Evaluate a Gaussian KDE on a regular nbins x nbins grid over the data extents
k = gaussian_kde(data.T)
xi, yi = np.mgrid[x.min():x.max():nbins*1j, y.min():y.max():nbins*1j]
zi = k(np.vstack([xi.flatten(), yi.flatten()]))

# Plot the density: each grid cell is colored by its estimated density
axes[3].set_title('Calculate Gaussian KDE')
axes[3].pcolormesh(xi, yi, zi.reshape(xi.shape), shading='auto', cmap=plt.cm.BuGn_r)

# Add Gouraud shading to interpolate colors smoothly between grid points
axes[4].set_title('2D Density with shading')
axes[4].pcolormesh(xi, yi, zi.reshape(xi.shape), shading='gouraud', cmap=plt.cm.BuGn_r)

# Add contour lines on top of the shaded density
axes[5].set_title('Contour')
axes[5].pcolormesh(xi, yi, zi.reshape(xi.shape), shading='gouraud', cmap=plt.cm.BuGn_r)
axes[5].contour(xi, yi, zi.reshape(xi.shape))

plt.show()
```
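A natural question about the last panel is which density values the contour lines follow. When `levels` is not given, matplotlib picks a handful of evenly spaced "nice" values of the density `z` across its range, so each line joins grid points of equal estimated density and consecutive lines differ by a constant density step (they are not, for example, quantiles of the probability mass). The sketch below reproduces the gallery setup with a fixed seed, labels each line with its value via `clabel`, and shows how to pass explicit levels instead; the choice of 4 manual levels is an arbitrary assumption.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

# Same setup as the gallery code, with a fixed seed for reproducibility
rng = np.random.default_rng(0)
data = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 3]], 200)
x, y = data.T
nbins = 20
k = gaussian_kde(data.T)
xi, yi = np.mgrid[x.min():x.max():nbins*1j, y.min():y.max():nbins*1j]
zi = k(np.vstack([xi.ravel(), yi.ravel()])).reshape(xi.shape)

fig, (ax0, ax1) = plt.subplots(ncols=2, figsize=(10, 4))

# Default: matplotlib chooses evenly spaced density values automatically
auto = ax0.contour(xi, yi, zi)
ax0.clabel(auto, inline=True, fontsize=8, fmt="%.3f")  # write each line's density on it
print("automatic levels:", auto.levels)

# Explicit: 4 evenly spaced levels strictly between the min and max density
levels = np.linspace(zi.min(), zi.max(), 6)[1:-1]
manual = ax1.contour(xi, yi, zi, levels=levels)
ax1.clabel(manual, inline=True, fontsize=8, fmt="%.3f")
print("manual levels:", manual.levels)
```

Printing `auto.levels` is the quickest way to see the "steps" used in a given plot.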


    • Thank you for maintaining this wonderful site! Another solution that I’ve seen to this problem is to intentionally dither the original scatter plot, perhaps in combination with the alpha parameter.

    • This is a really wonderful walkthrough! Thank you so much for the clear descriptions and step-by-step guide. This website is pure gold for data scientists.

      One question – what “step” are the contours following exactly in this output?

      Kind regards!
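The jitter (dithering) plus transparency idea from the first comment can be sketched as follows. Jitter mainly helps when values are discrete or rounded, so points stack exactly on top of each other; the integer data, the jitter amplitude of 0.2 and `alpha=0.2` below are illustrative assumptions. Jitter moves points away from their true values, so it is a display trick rather than a change to the data.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
# Hypothetical discrete data: many points share exactly the same (x, y) position
x = rng.integers(0, 5, size=500)
y = rng.integers(0, 5, size=500)

# Jitter: add small uniform noise so stacked points spread out
jitter = 0.2  # assumed amplitude in data units, kept below half the grid spacing
xj = x + rng.uniform(-jitter, jitter, size=x.size)
yj = y + rng.uniform(-jitter, jitter, size=y.size)

fig, (ax0, ax1) = plt.subplots(ncols=2, figsize=(10, 4))
ax0.plot(x, y, 'ko')
ax0.set_title('Raw: points hide each other')
# Transparency: darker areas now mean more points
ax1.scatter(xj, yj, color='k', alpha=0.2, edgecolors='none')
ax1.set_title('Jitter + alpha')

print(len(set(zip(x, y))), "distinct raw positions for", x.size, "points")
```

With the raw data, 500 points collapse onto at most 25 visible dots; after jittering, each cluster becomes a cloud whose darkness reflects how many points it holds.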

