Suppose you want to study the relationship between **2 numerical variables** with a lot of points. You can then estimate how densely the points are packed in each part of the plotting area by computing a **2D kernel density estimate**. It is like a smoothed histogram: instead of each point falling into one particular bin, it adds weight to the surrounding bins as well. This plot is inspired by this Stack Overflow question. See this page to customize the color palette.
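To make the "smoothed histogram" idea concrete, here is a minimal sketch that rebuilds the estimate by hand: each data point contributes one Gaussian bump, and the density at a location is the average of all the bumps. The helper `manual_kde` and the `query` locations are hypothetical names made up for this illustration. The sketch assumes SciPy's default bandwidth and unweighted data, and reuses the `covariance` attribute of the fitted `gaussian_kde` object as the kernel covariance.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Same kind of data as in the example of this post (seeded for reproducibility)
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = x * 3 + rng.normal(size=200)
data = np.vstack([x, y])  # shape (2, n)

# Reference estimate from SciPy
k = gaussian_kde(data)

def manual_kde(points, data, cov):
    """Average of Gaussian bumps centred on each data point (illustrative helper)."""
    d, n = data.shape
    inv_cov = np.linalg.inv(cov)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    # diff[:, m, i] = query point m minus data point i
    diff = points[:, :, None] - data[:, None, :]
    # Squared Mahalanobis distance between every query point and every data point
    sq_dist = np.einsum("imn,ij,jmn->mn", diff, inv_cov, diff)
    return np.exp(-0.5 * sq_dist).sum(axis=1) / (n * norm)

# Three arbitrary locations, one per column, where the density is evaluated
query = np.array([[0.0, 1.0, -1.0],
                  [0.0, 3.0, -2.0]])

manual = manual_kde(query, data, k.covariance)
print(manual)
print(k(query))  # should agree with the manual values up to rounding
```

Every point spreads its weight over its neighbourhood instead of landing in a single bin, which is why the resulting surface is smooth.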

    # libraries
    import matplotlib.pyplot as plt
    import numpy as np
    from scipy.stats import gaussian_kde

    # create data
    x = np.random.normal(size=500)
    y = x * 3 + np.random.normal(size=500)

    # Evaluate a gaussian kde on a regular grid of nbins x nbins over data extents
    # (nbins only sets the grid resolution; the smoothing comes from the kde bandwidth)
    nbins = 300
    k = gaussian_kde([x, y])
    xi, yi = np.mgrid[x.min():x.max():nbins*1j, y.min():y.max():nbins*1j]
    zi = k(np.vstack([xi.flatten(), yi.flatten()]))

    # Make the plot
    plt.pcolormesh(xi, yi, zi.reshape(xi.shape))
    plt.show()

    # Change color palette
    plt.pcolormesh(xi, yi, zi.reshape(xi.shape), cmap=plt.cm.Greens_r)
    plt.show()

You can add a **color bar** easily:

    # Add color bar
    plt.pcolormesh(xi, yi, zi.reshape(xi.shape), cmap=plt.cm.Greens_r)
    plt.colorbar()
    plt.show()
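The values on the color bar are the estimated probability density at each grid node, not point counts. As a rough check, the sketch below sums the density over the grid and multiplies by the cell area. The result should be close to 1, and usually a little below it, because the grid stops at the data extents and cuts off the tails. The coarser `nbins = 100` and the seeded generator are choices made for this sketch only.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Same kind of data as above, seeded for reproducibility
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = x * 3 + rng.normal(size=500)

# Evaluate the kde on a regular grid (coarser than 300 to keep the check fast)
nbins = 100
k = gaussian_kde([x, y])
xi, yi = np.mgrid[x.min():x.max():nbins*1j, y.min():y.max():nbins*1j]
zi = k(np.vstack([xi.ravel(), yi.ravel()])).reshape(xi.shape)

# Area of one grid cell
dx = (x.max() - x.min()) / (nbins - 1)
dy = (y.max() - y.min()) / (nbins - 1)

# Riemann-sum approximation of the integral of the density over the grid
total = zi.sum() * dx * dy
print(total)
```

Because the values are densities, their scale depends on the units of x and y: stretching an axis lowers the numbers on the color bar without changing the shape of the plot.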

What do the `colorbar()` values indicate here?

Why do you set nbins to 300? Is there some way to compute an appropriate nbins?

Thank you! Great post, great code.