A 2D histogram is useful when you need to analyse the relationship between two numerical variables that have a large number of observations. It avoids the over-plotting that makes a scatterplot hard to read when many points overlap. The following example illustrates the importance of the bins argument: you can explicitly set how many bins you want along the X and the Y axis. The parameters of the hist2d() function used in the example are:

  • x, y: input values
  • bins: the number of bins in each dimension
  • cmap: colormap
# libraries
import matplotlib.pyplot as plt
import numpy as np
# create data: 50,000 points, with y correlated to x
x = np.random.normal(size=50000)
y = x * 3 + np.random.normal(size=50000)
# Big bins (few bins along each axis)
plt.hist2d(x, y, bins=(50, 50), cmap=plt.cm.jet)
plt.show()
# Small bins (many bins along each axis)
plt.hist2d(x, y, bins=(300, 300), cmap=plt.cm.jet)
plt.show()
# If you do not set the same values for X and Y, the bins won't be square!
plt.hist2d(x, y, bins=(300, 30), cmap=plt.cm.jet)
plt.show()
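
To make the effect of the bins argument more concrete, here is a minimal sketch that looks at the counts behind the picture rather than the picture itself. It uses numpy's histogram2d() function on the same kind of data. The fixed seed and the summary dictionary are assumptions added for illustration only; they are not part of the original example.

```python
import numpy as np

# Same kind of synthetic data as above; the seed is only here for reproducibility
rng = np.random.default_rng(0)
x = rng.normal(size=50000)
y = x * 3 + rng.normal(size=50000)

# 'summary' is a hypothetical helper dictionary used to compare bin settings
summary = {}
for bins in [(50, 50), (300, 300), (300, 30)]:
    # counts[i, j] is the number of points in the i-th X bin and j-th Y bin
    counts, xedges, yedges = np.histogram2d(x, y, bins=bins)
    summary[bins] = {
        "shape": counts.shape,
        "total": int(counts.sum()),
        "mean_per_bin": counts.mean(),
        "share_empty": (counts == 0).mean(),
    }
    print(bins, summary[bins])
```

With 50,000 points, a 50 x 50 grid holds 20 points per bin on average, while a 300 x 300 grid holds fewer than one. This is why the plot with small bins looks much noisier.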

Once you have decided on the bin size, you can change the colour palette. Please visit the matplotlib reference page to see the available palettes.

# Reds
plt.hist2d(x, y, bins=(50, 50), cmap=plt.cm.Reds)
plt.show()
# BuPu
plt.hist2d(x, y, bins=(50, 50), cmap=plt.cm.BuPu)
plt.show()
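
If you prefer to explore the palettes from Python instead of the reference page, the short sketch below lists the colormap names registered in your matplotlib installation. The variable names are illustrative, and the exact list depends on your matplotlib version.

```python
import matplotlib.pyplot as plt

# Names of all registered colormaps, sorted alphabetically
names = sorted(plt.colormaps())

# Names ending in "_r" are the reversed versions of the base palettes
base_names = [name for name in names if not name.endswith("_r")]
print(len(base_names), "base palettes, for example:", base_names[:10])

# The palettes used above are available by name, as are their reversed versions
print("Reds" in names, "BuPu" in names, "Reds_r" in names)
```

Any of these names can also be passed as a string, for instance cmap="Reds" instead of cmap=plt.cm.Reds.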

Finally, it can be useful to add a color bar on the side as a legend. You can add one using the colorbar() function.

plt.hist2d(x, y, bins=(50, 50), cmap=plt.cm.Greys)
# Add the color bar: it maps each colour to a number of points per bin
plt.colorbar()
plt.show()
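
As a variation, here is a sketch of the same plot written with explicit figure and axes objects, which makes it easy to give the color bar a label. The seed, the variable names and the label text are assumptions chosen for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Same kind of synthetic data as above; the seed is only here for reproducibility
rng = np.random.default_rng(0)
x = rng.normal(size=50000)
y = x * 3 + rng.normal(size=50000)

fig, ax = plt.subplots()

# hist2d() returns the counts, the bin edges and the image that was drawn
counts, xedges, yedges, image = ax.hist2d(x, y, bins=(50, 50), cmap="Greys")

# Build the color bar from that image and say what the colours mean
cbar = fig.colorbar(image, ax=ax)
cbar.set_label("Number of points per bin")
```

Call plt.show() afterwards to display the figure, or fig.savefig() to write it to a file.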

This document is a work by Yan Holtz.