Once you understand how to make a heatmap with seaborn and how to apply basic customization, you will probably want to control the color palette. This is a crucial step, since the message conveyed by your heatmap can change depending on the palette you choose. Note that DataCamp offers an online course covering the basics of seaborn. Three options are possible:
Sequential palettes translate the value of a variable into the intensity of one color, from light to dark. Use this kind of palette when your values run in a single direction, for example from 0 to 1. Let’s create this kind of data:
# libraries
import seaborn as sns
import pandas as pd
import numpy as np

# Create a (fake) dataset: 10 x 10 values drawn uniformly between 0 and 1
df = pd.DataFrame(np.random.random((10, 10)), columns=["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"])
Several sequential palettes are available. Here are 4 examples applied to the df data frame. Any other sequential matplotlib colormap name can be passed to cmap in the same way.
import matplotlib.pyplot as plt

# Draw each palette in its own figure (sns.plt no longer exists in seaborn)
for cmap in ["YlGnBu", "Blues", "BuPu", "Greens"]:
    sns.heatmap(df, cmap=cmap)
    plt.show()
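To compare the four palettes side by side, one option is to draw them on a grid of subplots. The following is a minimal sketch, assuming a 2 x 2 layout and a figure size chosen only for illustration; it relies on the `ax` argument of `sns.heatmap` to target each subplot.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Same kind of fake data as above: uniform values between 0 and 1
df = pd.DataFrame(np.random.random((10, 10)), columns=list("abcdefghij"))
palettes = ["YlGnBu", "Blues", "BuPu", "Greens"]

# One subplot per palette; layout and figsize are illustrative choices
fig, axes = plt.subplots(2, 2, figsize=(10, 8))
for ax, cmap in zip(axes.flat, palettes):
    sns.heatmap(df, cmap=cmap, ax=ax)
    ax.set_title(cmap)
fig.tight_layout()
```

Call `plt.show()` or `fig.savefig(...)` afterwards to display or export the figure.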
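Note that you can control which values map to the two ends of the palette. This is done with the vmin and vmax arguments. Check the 2 examples below. In the first one, vmax is set to 0.5, so every cell with a value over 0.5 gets the color at the upper end of the palette (dark purple in the original figures). In the second one, every cell below 0.5 gets the color at the lower end of the palette (white in the original figures), and every cell above 0.7 gets the color at the upper end. The exact colors depend on the default palette of your seaborn version.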
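# First example: values above 0.5 all share the last color of the palette
sns.heatmap(df, vmin=0, vmax=0.5)

# Second example: only the 0.5 to 0.7 range is spread over the palette
sns.heatmap(df, vmin=0.5, vmax=0.7)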
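To see why values outside the range all look the same, it can help to compute where each value lands on the palette. This is a simplified model of linear color scaling, not seaborn's actual code; the helper name `color_position` is hypothetical.

```python
import numpy as np

def color_position(values, vmin, vmax):
    """Return the position of each value on the palette: 0 = first color, 1 = last color.

    Simplified model: linear scaling between vmin and vmax, with values
    outside that range pinned to the nearest end of the palette.
    """
    scaled = (np.asarray(values, dtype=float) - vmin) / (vmax - vmin)
    return np.clip(scaled, 0.0, 1.0)

values = [0.1, 0.5, 0.6, 0.9]

# vmin=0, vmax=0.5: everything from 0.5 upwards sits at the top of the palette
first = color_position(values, vmin=0, vmax=0.5)

# vmin=0.5, vmax=0.7: values up to 0.5 sit at the bottom, values above 0.7 at the top
second = color_position(values, vmin=0.5, vmax=0.7)
```

Narrowing the range between vmin and vmax spends the whole palette on a smaller slice of the data, which increases contrast inside that slice and hides differences outside it.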
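Diverging palettes use 2 contrasting colors that meet at a midpoint. For example, let’s create a dataset whose values are spread around 0, mostly between -1 and 1 (np.random.randn draws from a standard normal distribution, so some values fall outside that range). We probably want one color for negative values and another one for positive values.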
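# libraries
import seaborn as sns
import pandas as pd
import numpy as np

# create dataset: 30 x 30 values from a standard normal distribution
df = np.random.randn(30, 30)

# create heatmap; center=0 pins the middle of the palette to 0
sns.heatmap(df, cmap="PiYG", center=0)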
Here the color change happens at 0. You can control this midpoint with the center argument: let’s try with 1 and the default color palette. Everything over 1 will be red and everything under 1 will be blue, so almost everything turns blue since the data are centered on 0.
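A minimal sketch of the example described above, assuming the same kind of standard normal data as before. The seeded generator is only there to make the sketch reproducible, and the exact colors depend on the default diverging palette of your seaborn version.

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Fake data centered on 0 (seed chosen arbitrarily for reproducibility)
rng = np.random.default_rng(0)
data = rng.standard_normal((30, 30))

# Move the midpoint of the palette to 1 instead of 0
ax = sns.heatmap(data, center=1)

# Share of cells that fall below the midpoint
share_below = float((data < 1).mean())
```

Call `plt.show()` to display the figure. Since most standard normal values are below 1, `share_below` is large, which is why a single side of the palette dominates the plot.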
The last possibility is to transform your continuous data into categorical data. When binning, several approaches exist: you can put the same number of observations in each bin, or cut the data at regular steps. Here is an example using the qcut function of pandas, which follows the first approach.
# libraries
import seaborn as sns
import pandas as pd
import numpy as np

# create data
df = pd.DataFrame(np.random.randn(6, 6))

# make it discrete: split each column into 3 equal-count bins coded 0, 1, 2
df_q = pd.DataFrame()
for col in df:
    df_q[col] = pd.qcut(df[col], 3, labels=False)

# plot it
sns.heatmap(df_q)
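The difference between the two binning approaches is easier to see on a small skewed sample. The sketch below uses made-up values and compares `pd.qcut` (same number of observations per bin) with `pd.cut` (bins of equal width).

```python
import numpy as np
import pandas as pd

# Made-up, right-skewed values chosen only for illustration
values = pd.Series([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 2.0, 5.0, 10.0])

# Equal-count bins: each of the 3 bins receives the same number of observations
equal_count = pd.qcut(values, 3, labels=False)

# Equal-width bins: the range from 0.1 to 10 is cut into 3 regular steps
equal_width = pd.cut(values, 3, labels=False)

# Number of observations per bin for each approach
count_per_bin_q = np.bincount(equal_count, minlength=3).tolist()
count_per_bin_c = np.bincount(equal_width, minlength=3).tolist()
```

With skewed data, equal-width bins leave most observations in a single bin, so the heatmap shows little variation. Equal-count bins spread the colors evenly but hide how far apart the original values are.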