This post explains how to make heatmaps with Python and seaborn. Three main types of input can be used to plot a heatmap; let's study them one by one.
- Wide format (untidy)
We call 'wide format' or 'untidy format' a matrix where each row is an individual and each column is a variable measured on that individual. In this case, a heatmap is simply a visual representation of the matrix: each square of the heatmap represents a cell, and the color of the square depends on the value of that cell.
# library
import seaborn as sns
import pandas as pd
import numpy as np

# Create a dataset (fake)
df = pd.DataFrame(np.random.random((5,5)), columns=["a","b","c","d","e"])

# Default heatmap: just a visualization of this square matrix
p1 = sns.heatmap(df)
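To make the link between cell value and color more concrete, here is a minimal sketch that pins the color scale to a fixed range. The `vmin`, `vmax` and `cmap` values are illustrative choices, not something the original example uses; without them seaborn derives the range from the data.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Fake dataset with values in [0, 1), seeded so the figure is reproducible
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((5, 5)), columns=["a", "b", "c", "d", "e"])

# vmin/vmax fix the ends of the color scale (assumed range: 0 to 1),
# so the same value always maps to the same color across figures
ax = sns.heatmap(df, vmin=0, vmax=1, cmap="viridis")

# The first collection on the Axes is the colored mesh; inspect its color limits
color_limits = ax.collections[0].get_clim()
print(color_limits)
```

Fixing the range is mostly useful when several heatmaps have to be compared side by side; for a single plot the default scaling is usually fine.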
- Correlation matrix (square)
Suppose you measured several variables for n individuals. A common task is to check whether some variables are correlated. You can easily calculate the correlation between each pair of variables and plot the result as a heatmap. This lets you discover which variables are related to each other.
# library
import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Create a dataset (fake)
df = pd.DataFrame(np.random.random((100,5)), columns=["a","b","c","d","e"])

# Calculate correlation between each pair of variables
corr_matrix = df.corr()

# plot it
sns.heatmap(corr_matrix, cmap='PuOr')
plt.show()
Note that in this case each correlation appears twice, because the matrix is symmetric, so you probably want to plot a half heatmap as follows:
# library
import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)

# Create a dataset (fake)
df = pd.DataFrame(np.random.random((100,5)), columns=["a","b","c","d","e"])

# Calculate correlation between each pair of variables
corr_matrix = df.corr()

# Can be great to plot only a half matrix:
# cells where the mask is True (upper triangle and diagonal) are not drawn
mask = np.zeros_like(corr_matrix, dtype=bool)
mask[np.triu_indices_from(mask)] = True
with sns.axes_style("white"):
    p2 = sns.heatmap(corr_matrix, mask=mask, square=True)
plt.show()
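As a variation on the half heatmap, the sketch below builds the mask in one step and centers a diverging palette on zero. The `vmin=-1`, `vmax=1` and `center=0` settings are assumptions added for illustration, chosen because a correlation always lies between -1 and 1.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Fake dataset: 100 individuals, 5 variables
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((100, 5)), columns=["a", "b", "c", "d", "e"])
corr_matrix = df.corr()

# Boolean mask: True marks cells to hide (upper triangle plus diagonal)
mask = np.triu(np.ones_like(corr_matrix, dtype=bool))

# Diverging palette centered on 0, so negative and positive correlations
# get opposite colors of comparable intensity
ax = sns.heatmap(corr_matrix, mask=mask, cmap="PuOr",
                 vmin=-1, vmax=1, center=0, square=True)

# Number of cells still drawn: one per pair of distinct variables
visible_cells = int((~mask).sum())
print(visible_cells)
```

With 5 variables there are 5 × 4 / 2 = 10 distinct pairs, which is the number of cells left visible by the mask.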
- Long format (tidy)
In the 'tidy' or 'long' format, each line represents one observation. You have 3 columns: individual, variable name, and value (x, y and z). You can plot a heatmap from this kind of data as follows:
# library
import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Create long format
people = np.repeat(("A","B","C","D","E"), 5)
feature = list(range(1,6)) * 5   # range must be turned into a list before repeating it in Python 3
value = np.random.random(25)
df = pd.DataFrame({'feature': feature, 'people': people, 'value': value})

# Reshape to wide format (one row per individual, one column per feature), then plot it
df_wide = df.pivot_table(index='people', columns='feature', values='value')
p2 = sns.heatmap(df_wide)
plt.show()
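The reshaping step is what turns long data into the matrix that the heatmap needs. Here is a small sketch of it on a made-up table (the values are invented for illustration). It also shows one difference worth knowing: `pivot` expects a single value per (individual, feature) pair, while `pivot_table` aggregates duplicates, using the mean by default.

```python
import pandas as pd

# Tiny long-format table: one row per (people, feature) observation
df_long = pd.DataFrame({
    "people":  ["A", "A", "A", "B", "B", "B"],
    "feature": [1, 2, 3, 1, 2, 3],
    "value":   [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
})

# pivot: plain reshape, one row per individual and one column per feature
df_wide = df_long.pivot(index="people", columns="feature", values="value")
print(df_wide)

# Add a second observation for (A, 1); pivot would now raise an error,
# whereas pivot_table averages the duplicates
extra = pd.DataFrame({"people": ["A"], "feature": [1], "value": [0.3]})
df_dup = pd.concat([df_long, extra], ignore_index=True)
df_avg = df_dup.pivot_table(index="people", columns="feature", values="value")
print(df_avg)
```

Either result can be passed directly to `sns.heatmap`. If the data is not supposed to contain duplicates, `pivot` is arguably the safer choice because it fails loudly instead of silently averaging.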