#90 Heatmaps with various input formats

This post explains how to make heatmaps with Python and seaborn. A heatmap can be built from three main types of input; let’s study them one by one.

Wide format (untidy)

We call ‘wide format’ or ‘untidy format’ a matrix where each row is an individual and each column is a variable. In this case, a heatmap is simply a visual representation of the matrix: each square of the heatmap represents a cell, and its color depends on the cell’s value.


# library
import seaborn as sns
import pandas as pd
import numpy as np

# Create a dataset (fake)
df = pd.DataFrame(np.random.random((5,5)), columns=["a","b","c","d","e"])

# Default heatmap: just a visualization of this square matrix
p1 = sns.heatmap(df)
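
To make the link between matrix cells and colored squares easier to see, here is a minimal sketch that writes each value inside its square. It assumes the `annot` and `fmt` parameters of `sns.heatmap`; the seeded generator `rng` is only there to make the fake data reproducible and is not part of the original example.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Small fake wide-format dataset: 5 individuals (rows) x 5 variables (columns)
rng = np.random.default_rng(0)  # seeded only for reproducibility (assumption)
df = pd.DataFrame(rng.random((5, 5)), columns=["a", "b", "c", "d", "e"])

# annot=True writes each cell's value inside its square;
# fmt=".2f" rounds the labels to two decimals
ax = sns.heatmap(df, annot=True, fmt=".2f")
```

Each of the 25 squares should carry the number it encodes, which is a quick way to check that the color scale behaves as expected.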

Correlation matrix (square)

Suppose you measured several variables for n individuals. A common task is to check whether some variables are correlated. You can easily calculate the correlation between each pair of variables and plot the result as a heatmap. This lets you discover which variables are related to each other.

# library
import seaborn as sns
import pandas as pd
import numpy as np

# Create a dataset (fake)
df = pd.DataFrame(np.random.random((100,5)), columns=["a","b","c","d","e"])

# Calculate correlation between each pair of variables
corr_matrix = df.corr()

# Plot it
sns.heatmap(corr_matrix, cmap='PuOr')
# To display the figure from a script, import matplotlib.pyplot as plt and call plt.show()
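
By default the color scale runs from the smallest to the largest value in the matrix, so with a diverging palette such as 'PuOr' the neutral middle color does not necessarily correspond to a correlation of 0. A possible refinement, sketched here under the assumption that you want the full correlation range on the color bar, is to pin the scale with `vmin` and `vmax`:

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Fake dataset: 100 individuals, 5 variables
rng = np.random.default_rng(0)  # seeded only for reproducibility (assumption)
df = pd.DataFrame(rng.random((100, 5)), columns=["a", "b", "c", "d", "e"])
corr_matrix = df.corr()

# Pin the color scale to [-1, 1] so that 0 falls in the middle
# of the diverging palette
ax = sns.heatmap(corr_matrix, cmap="PuOr", vmin=-1, vmax=1)
```

With symmetric limits, positive and negative correlations of the same strength get colors of the same intensity on either side of the palette.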

Note that in this case each correlation appears twice, so you probably want to plot only half of the heatmap, as follows:


# library
import seaborn as sns
import pandas as pd
import numpy as np
np.random.seed(0)

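# Create a dataset (fake)
df = pd.DataFrame(np.random.random((100,5)), columns=["a","b","c","d","e"])

# Calculate correlation between each pair of variables
corr_matrix = df.corr()

# Mask the upper triangle (diagonal included) to plot only half of the matrix
mask = np.zeros_like(corr_matrix, dtype=bool)
mask[np.triu_indices_from(mask)] = True
with sns.axes_style("white"):
    p2 = sns.heatmap(corr_matrix, mask=mask, square=True)
# To display the figure from a script, import matplotlib.pyplot as plt and call plt.show()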
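
The mask is simply a boolean array of the same shape as the matrix, where True marks a cell to hide. As a rough sketch (the names `mask_with_diag` and `mask_keep_diag` are made up for this illustration), the snippet below builds the mask with `np.triu` and shows how the `k` offset decides whether the diagonal is hidden as well:

```python
import numpy as np
import pandas as pd

# Fake dataset and its correlation matrix
rng = np.random.default_rng(0)  # seeded only for reproducibility (assumption)
df = pd.DataFrame(rng.random((100, 5)), columns=["a", "b", "c", "d", "e"])
corr_matrix = df.corr()

# Start from an all-True array; np.triu keeps the upper triangle
# and sets everything below it to False
all_true = np.ones(corr_matrix.shape, dtype=bool)

# k=0 (default): upper triangle AND diagonal are hidden
mask_with_diag = np.triu(all_true)

# k=1: start one step above the diagonal, so the diagonal stays visible
mask_keep_diag = np.triu(all_true, k=1)

# Number of hidden cells in each case
print(mask_with_diag.sum(), mask_keep_diag.sum())  # prints: 15 10
```

Either array can be passed to the `mask` argument of `sns.heatmap`. Hiding the diagonal is usually harmless for a correlation matrix, since it only contains ones.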

Long format (tidy)

The ‘tidy’ or ‘long’ format is when each row represents an observation. You have 3 columns: individual, variable name, and value (x, y and z). You can plot a heatmap from this kind of data as follows:

# library
import seaborn as sns
import pandas as pd
import numpy as np

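# Create long format
people = np.repeat(("A","B","C","D","E"), 5)
feature = list(range(1,6)) * 5   # a range object cannot be multiplied in Python 3
value = np.random.random(25)
df = pd.DataFrame({'feature': feature, 'people': people, 'value': value})

# Pivot to wide format, then plot it
df_wide = df.pivot_table(index='people', columns='feature', values='value')
p2 = sns.heatmap(df_wide)
# To display the figure from a script, import matplotlib.pyplot as plt and call plt.show()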
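
One detail of the pivoting step is worth knowing: `pivot_table` aggregates duplicated (individual, variable) pairs, using the mean by default, whereas `pivot` refuses them. The small sketch below uses a hand-made dataset (not the one above) with one deliberately duplicated pair to illustrate the difference:

```python
import numpy as np
import pandas as pd

# Long format: one row per (people, feature) pair
df = pd.DataFrame({
    "people": np.repeat(["A", "B", "C"], 2),
    "feature": np.tile([1, 2], 3),
    "value": np.arange(6, dtype=float),  # 0.0, 1.0, ..., 5.0
})

# No duplicated pair: pivot simply reshapes long -> wide
df_wide = df.pivot(index="people", columns="feature", values="value")

# Add a second measurement for the pair (A, 1)
extra = pd.DataFrame({"people": ["A"], "feature": [1], "value": [10.0]})
df_dup = pd.concat([df, extra], ignore_index=True)

# pivot_table averages the two values of (A, 1): (0 + 10) / 2 = 5
wide_dup = df_dup.pivot_table(index="people", columns="feature", values="value")

# pivot raises a ValueError on the duplicated pair
try:
    df_dup.pivot(index="people", columns="feature", values="value")
    pivot_failed = False
except ValueError:
    pivot_failed = True
```

If your long table may contain repeated measurements, it is worth deciding explicitly how to combine them (for instance through the `aggfunc` argument of `pivot_table`) before reading the heatmap.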
