#94 Use normalization on seaborn heatmap

Sometimes a normalization step is needed before patterns become visible in your heatmap. Check the left heatmap: one individual has much higher values than the others, so it absorbs all the color variation. Its column appears yellow and the rest of the heatmap appears green. To avoid this, you have to normalize the data frame. You can normalize by column or by row. Several formulas can be used; read this page to learn which one you need.
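
To give an idea of what "several formulas" can mean, here is a minimal sketch of two common choices: standardization (z-score) and min-max scaling. The helper names `zscore` and `minmax` and the small `demo` data frame are made up for this illustration; both helpers work column by column.

```python
import numpy as np
import pandas as pd

def zscore(df):
    """Standardize each column: subtract its mean, divide by its standard deviation."""
    return (df - df.mean()) / df.std()

def minmax(df):
    """Rescale each column linearly so that its values span the range [0, 1]."""
    return (df - df.min()) / (df.max() - df.min())

# Hypothetical example data: the second column lives on a much larger scale
rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
demo = pd.DataFrame(rng.normal(3, 4, size=(10, 3)))
demo[1] = demo[1] + 40

demo_z = zscore(demo)    # every column now has mean 0 and standard deviation 1
demo_mm = minmax(demo)   # every column now runs from 0 to 1
```

Which one fits best depends on the data: the z-score keeps the shape of each column's distribution around zero, while min-max scaling forces every column onto the same bounded range and is therefore more sensitive to a single extreme value.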

Column normalization

(see charts above)


# libraries
import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Create a dataframe where the average value of the second column is higher:
df = pd.DataFrame(np.random.randn(10, 10) * 4 + 3)
df[1] = df[1] + 40

# If we do a heatmap, we just observe that one column has higher values than the others:
sns.heatmap(df, cmap='viridis')
plt.show()

# Now normalize by column: subtract each column's mean, then divide by its standard deviation
df_norm_col = (df - df.mean()) / df.std()
sns.heatmap(df_norm_col, cmap='viridis')
plt.show()
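
A quick way to convince yourself that the column normalization did its job is to look at the numbers rather than the colors. The sketch below rebuilds the same kind of data frame (the seed is an arbitrary choice, added only for reproducibility) and prints the column means and standard deviations before and after.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # arbitrary seed, not part of the original example
df = pd.DataFrame(np.random.randn(10, 10) * 4 + 3)
df[1] = df[1] + 40

# Same column normalization as above
df_norm_col = (df - df.mean()) / df.std()

# Before: the mean of column 1 is far away from the others
print(df.mean().round(1))

# After: every column has a mean of (numerically) zero and a standard deviation of one
print(df_norm_col.mean().abs().max())
print(df_norm_col.std().round(6))
```

Note that pandas computes the sample standard deviation by default (`ddof=1`), so the result differs slightly from a normalization based on `numpy.std`, which uses `ddof=0` unless told otherwise.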

Row normalization

The same principle works for row normalization. I am not sure this is the best way to do it, so please comment if you know a better one.


# libraries
import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Create a dataframe where the average value of the third row (position 2) is higher
df = pd.DataFrame(np.random.randn(10, 10) * 4 + 3)
df.iloc[2] = df.iloc[2] + 40

# If we do a heatmap, we just observe that one row has higher values than the others:
sns.heatmap(df, cmap='viridis')
plt.show()

# Normalize it by row:
# (not sure if it is the best way, please feel free to give me a better method.)
# 1: subtract each row's mean (axis=0 aligns the row means with the row index)
df_norm_row = df.sub(df.mean(axis=1), axis=0)
# 2: divide by each row's standard deviation
df_norm_row = df_norm_row.div(df.std(axis=1), axis=0)

# And see the result
sns.heatmap(df_norm_row, cmap='viridis')
plt.show()
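
One possible alternative for the row case, offered as a sketch rather than as the "right" way: transpose the data frame, apply the column formula, and transpose back. The variable names `two_step` and `via_transpose` are made up for this comparison, and the seed is an arbitrary choice.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # arbitrary seed, only for reproducibility
df = pd.DataFrame(np.random.randn(10, 10) * 4 + 3)
df.iloc[2] = df.iloc[2] + 40  # shift the row at position 2

# Two-step version: subtract the row means, then divide by the row standard deviations
two_step = df.sub(df.mean(axis=1), axis=0).div(df.std(axis=1), axis=0)

# Alternative: rows become columns, the column formula applies, then transpose back
dft = df.T
via_transpose = ((dft - dft.mean()) / dft.std()).T

# Both approaches should give the same values (up to floating point error)
print(np.allclose(two_step, via_transpose))
```

The transposed version is shorter to write, while the `sub`/`div` version makes the axis explicit and avoids creating a transposed copy, which may matter for large data frames.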
