- #94 Column normalization on heatmap
Sometimes a normalization step is necessary to reveal patterns in a heatmap. Look at the left heatmap: one individual has much higher values than the others, so it absorbs all the color variation. Its column appears yellow and the rest of the heatmap appears green. To avoid this, normalize the data frame. You can normalize by column or by row. Several formulas can be used; read this page to learn which one you need.
- Column normalization
(see charts above)
    # libraries
    import seaborn as sns
    import matplotlib.pyplot as plt
    import pandas as pd
    import numpy as np

    # Create a dataframe where the average value of the second column is higher:
    df = pd.DataFrame(np.random.randn(10, 10) * 4 + 3)
    df[1] = df[1] + 40

    # If we do a heatmap, we just observe that one column has higher values than the others:
    sns.heatmap(df, cmap='viridis')
    plt.show()

    # Now normalize by column: subtract each column's mean, divide by its standard deviation
    df_norm_col = (df - df.mean()) / df.std()
    sns.heatmap(df_norm_col, cmap='viridis')
    plt.show()
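The column example above uses a z-score (subtract the mean, divide by the standard deviation). Since several formulas can be used, here is a minimal sketch of another common one, min-max scaling, which rescales each column to the range 0 to 1. The helper name `minmax_by_column` is hypothetical, and the sketch assumes no column is constant (a constant column would produce a division by zero).

```python
import numpy as np
import pandas as pd


def minmax_by_column(df):
    """Rescale every column to the [0, 1] range (min-max scaling).

    Hypothetical helper for illustration; assumes no column is constant.
    """
    col_min = df.min()                # per-column minimum
    col_range = df.max() - col_min    # per-column spread
    return (df - col_min) / col_range


# Same kind of data as above: the second column is shifted upward
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 10)) * 4 + 3)
df[1] = df[1] + 40

df_minmax_col = minmax_by_column(df)
```

The result can be passed to `sns.heatmap(df_minmax_col, cmap='viridis')` like any other data frame. Min-max scaling is sensitive to a single extreme value within a column, so the z-score may be preferable when columns contain outliers.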
- Row normalization
The same principle works for row normalization. Note that I am not sure this is the best way to normalize by row. Please comment if you know a better one.
- #94 Row normalization on heatmap
    # libraries
    import seaborn as sns
    import matplotlib.pyplot as plt
    import pandas as pd
    import numpy as np

    # Create a dataframe where the average value of the third row (index 2) is higher
    df = pd.DataFrame(np.random.randn(10, 10) * 4 + 3)
    df.iloc[2] = df.iloc[2] + 40

    # If we do a heatmap, we just observe that one row has higher values than the others:
    sns.heatmap(df, cmap='viridis')
    plt.show()

    # Normalize it by row:
    # (not sure if it is the best way, please feel free to give me a better method.)
    # 1: subtract the row mean
    df_norm_row = df.sub(df.mean(axis=1), axis=0)
    # 2: divide by the row standard deviation
    df_norm_row = df_norm_row.div(df.std(axis=1), axis=0)

    # And see the result
    sns.heatmap(df_norm_row, cmap='viridis')
    plt.show()
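As one possible answer to the question of a better method, the sketch below wraps the two steps in a small function and checks it against an equivalent formulation: transpose the data frame, apply the column formula, and transpose back. The name `zscore_by_row` is hypothetical. Both versions rely on pandas' default sample standard deviation (`ddof=1`).

```python
import numpy as np
import pandas as pd


def zscore_by_row(df):
    """Subtract each row's mean and divide by each row's standard deviation.

    Hypothetical helper for illustration; uses pandas' default ddof=1.
    """
    row_mean = df.mean(axis=1)
    row_std = df.std(axis=1)
    # axis=0 aligns the per-row statistics with the row index
    return df.sub(row_mean, axis=0).div(row_std, axis=0)


# Same kind of data as above: the third row (index 2) is shifted upward
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 10)) * 4 + 3)
df.iloc[2] = df.iloc[2] + 40

df_norm_row = zscore_by_row(df)

# Equivalent formulation: transpose, normalize by column, transpose back
df_norm_row_t = ((df.T - df.T.mean()) / df.T.std()).T
```

Both results should match, so the choice is mostly a matter of readability. The `sub`/`div` version avoids the two transposes and makes the alignment axis explicit.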