Graph #110 showed how to make a basic correlogram with seaborn. This post explains how to improve it. It is divided into 2 parts: how to customise the scatterplots showing the correlation (one for each pair of numeric variables), and how to customise the distributions (the diagonal of the matrix).

The first customisation you can apply to the scatterplots showing the correlation is to decide whether or not you want to show a regression line. In the 'kind' argument, specify 'reg' for a regression, or 'scatter' for plain dots.
# library & dataset
import matplotlib.pyplot as plt
import seaborn as sns
df = sns.load_dataset('iris')

# with regression
sns.pairplot(df, kind="reg")
plt.show()

# without regression
sns.pairplot(df, kind="scatter")
plt.show()
Then, you can customise all the scatterplots as if they were individual scatterplots, so do not hesitate to visit the dedicated section for more information. Note that seaborn makes it easy to map a colour to the dots with the 'hue' argument, which lets you study the behaviour of distinct groups, for example.
# library & dataset
import matplotlib.pyplot as plt
import seaborn as sns
df = sns.load_dataset('iris')

# left
sns.pairplot(df, kind="scatter", hue="species", markers=["o", "s", "D"], palette="Set2")
plt.show()

# right: you can give other arguments with plot_kws.
sns.pairplot(df, kind="scatter", hue="species", plot_kws=dict(s=80, edgecolor="white", linewidth=2.5))
plt.show()

The distribution of each variable can be visualised with histograms or density plots. Visit the related sections (histogram, density) to understand how to customise them.
# library & dataset
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset('iris')

# Density
sns.pairplot(df, diag_kind="kde")
plt.show()

# Histogram
sns.pairplot(df, diag_kind="hist")
plt.show()

# You can customise the diagonal like a standalone density plot or histogram, so see the related sections.
# Recent seaborn versions use fill and bw_method in place of the deprecated shade and bw,
# and no longer accept vertical.
sns.pairplot(df, diag_kind="kde", diag_kws=dict(fill=True, bw_method=.05))
plt.show()
This is great and I love Seaborn in general. One request: it seems that if the grouping variable contains numbers, that column is also included in the plot matrix, which is not useful in many cases. Could you add some easy way to explicitly control whether or not the grouping variable is included in the matrix of plots, or to control which variables are plotted? Yes, I know I can convert the numbers in my label column to strings and this goes away, but that forces me to make a copy of the dataframe for this function so that my data is not altered afterwards (I want my labels to remain numbers), which wastes time and RAM for larger data sets.