#111 Custom correlogram

Graph #110 showed how to make a basic correlogram with seaborn. This post explains how to improve it, in two parts: how to customise the pairwise plots (one for each pair of numeric variables), and how to customise the distributions shown on the diagonal of the matrix.

  • The first customisation you can apply to the scatterplots showing the correlation is to decide whether or not to show a regression line. In the ‘kind’ argument, specify ‘reg’ to add a regression, or ‘scatter’ to leave it out.

    
    # library & dataset
    import matplotlib.pyplot as plt
    import seaborn as sns
    df = sns.load_dataset('iris')
    
    # with regression
    sns.pairplot(df, kind="reg")
    plt.show()
    
    # without regression
    sns.pairplot(df, kind="scatter")
    plt.show()
    
    

    Then, you can customise the scatterplots as if they were individual scatterplots, so do not hesitate to visit the dedicated section for more information. Note that seaborn makes it easy to map a colour to the dots with the ‘hue’ argument, which allows you to compare the behaviour of distinct groups.

    
    # library & dataset
    import matplotlib.pyplot as plt
    import seaborn as sns
    df = sns.load_dataset('iris')
    
    # left
    sns.pairplot(df, kind="scatter", hue="species", markers=["o", "s", "D"], palette="Set2")
    plt.show()
    
    # right: you can give other arguments with plot_kws.
    sns.pairplot(df, kind="scatter", hue="species", plot_kws=dict(s=80, edgecolor="white", linewidth=2.5))
    plt.show()
    
    
  • The distribution of each variable can be visualised with histograms or density plots. Visit the related section (histogram, density) to understand how to customise them.

    
    # library & dataset
    import seaborn as sns
    import matplotlib.pyplot as plt
    df = sns.load_dataset('iris')
    
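    # Density
    sns.pairplot(df, diag_kind="kde")
    plt.show()
    
    # Histogram
    sns.pairplot(df, diag_kind="hist")
    plt.show()
    
    # diag_kws is passed on to the diagonal plots, so they can be customised like a standalone density plot or histogram.
    # Recent seaborn releases use fill and bw_adjust in place of the deprecated shade and bw (the value 0.5 is only an illustration).
    sns.pairplot(df, diag_kind="kde", diag_kws=dict(fill=True, bw_adjust=.5))
    plt.show()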
    
    
  • 2 comments

    • This is great and I love Seaborn in general. One request: it seems that if the grouping variable contains numbers, that column is also included in the plot matrix, which is not useful in many cases. Could you add some easy way to explicitly control whether or not the grouping variable is also included in the matrix of plots, or to control which variables are plotted? Yes, I know I can convert the numbers in my label column to strings and this goes away, but this forces me to make a copy of the dataframe for this function so as to avoid altering my data going forward (I want my labels to remain numbers), which wastes time and RAM for larger data sets.
