#401 Customised dendrogram

Chart #400 gives the basic steps to build a dendrogram from a numeric matrix. Here are a few customisations that you can easily apply to your dendrogram.

  • You can easily customise the font size, rotation angle and content of the labels of your dendrogram; the code below shows how. It is also possible to change the label colour, but this is a bit trickier and is therefore described in chart #402.

    
    # Libraries
    import pandas as pd
    from matplotlib import pyplot as plt
    from scipy.cluster import hierarchy
    import numpy as np
    
    # Data set
    url = 'https://python-graph-gallery.com/wp-content/uploads/mtcars.csv'
    df = pd.read_csv(url)
    df = df.set_index('model')
    df.index.name = None  # remove the index name ("del df.index.name" fails in recent pandas)
    df
    
    # Calculate the distance between each sample
    Z = hierarchy.linkage(df, 'ward')
    
    
    # Plot with Custom leaves
    hierarchy.dendrogram(Z, leaf_rotation=90, leaf_font_size=8, labels=df.index)
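
To make the "content of the labels" point concrete, here is a minimal sketch that does not need the mtcars download. The random data, the `names` list and the `short_labels` variable are assumptions made up for illustration; only `labels` and `no_plot` are real `dendrogram` parameters.

```python
import numpy as np
from scipy.cluster import hierarchy

# Hypothetical stand-in data: 6 samples with 3 numeric features each
rng = np.random.default_rng(0)
data = rng.normal(size=(6, 3))
names = ['Mazda RX4', 'Datsun 710', 'Valiant',
         'Duster 360', 'Fiat 128', 'Honda Civic']

Z = hierarchy.linkage(data, 'ward')

# Custom label content: keep only the first word, in upper case
short_labels = [name.split()[0].upper() for name in names]

# no_plot=True computes the layout without drawing anything;
# 'ivl' holds the leaf labels in the order they would appear on the axis
result = hierarchy.dendrogram(Z, labels=short_labels, no_plot=True)
print(result['ivl'])
```

To actually draw the tree, drop `no_plot=True` and add `leaf_rotation` and `leaf_font_size` as in the code above.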
    
    
    
  • By default, the clusters shown in colour are chosen automatically (scipy uses a colour threshold of 70% of the maximum linkage distance). You can control the number of clusters you want to display by setting a threshold. Here I set a threshold of 240 and marked it with a horizontal line. That gives me 3 clusters.

    
    # Libraries
    import pandas as pd
    from matplotlib import pyplot as plt
    from scipy.cluster import hierarchy
    import numpy as np
    
    # Data set
    url = 'https://python-graph-gallery.com/wp-content/uploads/mtcars.csv'
    df = pd.read_csv(url)
    df = df.set_index('model')
    df.index.name = None  # remove the index name ("del df.index.name" fails in recent pandas)
    df
    
    # Calculate the distance between each sample
    Z = hierarchy.linkage(df, 'ward')
    
    # Control number of clusters in the plot + add horizontal line.
    hierarchy.dendrogram(Z, color_threshold=240)
    plt.axhline(y=240, c='grey', lw=1, linestyle='dashed')
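
If you also want to know which sample falls in which cluster for a given threshold, one option is `hierarchy.fcluster` with `criterion='distance'`. The sketch below is only an illustration on made-up data: the three synthetic groups and the cut height of 20 are assumptions, not values from the mtcars example.

```python
import numpy as np
from scipy.cluster import hierarchy

# Hypothetical data: three tight, well-separated groups of 5 points each
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [50.0, 0.0], [0.0, 50.0]])
data = np.vstack([c + rng.normal(scale=1.0, size=(5, 2)) for c in centers])

Z = hierarchy.linkage(data, 'ward')

# Assumed cut height for this toy data (same role as color_threshold above)
threshold = 20

# One cluster id per sample: samples stay together if they merge below the cut
cluster_ids = hierarchy.fcluster(Z, t=threshold, criterion='distance')
n_clusters = len(np.unique(cluster_ids))
print(cluster_ids, n_clusters)
```

Passing the same value as `color_threshold` to `dendrogram` should colour the same groups, except that a cluster made of a single sample has no link to colour.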
    
    
    

     

     

  • Of course, you can control the colour of your tree. This is done in 2 steps. First, a colour is given to the part of the tree above the threshold (grey here). Second, you provide a colour palette that will be used for the clusters.

    
    # Libraries
    import pandas as pd
    from matplotlib import pyplot as plt
    from scipy.cluster import hierarchy
    import numpy as np
    
    # Data set
    url = 'https://python-graph-gallery.com/wp-content/uploads/mtcars.csv'
    df = pd.read_csv(url)
    df = df.set_index('model')
    df.index.name = None  # remove the index name ("del df.index.name" fails in recent pandas)
    df
    
    # Calculate the distance between each sample
    Z = hierarchy.linkage(df, 'ward')
    
    # Set the colour of the cluster here:
    hierarchy.set_link_color_palette(['#b30000','#996600', '#b30086'])
    
    # Make the dendrogram and give the colour above threshold
    hierarchy.dendrogram(Z, color_threshold=240, above_threshold_color='grey')
    
    # Add horizontal line.
    plt.axhline(y=240, c='grey', lw=1, linestyle='dashed')
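
Note that `set_link_color_palette` changes a module-wide setting, so later dendrograms in the same session keep the custom palette. A hedged sketch of one way to restore the default afterwards is shown below; the synthetic data and the threshold of 20 are assumptions for illustration.

```python
import numpy as np
from scipy.cluster import hierarchy

# Hypothetical data: three tight, well-separated groups of 5 points each
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [50.0, 0.0], [0.0, 50.0]])
data = np.vstack([c + rng.normal(scale=1.0, size=(5, 2)) for c in centers])
Z = hierarchy.linkage(data, 'ward')

palette = ['#b30000', '#996600', '#b30086']
hierarchy.set_link_color_palette(palette)
try:
    # no_plot=True returns the computed colours without drawing
    result = hierarchy.dendrogram(Z, color_threshold=20,
                                  above_threshold_color='grey', no_plot=True)
finally:
    # Passing None restores scipy's default palette
    hierarchy.set_link_color_palette(None)

# One colour per link: palette colours below the threshold, grey above it
used_colours = set(result['color_list'])
print(used_colours)
```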
    
    
  • If you have too many nodes and your dendrogram gets too complicated, you can truncate it. Some nodes will be grouped together, making the plot more readable. Two methods exist to truncate your dendrogram.

    The ‘lastp’ method lets you set the number of leaves you want on your tree. Here I set it to 4, so the tree is cut back until only 4 leaves remain. The ‘level’ method lets you set the maximum number of levels of the tree that are displayed. Here, with p=2, no more than 2 levels are shown and everything deeper is collapsed.

    
    # Libraries
    import pandas as pd
    from matplotlib import pyplot as plt
    from scipy.cluster import hierarchy
    import numpy as np
    
    # Data set
    url = 'https://python-graph-gallery.com/wp-content/uploads/mtcars.csv'
    df = pd.read_csv(url)
    df = df.set_index('model')
    df.index.name = None  # remove the index name ("del df.index.name" fails in recent pandas)
    df
    
    # Calculate the distance between each sample
    Z = hierarchy.linkage(df, 'ward')
    
    # method 1: lastp
    hierarchy.dendrogram(Z, truncate_mode='lastp', p=4)  # -> 4 leaves at the bottom of the plot
    plt.show()  # finish this figure so the next dendrogram is not drawn on top of it
    
    # method 2: level
    hierarchy.dendrogram(Z, truncate_mode='level', p=2)  # -> no more than ``p`` levels of the dendrogram tree are displayed
    plt.show()
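
To see what truncation does to the leaves without drawing, you can inspect the `ivl` entry of the dictionary that `dendrogram` returns. This is a small sketch on random data (30 hypothetical samples, an assumption for illustration).

```python
import numpy as np
from scipy.cluster import hierarchy

# Hypothetical data: 30 samples with 4 numeric features each
rng = np.random.default_rng(1)
data = rng.normal(size=(30, 4))
Z = hierarchy.linkage(data, 'ward')

# no_plot=True returns the layout only; 'ivl' lists the leaf labels
lastp = hierarchy.dendrogram(Z, truncate_mode='lastp', p=4, no_plot=True)
level = hierarchy.dendrogram(Z, truncate_mode='level', p=2, no_plot=True)

# A label in parentheses, such as '(7)', is a collapsed node
# and gives the number of original samples it contains
print(len(lastp['ivl']), lastp['ivl'])
print(len(level['ivl']), level['ivl'])
```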
    
    
  • Your dendrogram does not have to be vertical. You can easily change the orientation. This is especially useful when you have long labels that hardly fit when displayed vertically.

    
    # Libraries
    import pandas as pd
    from matplotlib import pyplot as plt
    from scipy.cluster import hierarchy
    import numpy as np
    
    # Data set
    url = 'https://python-graph-gallery.com/wp-content/uploads/mtcars.csv'
    df = pd.read_csv(url)
    df = df.set_index('model')
    df.index.name = None  # remove the index name ("del df.index.name" fails in recent pandas)
    
    # Calculate the distance between each sample
    Z = hierarchy.linkage(df, 'ward')
    
    # Orientation of the dendrogram
    hierarchy.dendrogram(Z, orientation="right", labels=df.index)
    # or
    hierarchy.dendrogram(Z, orientation="left", labels=df.index)
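
If you want to compare both orientations in one figure, one possible approach is to pass a separate matplotlib axis to each call through the `ax` parameter. The random data and the long label strings below are assumptions for illustration.

```python
import numpy as np
from matplotlib import pyplot as plt
from scipy.cluster import hierarchy

# Hypothetical data: 8 samples with deliberately long labels
rng = np.random.default_rng(0)
data = rng.normal(size=(8, 3))
labels = [f'a fairly long sample label {i}' for i in range(8)]
Z = hierarchy.linkage(data, 'ward')

# One axis per orientation, so the two trees do not overlap
fig, (ax_right, ax_left) = plt.subplots(1, 2, figsize=(10, 4))
hierarchy.dendrogram(Z, orientation='right', labels=labels, ax=ax_right)
hierarchy.dendrogram(Z, orientation='left', labels=labels, ax=ax_left)
ax_right.set_title("orientation='right'")
ax_left.set_title("orientation='left'")
fig.tight_layout()
```

Call `plt.show()` or `fig.savefig(...)` afterwards to display or save the figure.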
    
    
