Chart #400 gives the basic steps to build a dendrogram from a numeric matrix. Here, let's describe a few customisations that you can easily apply to your dendrogram.
You can easily customise the font size, rotation angle and content of the leaf labels of your dendrogram, and here is the code to do so. It is also possible to change the label colour, but this is a bit tricky and is thus described in chart #402.
    # Libraries
    import pandas as pd
    from matplotlib import pyplot as plt
    from scipy.cluster import hierarchy

    # Data set
    url = 'https://python-graph-gallery.com/wp-content/uploads/mtcars.csv'
    df = pd.read_csv(url)
    df = df.set_index('model')
    df.index.name = None  # remove the index name

    # Calculate the distance between each sample
    Z = hierarchy.linkage(df, 'ward')

    # Plot with custom leaves
    hierarchy.dendrogram(Z, leaf_rotation=90, leaf_font_size=8, labels=df.index)
    plt.show()
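To change the content of the labels rather than just their look, one option is the `leaf_label_func` argument of `hierarchy.dendrogram`, which receives the index of each leaf and returns the text to show. The sketch below is only an illustration: the toy data, the `names` list and the `short_label` helper are assumptions made for the example, not part of the mtcars code.

```python
import numpy as np
from matplotlib import pyplot as plt
from scipy.cluster import hierarchy

# Hypothetical toy data (not the mtcars set): 6 samples, 2 features
X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0],
              [5.1, 4.8], [9.0, 0.5], [9.3, 0.2]])
names = ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot"]

Z = hierarchy.linkage(X, "ward")

def short_label(leaf_id):
    """Hypothetical helper: upper-case 3-letter label for a leaf index."""
    return names[leaf_id][:3].upper()

fig, ax = plt.subplots(figsize=(6, 4))
result = hierarchy.dendrogram(
    Z,
    leaf_label_func=short_label,  # controls the label content
    leaf_rotation=45,             # label angle in degrees
    leaf_font_size=10,            # label size in points
    ax=ax,
)

# 'ivl' holds the labels in the order they are drawn
print(result["ivl"])
```

Call `plt.show()` afterwards to display the figure. The returned dictionary is handy for checking which label ended up where without reading it off the plot.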
By default, SciPy colours the clusters that form below 70% of the maximum linkage distance, so the colouring can look arbitrary. You can control the number of clusters you want to display by setting a threshold. Here I set a threshold of 240 and showed it with a horizontal line. That gives me 3 clusters.
    # Libraries
    import pandas as pd
    from matplotlib import pyplot as plt
    from scipy.cluster import hierarchy

    # Data set
    url = 'https://python-graph-gallery.com/wp-content/uploads/mtcars.csv'
    df = pd.read_csv(url)
    df = df.set_index('model')
    df.index.name = None  # remove the index name

    # Calculate the distance between each sample
    Z = hierarchy.linkage(df, 'ward')

    # Control the number of clusters in the plot + add a horizontal line
    hierarchy.dendrogram(Z, color_threshold=240)
    plt.axhline(y=240, c='grey', lw=1, linestyle='dashed')
    plt.show()
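If you also want to know which sample falls in which coloured cluster, cutting the tree at the same height with `hierarchy.fcluster` is one way to get it. This is a minimal sketch on made-up data: the three blobs and the `threshold` value of 5 are assumptions chosen so that the cut is unambiguous, and would need adapting to your own matrix.

```python
import numpy as np
from scipy.cluster import hierarchy

# Hypothetical toy data: 3 tight blobs of 4 points each
offsets = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]])
centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 25.0]])
X = np.vstack([c + offsets for c in centres])

Z = hierarchy.linkage(X, "ward")

# Assumed cut height: well above the within-blob merges,
# well below the merges between blobs
threshold = 5

# Flat cluster id for every sample, cutting the tree at `threshold`
cluster_ids = hierarchy.fcluster(Z, t=threshold, criterion="distance")
n_clusters = len(np.unique(cluster_ids))

print(cluster_ids)
print(n_clusters)
```

Passing the same value as `color_threshold` to `hierarchy.dendrogram` should then colour one group of branches per flat cluster.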
Of course, you can control the colour of your tree. This is done in 2 steps. A first colour is given to the part of the tree above the threshold (grey here). The second step is to provide a colour palette that will be used for the clusters.
    # Libraries
    import pandas as pd
    from matplotlib import pyplot as plt
    from scipy.cluster import hierarchy

    # Data set
    url = 'https://python-graph-gallery.com/wp-content/uploads/mtcars.csv'
    df = pd.read_csv(url)
    df = df.set_index('model')
    df.index.name = None  # remove the index name

    # Calculate the distance between each sample
    Z = hierarchy.linkage(df, 'ward')

    # Set the colours of the clusters here
    hierarchy.set_link_color_palette(['#b30000', '#996600', '#b30086'])

    # Make the dendrogram and give the colour above the threshold
    hierarchy.dendrogram(Z, color_threshold=240, above_threshold_color='grey')

    # Add a horizontal line
    plt.axhline(y=240, c='grey', lw=1, linestyle='dashed')
    plt.show()
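One thing to keep in mind is that `hierarchy.set_link_color_palette` changes a module-wide setting, so every later dendrogram in the same session inherits the palette. A possible pattern, sketched here on made-up data (the blobs and the `threshold` of 5 are assumptions for the example), is to reset it with `None` once the plot is drawn.

```python
import numpy as np
from matplotlib import pyplot as plt
from scipy.cluster import hierarchy

# Hypothetical toy data: 3 tight blobs of 4 points each
offsets = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]])
centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 25.0]])
X = np.vstack([c + offsets for c in centres])
Z = hierarchy.linkage(X, "ward")

palette = ["#b30000", "#996600", "#b30086"]
threshold = 5  # assumed cut height separating the 3 blobs

hierarchy.set_link_color_palette(palette)
try:
    fig, ax = plt.subplots()
    result = hierarchy.dendrogram(
        Z,
        color_threshold=threshold,
        above_threshold_color="grey",
        ax=ax,
    )
finally:
    # Passing None restores SciPy's default palette
    hierarchy.set_link_color_palette(None)

# One colour per link, in the order the links were drawn
print(result["color_list"])
```

The `try`/`finally` is a design choice rather than a requirement: it makes sure the default palette comes back even if the plotting call raises.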
- #401 Truncated dendrogram
If you have too many nodes and your dendrogram gets too complicated, you can truncate it. Some nodes will be grouped together, making the plot more readable. Two methods exist to truncate your dendrogram.
The 'lastp' method lets you set the number of leaves you want on your tree. Here I set it to 4, and as you can see the tree is split only until it has 4 leaves. The 'level' method lets you set the maximum number of levels of the tree to display. Here, with p=2, no more than 2 levels are displayed, and the branches below them are collapsed.
    # Libraries
    import pandas as pd
    from matplotlib import pyplot as plt
    from scipy.cluster import hierarchy

    # Data set
    url = 'https://python-graph-gallery.com/wp-content/uploads/mtcars.csv'
    df = pd.read_csv(url)
    df = df.set_index('model')
    df.index.name = None  # remove the index name

    # Calculate the distance between each sample
    Z = hierarchy.linkage(df, 'ward')

    # Method 1: lastp -> you will have 4 leaves at the bottom of the plot
    plt.figure()
    hierarchy.dendrogram(Z, truncate_mode='lastp', p=4)

    # Method 2: level -> no more than p levels of the dendrogram tree are displayed
    plt.figure()
    hierarchy.dendrogram(Z, truncate_mode='level', p=2)
    plt.show()
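To see what truncation actually does to the leaves, it can help to look at the dictionary that `hierarchy.dendrogram` returns. The sketch below uses `no_plot=True` so nothing is drawn, and relies on made-up data (three blobs of four points, an assumption for the example) where the expected grouping is easy to reason about.

```python
import numpy as np
from scipy.cluster import hierarchy

# Hypothetical toy data: 3 tight blobs of 4 points each
offsets = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]])
centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 25.0]])
X = np.vstack([c + offsets for c in centres])
Z = hierarchy.linkage(X, "ward")

# no_plot=True computes the layout without drawing anything
full = hierarchy.dendrogram(Z, no_plot=True)
by_lastp = hierarchy.dendrogram(Z, truncate_mode="lastp", p=3, no_plot=True)
by_level = hierarchy.dendrogram(Z, truncate_mode="level", p=1, no_plot=True)

# 'ivl' lists the leaf labels; a collapsed leaf is labelled with
# the number of samples it contains, in parentheses
print(len(full["ivl"]))
print(by_lastp["ivl"])
print(by_level["ivl"])
```

With these blobs, `lastp` with `p=3` should collapse each blob into a single leaf holding 4 samples. Drop `no_plot=True` to draw the truncated trees.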
- #401 Dendrogram orientation
Your dendrogram does not have to be vertical. You can easily change the orientation. This is especially useful when you have long labels that hardly fit when displayed vertically.
    # Libraries
    import pandas as pd
    from matplotlib import pyplot as plt
    from scipy.cluster import hierarchy

    # Data set
    url = 'https://python-graph-gallery.com/wp-content/uploads/mtcars.csv'
    df = pd.read_csv(url)
    df = df.set_index('model')
    df.index.name = None  # remove the index name

    # Calculate the distance between each sample
    Z = hierarchy.linkage(df, 'ward')

    # Orientation of the dendrogram: root on the right
    plt.figure()
    hierarchy.dendrogram(Z, orientation="right", labels=df.index)

    # or: root on the left
    plt.figure()
    hierarchy.dendrogram(Z, orientation="left", labels=df.index)
    plt.show()
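With a horizontal dendrogram, the labels sit on the y axis, so the figure height is what decides whether they overlap. Here is a rough sketch of one way to size the figure from the number of leaves. The toy data, the label texts and the 0.4 inch per leaf factor are all assumptions made for the example.

```python
import numpy as np
from matplotlib import pyplot as plt
from scipy.cluster import hierarchy

# Hypothetical toy data with deliberately long labels
X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0],
              [5.1, 4.8], [9.0, 0.5], [9.3, 0.2]])
labels = [f"a fairly long sample label number {i}" for i in range(len(X))]

Z = hierarchy.linkage(X, "ward")

# Assumed rule of thumb: about 0.4 inch of height per leaf
fig, ax = plt.subplots(figsize=(7, 0.4 * len(labels) + 1))
result = hierarchy.dendrogram(
    Z,
    orientation="right",  # leaves on the left, root on the right
    labels=labels,
    leaf_font_size=9,
    ax=ax,
)
ax.set_xlabel("Ward distance")  # distances now run along the x axis
fig.tight_layout()              # leave room for the long labels
```

Call `plt.show()` to display the result. Passing `ax=ax` keeps each dendrogram on its own axes, which avoids two trees being drawn on top of each other.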
Thanks very much Yan; this looks awesome.
How does one save the dendrogram to an image file after creating it?
Best wishes,