#400 Basic Dendrogram




This page describes how to build a basic dendrogram with Python. To build such a dendrogram, you first need a numeric matrix. Each row represents an entity (here a car). Each column is a variable that describes the cars. The objective is to cluster the entities to find out which ones are similar to each other.

In the end, entities that are highly similar sit close together in the tree. Let’s start by loading a dataset and the required libraries:






# Libraries
import pandas as pd
from matplotlib import pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
import numpy as np

# Import the mtcars dataset from the web + keep only numeric variables
url = 'https://python-graph-gallery.com/wp-content/uploads/mtcars.csv'
df = pd.read_csv(url)
df = df.set_index('model')
df.index.name = None  # remove the index name
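If your own table mixes text and numeric columns, here is a minimal sketch of one way to keep only the numeric part. The small `cars` table and its `origin` column are made up for illustration; they are not the mtcars data.

```python
import pandas as pd

# Made-up rows for illustration only (not the real mtcars values)
cars = pd.DataFrame({
    "model": ["car_a", "car_b", "car_c"],
    "mpg": [20.0, 25.5, 15.0],
    "cyl": [6, 4, 8],
    "origin": ["x", "y", "z"],  # hypothetical text column
})

# Use the name column as row labels, then keep numeric columns only
numeric = cars.set_index("model").select_dtypes(include="number")
numeric.index.name = None

print(numeric.dtypes)
```

The text column is dropped, and the car names survive as the index so they can be used as leaf labels later.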


All right, now that we have our numeric matrix, we can calculate the distance between each pair of cars and perform the hierarchical clustering. This is done through the linkage function. I won't go into the details here, but I strongly advise visiting graph #401 for more about this crucial step.


# Calculate the distance between each pair of samples and cluster them
# You have to think about the metric you use (how to measure similarity) and the clustering method (how to group cars)
# Here: Ward's method, which relies on Euclidean distance
Z = linkage(df, 'ward')
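To make the two choices (metric and method) more visible, here is a small sketch that splits the work into two explicit steps with `pdist` and compares it with the one-step call. The `toy` table holds made-up values, not the mtcars data, and the variable names are only for illustration.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

# Toy numeric matrix (made-up values): 5 entities, 2 variables
toy = pd.DataFrame(
    {"x": [1.0, 1.2, 5.0, 5.3, 9.0], "y": [2.0, 2.1, 6.0, 6.2, 1.0]},
    index=["a", "b", "c", "d", "e"],
)

# Step 1: pairwise Euclidean distances in condensed form, n*(n-1)/2 values
distances = pdist(toy.values, metric="euclidean")

# Step 2: Ward clustering on those distances
Z_two_steps = linkage(distances, method="ward")

# Passing the raw observations does both steps at once
Z_one_step = linkage(toy.values, method="ward", metric="euclidean")

print(distances.shape)    # one distance per pair of entities
print(Z_two_steps.shape)  # one row per merge, four columns
print(np.allclose(Z_two_steps, Z_one_step))
```

Each row of the linkage matrix describes one merge: the two clusters joined, the distance between them, and the size of the new cluster.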

Last but not least, you can easily plot the linkage matrix Z as a dendrogram using the dendrogram function. See graph #401 for possible customisation.



# Make the dendrogram
plt.title('Hierarchical Clustering Dendrogram')
plt.xlabel('car model')
plt.ylabel('distance (Ward)')
dendrogram(Z, labels=df.index.tolist(), leaf_rotation=90)
plt.show()
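If you want to check which entities end up next to each other without drawing anything, the dendrogram function can also return the leaf order. Below is a rough sketch on made-up toy data (not the mtcars values), using the `no_plot=True` option and the `'ivl'` entry of the returned dictionary.

```python
import pandas as pd
from scipy.cluster.hierarchy import dendrogram, linkage

# Toy numeric matrix (made-up values): "a"/"b" are close, so are "c"/"d"
toy = pd.DataFrame(
    {"x": [1.0, 1.2, 5.0, 5.3, 9.0], "y": [2.0, 2.1, 6.0, 6.2, 1.0]},
    index=["a", "b", "c", "d", "e"],
)

Z_toy = linkage(toy.values, method="ward")

# no_plot=True computes the tree layout without drawing it
tree = dendrogram(Z_toy, labels=toy.index.tolist(), no_plot=True)

# 'ivl' lists the leaf labels in the order they appear along the axis
leaf_order = tree["ivl"]
print(leaf_order)
```

Entities that were merged early, like "a" and "b" here, appear side by side in that list, which is exactly what you read off the plotted tree.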
