This page describes how to build a basic dendrogram with Python. To build such a dendrogram, you first need a **numeric matrix**. Each row represents an **entity** (here a car). Each column is a **variable** that describes the cars. The objective is to **cluster** the entities to find out which ones are similar to each other.

At the end, entities that are highly similar sit close to each other in the tree. Let’s start by loading a dataset and the required libraries:

    # Libraries
    import pandas as pd
    from matplotlib import pyplot as plt
    from scipy.cluster.hierarchy import dendrogram, linkage
    import numpy as np

    # Import the mtcars dataset from the web
    url = 'https://python-graph-gallery.com/wp-content/uploads/mtcars.csv'
    df = pd.read_csv(url)

    # Use the car model as the index so that only numeric variables remain as columns
    df = df.set_index('model')
    df.index.name = None  # remove the index name (replaces `del df.index.name`, which recent pandas versions reject)
    df

All right, now that we have our numeric matrix, we can calculate the **distance** between each pair of cars and perform the **hierarchical clustering**. This is done through the **linkage** function. I will not go into the details here, but I strongly advise visiting graph #401 for more about this crucial step.

    # Calculate the distance between each sample.
    # Two choices matter here:
    #   - the metric (how similarity between two cars is measured)
    #   - the clustering method (how cars are grouped together)
    Z = linkage(df, 'ward')
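To get a feel for what **linkage** returns, here is a minimal sketch on a toy matrix rather than on the mtcars data. The array `X` and the name `Z_toy` are made up for illustration. The `method` and `metric` arguments are written out explicitly, although `'euclidean'` is already the default metric and the only one Ward's method accepts on raw observations.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Hypothetical toy matrix: 5 entities (rows) described by 2 variables (columns)
X = np.array([
    [1.0, 1.0],
    [1.1, 1.0],
    [5.0, 5.0],
    [5.1, 5.2],
    [9.0, 1.0],
])

# Same call as for the cars, with the method and the metric spelled out
Z_toy = linkage(X, method='ward', metric='euclidean')

# One row per merge: n entities always give n - 1 merges.
# Each row holds: index of cluster A, index of cluster B,
# distance between them, number of entities in the merged cluster.
print(Z_toy.shape)
print(Z_toy[0])   # first merge: the two closest entities (rows 0 and 1 of X)
print(Z_toy[-1])  # last merge: the root of the tree, containing all 5 entities
```

Reading the rows from top to bottom replays the clustering from the most similar pair up to the root, which is exactly what the dendrogram draws.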

Last but not least, you can easily plot this object as a dendrogram using the **dendrogram** function. See graph #401 for possible customisations.

    # Make the dendrogram
    plt.title('Hierarchical Clustering Dendrogram')
    plt.xlabel('sample index')
    plt.ylabel('distance (Ward)')
    dendrogram(Z, labels=df.index.to_list(), leaf_rotation=90)
    plt.show()
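If you want to check the leaf order without drawing anything, **dendrogram** can be called with `no_plot=True`. It then only returns a dictionary describing the tree, whose `'ivl'` entry lists the leaf labels from left to right. The sketch below uses a made-up toy matrix and made-up labels (`a` to `e`), not the mtcars data.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Hypothetical toy data: two tight pairs (a, b) and (c, d), plus an isolated point e
X = np.array([
    [1.0, 1.0],
    [1.1, 1.0],
    [5.0, 5.0],
    [5.1, 5.2],
    [9.0, 1.0],
])
toy_labels = ['a', 'b', 'c', 'd', 'e']

Z_toy = linkage(X, 'ward')

# no_plot=True skips the drawing and only returns the tree description
tree = dendrogram(Z_toy, labels=toy_labels, no_plot=True)

# Leaf labels in the order they would appear along the x axis
leaf_order = tree['ivl']
print(leaf_order)
```

Entities that merge early, such as `a` and `b`, end up side by side in this list, which is the "close in the tree" property described at the top of the page.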