#327 Network from correlation matrix

This page explains how to draw a correlation network: a network built on a correlation matrix.

Suppose that you have 10 individuals and know how closely they are related to each other. These relationships can be represented as a network: each individual is a node, and if 2 individuals are close enough (above a threshold that we set), they are linked by an edge. The resulting graph shows the structure of the population!

In this example, we see that our population is clearly split into 2 groups!
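
To make the threshold idea concrete before bringing in pandas and networkx, here is a minimal pure-Python sketch. The 5 individuals, the `similarity` values and the helper names `edges_above` and `groups` are all hypothetical and chosen for illustration; they are not part of the example that follows.

```python
# Hypothetical toy data: pairwise similarity between 5 individuals.
names = ["A", "B", "C", "D", "E"]
similarity = [
    [1.00, 0.90, 0.85, 0.10, 0.20],
    [0.90, 1.00, 0.95, 0.00, 0.30],
    [0.85, 0.95, 1.00, 0.20, 0.10],
    [0.10, 0.00, 0.20, 1.00, 0.90],
    [0.20, 0.30, 0.10, 0.90, 1.00],
]
threshold = 0.8


def edges_above(names, matrix, threshold):
    """Return each unordered pair whose similarity exceeds the threshold."""
    edges = []
    for i in range(len(names)):
        # j > i skips self-pairs and mirrored duplicates such as (B, A).
        for j in range(i + 1, len(names)):
            if matrix[i][j] > threshold:
                edges.append((names[i], names[j]))
    return edges


def groups(names, edges):
    """Collect nodes into connected components with a depth-first search."""
    neighbours = {name: set() for name in names}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, result = set(), []
    for start in names:
        if start in seen:
            continue
        stack, component = [start], []
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.append(node)
            stack.extend(neighbours[node] - seen)
        result.append(sorted(component))
    return result


edges = edges_above(names, similarity, threshold)
print(edges)
print(groups(names, edges))
```

With this toy matrix, A, B and C end up linked together and D and E form a second group, which is the kind of structure the network plot reveals.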


# libraries
import pandas as pd
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt

# Build a data set: 10 individuals (columns A to J), each described by 10 values (rows).
# B to E add random noise to ind1, and G to J add random noise to ind5.
ind1 = np.array([5, 10, 3, 4, 8, 10, 12, 1, 9, 4])
ind5 = np.array([1, 1, 13, 4, 18, 5, 2, 11, 3, 8])
df = pd.DataFrame({
    'A': ind1,
    'B': ind1 + np.random.randint(10, size=10),
    'C': ind1 + np.random.randint(10, size=10),
    'D': ind1 + np.random.randint(5, size=10),
    'E': ind1 + np.random.randint(5, size=10),
    'F': ind5,
    'G': ind5 + np.random.randint(5, size=10),
    'H': ind5 + np.random.randint(5, size=10),
    'I': ind5 + np.random.randint(5, size=10),
    'J': ind5 + np.random.randint(5, size=10),
})
df

# Calculate the correlation between individuals. The corr function calculates the pairwise
# correlations between columns, and each individual is already a column, so no transposition is needed.
corr = df.corr()
corr

# Transform it in a links data frame (3 columns only):
links = corr.stack().reset_index()
links.columns = ['var1', 'var2','value']
links

# Keep only correlations over a threshold and remove self correlations (cor(A,A)=1)
links_filtered = links.loc[(links['value'] > 0.8) & (links['var1'] != links['var2'])]
links_filtered

# Build your graph (older networkx releases called this function from_pandas_dataframe)
G = nx.from_pandas_edgelist(links_filtered, 'var1', 'var2')

# Plot the network:
nx.draw(G, with_labels=True, node_color='orange', node_size=400, edge_color='black', linewidths=1, font_size=15)
plt.show()
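
The plot shows the groups visually, but they can also be listed in code. Below is a small sketch, assuming a hand-written links table in the same var1 / var2 / value layout as above. The values are made up for illustration, and the `var1 < var2` filter is a variation on the filter used above, not the original one: it drops self correlations and mirrored pairs such as (B, A) in one step.

```python
import networkx as nx
import pandas as pd

# Hypothetical, hand-written links table (same layout as the links data frame).
links = pd.DataFrame({
    "var1": ["A", "B", "A", "C", "F", "G", "A"],
    "var2": ["B", "A", "C", "A", "G", "F", "F"],
    "value": [0.95, 0.95, 0.90, 0.90, 0.85, 0.85, 0.10],
})

# Keep strong correlations; var1 < var2 drops self-pairs and mirrored duplicates.
strong = links[(links["value"] > 0.8) & (links["var1"] < links["var2"])]

# Build an undirected graph from the remaining pairs.
G = nx.from_pandas_edgelist(strong, "var1", "var2")

# Each connected component is one group of closely related individuals.
groups = sorted(sorted(component) for component in nx.connected_components(G))
print(groups)
```

On the random data set built above, the number of groups depends on the noise drawn and on the threshold, so it is worth checking the components rather than assuming there are always 2.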
