This page explains how to draw a correlation network: a network built on a correlation matrix.
Suppose that you have 10 individuals, and know how closely they are related to each other. It is possible to represent these relationships in a network. Each individual will be a node. If 2 individuals are close enough (we set a threshold), then they are linked by an edge. That will show the structure of the population!
In this example, we see that our population is clearly split into 2 groups!
# libraries
import pandas as pd
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt

# Build a data set: 10 individuals (columns A to J), 10 measurements each.
# A to E are noisy copies of ind1, F to J are noisy copies of ind5.
ind1=[5,10,3,4,8,10,12,1,9,4]
ind5=[1,1,13,4,18,5,2,11,3,8]
df = pd.DataFrame({
    'A':ind1,
    'B':ind1 + np.random.randint(10, size=(10)),
    'C':ind1 + np.random.randint(10, size=(10)),
    'D':ind1 + np.random.randint(5, size=(10)),
    'E':ind1 + np.random.randint(5, size=(10)),
    'F':ind5,
    'G':ind5 + np.random.randint(5, size=(10)),
    'H':ind5 + np.random.randint(5, size=(10)),
    'I':ind5 + np.random.randint(5, size=(10)),
    'J':ind5 + np.random.randint(5, size=(10))})
df

# Calculate the correlation between individuals. The corr function calculates
# the pairwise correlations between columns, and each column is an individual
# here, so no transpose is needed.
corr = df.corr()
corr

# Transform it in a links data frame (3 columns only):
links = corr.stack().reset_index()
links.columns = ['var1', 'var2','value']
links

# Keep only correlation over a threshold and remove self correlation (cor(A,A)=1)
links_filtered=links.loc[ (links['value'] > 0.8) & (links['var1'] != links['var2']) ]
links_filtered

# Build your graph (from_pandas_dataframe was removed in networkx 2.x;
# from_pandas_edgelist replaces it)
G=nx.from_pandas_edgelist(links_filtered, 'var1', 'var2')

# Plot the network:
nx.draw(G, with_labels=True, node_color='orange', node_size=400, edge_color='black', linewidths=1, font_size=15)
plt.show()
Do you have an example that goes one step further and applies the minimum spanning tree (MST) algorithm to this network? Thanks!
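One possible way to get a minimum spanning tree from a correlation matrix is sketched below. It is not from the original post. The small `corr` matrix is made-up illustration data, and the conversion from correlation to distance, `sqrt(2 * (1 - corr))`, is an assumption (a common choice, but others work too). The tree itself comes from networkx's `minimum_spanning_tree`.

```python
import numpy as np
import pandas as pd
import networkx as nx

# Made-up 3x3 correlation matrix, standing in for df.corr() from the post
corr = pd.DataFrame(
    [[1.0, 0.9, 0.2],
     [0.9, 1.0, 0.5],
     [0.2, 0.5, 1.0]],
    index=['A', 'B', 'C'], columns=['A', 'B', 'C'])

# Assumption: turn correlations into distances, so that strongly
# correlated individuals end up close to each other
dist = np.sqrt(2 * (1 - corr))

# Same reshaping as in the post: one row per pair, self pairs removed
links = dist.stack().reset_index()
links.columns = ['var1', 'var2', 'distance']
links = links.loc[links['var1'] != links['var2']]

# Build the full weighted graph, then keep only the minimum spanning tree
G = nx.from_pandas_edgelist(links, 'var1', 'var2', edge_attr='distance')
mst = nx.minimum_spanning_tree(G, weight='distance')

print(mst.edges(data=True))
```

The resulting `mst` is an ordinary graph, so it can be passed to `nx.draw` exactly like `G` in the post.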
The nx function from_pandas_dataframe has been removed since networkx 2.0.
Error message: module 'networkx' has no attribute 'from_pandas_dataframe'
Use from_pandas_edgelist instead.
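A minimal sketch of the replacement call, assuming a recent networkx 2.x or later. The small `links_filtered` table is hypothetical data that only mimics the column names used in the post. The `edge_attr` argument is optional and is used here to keep the correlation value on each edge.

```python
import pandas as pd
import networkx as nx

# Hypothetical filtered links table, with the same column names as the post
links_filtered = pd.DataFrame({
    'var1': ['A', 'A', 'F'],
    'var2': ['B', 'C', 'G'],
    'value': [0.95, 0.91, 0.88],
})

# from_pandas_edgelist takes the source and target column names;
# edge_attr stores the 'value' column as an attribute of each edge
G = nx.from_pandas_edgelist(links_filtered, 'var1', 'var2', edge_attr='value')

print(sorted(G.nodes()))
print(G.number_of_edges())
```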
https://stackoverflow.com/questions/49223057/attributeerror-module-networkx-has-no-attribute-from-pandas-dataframe
AttributeError: module 'networkx' has no attribute 'from_pandas_dataframe'
I get the same error, and I have tried with both Python 2.7 and Python 3.
Hi There,
Very lovely, but I can't find anywhere here how to plot node self-connections. They are very common in recurrent neural networks!
Do you have an example for this command:
# Keep only correlation over a threshold and remove self correlation (cor(A,A)=1)
links_filtered=links.loc[ (links['value'] > 0.8) & (links['var1'] != links['var2']) ]
links_filtered
What if I want all the values between -0.3 and 0.3, i.e. -0.3 < value < 0.3?
Thanks!
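A possible answer, as a sketch: combine two comparisons with `&`, each wrapped in its own parentheses, just like the original filter. The `links` table below is made-up data with the same three columns as in the post.

```python
import pandas as pd

# Made-up links table with the same columns as in the post
links = pd.DataFrame({
    'var1': ['A', 'A', 'A', 'B'],
    'var2': ['A', 'B', 'F', 'F'],
    'value': [1.0, 0.9, -0.2, 0.1],
})

# Keep pairs whose correlation lies strictly between -0.3 and 0.3,
# and still drop self correlation
links_weak = links.loc[
    (links['value'] > -0.3)
    & (links['value'] < 0.3)
    & (links['var1'] != links['var2'])
]

print(links_weak)
```

The parentheses matter because `&` binds more tightly than `>` and `<` in Python, so leaving them out raises an error or gives the wrong mask.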