To avoid creating a spaghetti plot, it is good practice to highlight the group(s) that interest you most in your line plot. This lets the reader grasp your point quickly instead of struggling to decipher hundreds of lines.
The trick is to plot all the groups with thin, discreet lines first. Then, replot the interesting group(s) with thick, highly visible lines. It is also good practice to give this group a custom annotation.
```python
# libraries and data
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Make a data frame: x is the time step, y1..y8 are the scores of eight students
df = pd.DataFrame({
    'x': range(1, 11),
    'y1': np.random.randn(10),
    'y2': np.random.randn(10) + range(1, 11),
    'y3': np.random.randn(10) + range(11, 21),
    'y4': np.random.randn(10) + range(6, 16),
    'y5': np.random.randn(10) + range(4, 14) + (0, 0, 0, 0, 0, 0, 0, -3, -8, -6),
    'y6': np.random.randn(10) + range(2, 12),
    'y7': np.random.randn(10) + range(5, 15),
    'y8': np.random.randn(10) + range(4, 14),
})

# Style: 'seaborn-darkgrid' was renamed 'seaborn-v0_8-darkgrid' in Matplotlib 3.6
# (use the old name on older versions)
# plt.style.use('fivethirtyeight')
plt.style.use('seaborn-v0_8-darkgrid')
my_dpi = 96
plt.figure(figsize=(480 / my_dpi, 480 / my_dpi), dpi=my_dpi)

# Multiple line plot: every group as a thin, faint grey line
for column in df.drop('x', axis=1):
    plt.plot(df['x'], df[column], marker='', color='grey', linewidth=1, alpha=0.4)

# Now redraw the interesting curve, thicker and in a distinct color
plt.plot(df['x'], df['y5'], marker='', color='orange', linewidth=4, alpha=0.7)

# Widen the x-axis to leave room for the labels on the right
plt.xlim(0, 12)

# Annotate each grey line with its name, just right of its last point (x=10)
num = 0
for i in df.values[9][1:]:
    num += 1
    name = list(df)[num]
    if name != 'y5':
        plt.text(10.2, i, name, horizontalalignment='left', size='small', color='grey')

# And add a special annotation for the group we are interested in
plt.text(10.2, df['y5'].iloc[-1], 'Mr Orange', horizontalalignment='left', size='small', color='orange')

# Add titles
plt.title("Evolution of Mr Orange vs other students", loc='left', fontsize=12, fontweight=0, color='orange')
plt.xlabel("Time")
plt.ylabel("Score")
plt.show()
```
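If you highlight groups often, the same steps can be wrapped in a small reusable function. The sketch below is not part of the original tutorial: the helper name `plot_highlighted` and its parameters are hypothetical, and it uses Matplotlib's object-oriented `Axes` API instead of `plt.*` calls so it can draw on any subplot. It plots every group in faint grey, redraws the chosen group on top in a thick colored line, and labels each line just past its last point, as the script above does.

```python
from typing import Optional

import numpy as np
import pandas as pd
from matplotlib.axes import Axes
from matplotlib.figure import Figure


def plot_highlighted(ax: Axes, df: pd.DataFrame, x: str, highlight: str,
                     label: Optional[str] = None, color: str = "orange") -> None:
    """Hypothetical helper: draw every column except `x` as a faint grey line,
    redraw `highlight` on top as a thick coloured line, and label each line
    just to the right of its last point."""
    series = df.drop(columns=x)
    first_x, last_x = df[x].iloc[0], df[x].iloc[-1]
    offset = 0.2  # gap between the end of a line and its label

    # Background: every group, thin and discreet, labelled in grey
    for name in series.columns:
        ax.plot(df[x], series[name], color="grey", linewidth=1, alpha=0.4)
        if name != highlight:
            ax.text(last_x + offset, series[name].iloc[-1], name,
                    horizontalalignment="left", size="small", color="grey")

    # Foreground: the group of interest, drawn last so it sits on top
    ax.plot(df[x], series[highlight], color=color, linewidth=4, alpha=0.7)
    ax.text(last_x + offset, series[highlight].iloc[-1], label or highlight,
            horizontalalignment="left", size="small", color=color)

    # Leave room on the right for the labels
    ax.set_xlim(first_x - 1, last_x + 2)


# Small demo on random data (seeded so it is reproducible)
rng = np.random.default_rng(0)
steps = np.arange(1, 11)
demo = pd.DataFrame({"x": steps,
                     "y1": rng.normal(size=10),
                     "y2": rng.normal(size=10) + steps,
                     "y5": rng.normal(size=10) + steps + 3})

fig = Figure(figsize=(5, 5))
ax = fig.subplots()
plot_highlighted(ax, demo, "x", "y5", label="Mr Orange")
ax.set_title("Evolution of Mr Orange vs other students", loc="left", color="orange")
```

In an interactive session you would typically create the axes with `fig, ax = plt.subplots()`, call `plot_highlighted(ax, df, 'x', 'y5', label='Mr Orange')` and finish with `plt.show()`. Reading the last values with `.iloc[-1]` avoids hard-coding the row number and the label position.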
Very helpful, thanks for sharing!
What's the logic behind `for i in df.values[9][1:]:`?
In `for i in df.values[9][1:]`, `i` is the y-coordinate of each student's final point. `df.values[9]` selects the row at position 9, which is the last of the 10 rows in this data frame, and `[1:]` drops its first element (the `x` column), leaving only the students' scores. Inside the loop, `i` is passed to `plt.text(10.2, i, name, …)` to write each student's name ("y1", "y2", etc.) just to the right of its line: the text starts at x=10.2 because the lines end at x=10.0.
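To see what that indexing returns, here is a minimal sketch on a small made-up frame with the same layout (an `x` column followed by one column per student; the values are hypothetical). It compares the positional form from the tutorial with a label-based alternative, `df.drop(columns="x").iloc[-1]`, which is not in the original code but yields the same values without assuming that the frame has exactly 10 rows or that `x` is the first column.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: x first, then one column per student
df = pd.DataFrame({"x": range(1, 11),
                   "y1": np.arange(10) * 1.0,
                   "y2": np.arange(10) * 2.0})

# Positional: row 9 is the last of 10 rows; [1:] skips the x column
last_values = df.values[9][1:]

# Label-based alternative: last row, with the x column dropped by name
last_by_name = df.drop(columns="x").iloc[-1]

# Each pair is the label text and the y position where it would be drawn
for name, value in last_by_name.items():
    print(name, value)
# y1 9.0
# y2 18.0
```

Because `last_by_name` is a Series indexed by column name, iterating with `.items()` gives the name and the y-value together, so the manual `num` counter and `list(df)[num]` lookup are no longer needed.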
thanks for sharing this