Author Archives

Yan Holtz

Hi! I am Yan Holtz, passionate about data analysis and data visualization. I created the R and Python graph galleries and hope they help you!
#39 Hidden data under boxplot

This page is dedicated to a dangerous feature of boxplots. A boxplot summarizes the distribution of a numeric variable for several groups. The problem is that summarizing also means losing information, and that can be misleading. If…
Read more
#124 Spaghetti plot

A spaghetti plot is a line plot with many lines displayed together. The problem is that it is really hard to read, and thus provides little insight into the data. This is well documented here.…
Read more
#199 Matplotlib style sheets

The matplotlib library comes with several built-in styles. They are very easy to use and can improve the quality of your work. To apply a style to your plot, just add: plt.style.use("my style"). Here is an example that…
Read more
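As a minimal sketch of the idea in this teaser (not the code from the original post), here is how a built-in style can be listed and applied. The "ggplot" style and the toy data are my own choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt

# Names of the styles shipped with the installed matplotlib version
available_styles = plt.style.available
print(available_styles)

# Apply one built-in style; "ggplot" is only an example choice
plt.style.use("ggplot")

# Toy data, made up for illustration
fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [1, 4, 9, 16], marker="o")
ax.set_title("Line plot drawn with the ggplot style")
```

Call plt.style.use() before creating the figure, since the style only affects what is drawn afterwards.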
#404 Dendrogram with heat map

When you use a dendrogram to display the result of a cluster analysis, it is good practice to add the corresponding heatmap. It allows you to visualise the structure of your entities (dendrogram), and to understand if this structure…
Read more
#400 Basic Dendrogram

This page describes how to build a basic dendrogram with Python. To build such a dendrogram, you first need a numeric matrix. Each row represents an entity (here a car). Each column is…
Read more
#390 Basic radar chart

A radar chart displays the value of several numerical variables for one or several entities. Here is an example of a simple one, displaying the values of 5 variables for only one individual. To my knowledge, there…
Read more
#313 Bubble map with Folium

This page describes how to add bubbles to your folium map. It is done using the folium.Circle function. Each bubble has a size related to a specific value. Note that if you zoom in on the map, the circle will get…
Read more
#315 A world map of #surf tweets

For 300 days, I harvested every tweet containing the hashtags #surf, #kitesurf and #windsurf. Here is a map showing the location of these tweets. This project is explained in more detail on the blog of the R graph gallery. This…
Read more
#292 Choropleth map with Folium

Here is an example of a choropleth map made using the Folium library. This example comes directly from the (awesome) documentation of this library. Note that you need 2 elements to build a choropleth map. i/ A shape file in…
Read more
#372 3D PCA result

3D scatterplots can be useful to display the result of a PCA when you would like to show 3 principal components. Here is an example showing how to achieve it. Note that the 3 reds…
Read more
#370 3D Scatterplot

The mplot3d toolkit of Matplotlib makes it easy to create 3D scatterplots. Note that most of the customisations presented in the Scatterplot section will work in 3D as well. The result can be a bit disappointing since each…
Read more
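A minimal sketch of a 3D scatterplot with the mplot3d toolkit, assuming random toy data (the values, marker size and colour are my own choices, not those of the original post):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401, registers the "3d" projection on older matplotlib versions

# Toy data: 50 random points in 3 dimensions (made up for illustration)
rng = np.random.default_rng(0)
x, y, z = rng.random(50), rng.random(50), rng.random(50)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")  # create 3D axes
points = ax.scatter(x, y, z, c="skyblue", s=60)  # same scatter idea as in 2D, plus a z coordinate
ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.set_zlabel("Z")
```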
#371 Surface plot

3D plots are great for making surface plots. In a surface plot, each point is defined by 3 coordinates: its latitude, its longitude, and its altitude (X, Y and Z). Thus, 2 types of input are possible. i/ A rectangular…
Read more
#342 Animation on 3D plot

It is possible to create a 3D object with Python. See the dedicated section. Once this is done, we can vary the angle of view ('camera position') and use each image to make an animation. Once more, the image…
Read more
#111 Custom correlogram

Graph #110 showed how to make a basic correlogram with seaborn. This post explains how to improve it. It is divided into 2 parts: how to customize the correlation observation (for each pair of numeric variables), and…
Read more
#341 Python Gapminder Animation

Hans Rosling's TED talk on 'Developing Country' is probably one of the most famous data visualizations ever. It shows the evolution of life expectancy and GDP per capita of several countries through an animation. (Data found in the gapminder…
Read more
#104 Seaborn Themes

The Seaborn Python library is well known for its grey background and its general styling. However, note that a few other built-in styles are available: darkgrid, whitegrid, dark, white and ticks. Here is how to call them:
Read more
#286 Boundaries provided in Basemap

The basemap library (closely linked with matplotlib) contains a database with several boundaries. Thus, it is easy to represent the limits of countries, states and counties, without having to load a shape file. Here is how to show these 3…
Read more
#282 Custom appearance of basemap

Graph #281 shows how to draw a basic map with basemap. Here, I show how to improve its appearance. Each element that you draw on the map (boundary, continent, coastlines…) can be customised. Moreover, note that it is…
Read more
#281 Basic map with basemap

Here is the most basic map you can make with the basemap library for Python. It illustrates the basic use of this library. Always start by initialising the map with the Basemap() function. Then, add the elements your…
Read more
#288 Map background with folium

Folium is a Python library that lets you call the Leaflet.js JavaScript library. It allows you to manipulate your data with Python and map it using the power of Leaflet! It is really easy to call a map using this library.…
Read more
#270 Basic Bubble plot

A bubble plot is very close to a scatterplot. With Matplotlib, we construct both using the same scatter function. However, we use the 's' argument to map a third numerical variable to the size…
Read more
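A minimal sketch of the technique described in this teaser, assuming random toy data: the 's' argument of scatter receives a third variable, here scaled by an arbitrary factor of 1000 so the bubbles are visible.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Toy data (made up for illustration): x and y give positions, z drives bubble size
rng = np.random.default_rng(0)
x = rng.random(40)
y = rng.random(40)
z = rng.random(40)

# 's' is the marker area in points^2; the factor 1000 is an arbitrary scaling choice
sizes = z * 1000

fig, ax = plt.subplots()
bubbles = ax.scatter(x, y, s=sizes, alpha=0.5)
ax.set_xlabel("x")
ax.set_ylabel("y")
```

Since 's' controls the marker area rather than its radius, a value twice as large gives a bubble with twice the area.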
#242 Area chart and faceting

I believe that area charts are especially useful when combined with faceting. This makes it possible to quickly spot the different patterns in the data. This example relies on a pandas data frame. We have 2 numerical variables (year and…
Read more
#241 Improve area chart

Chart #240 describes how to make a basic area chart. This page describes a few customisations you can apply to your area chart. It is highly advisable to change matplotlib's default colour. Moreover, it is…
Read more
#240 Basic area chart

An area chart can easily be made with Python and matplotlib. Note that there are 2 main functions for drawing them. I advise using the fill_between function, which allows easier customisation. The stackplot function…
Read more
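A minimal sketch of an area chart built with fill_between, assuming a small made-up series; the colours and transparency are illustrative choices only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Toy data, made up for illustration
x = np.arange(1, 6)
y = np.array([1, 4, 6, 8, 4])

fig, ax = plt.subplots()
# Fill the area between the curve and y = 0 (the default lower bound)
area = ax.fill_between(x, y, color="skyblue", alpha=0.4)
# Draw the line on top so the upper edge of the area stands out
ax.plot(x, y, color="steelblue")
```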
#173 Elaborated Venn diagram

This plot comes from the Matplotlib documentation. I think it describes very well the possibilities offered by the matplotlib library in terms of Venn diagrams. Thanks to the Matplotlib team for providing such a tool…
Read more
#182 Vertical lollipop plot

When a lollipop plot shows the relationship between a numerical and a categorical variable, I find it better to order the groups in decreasing order and represent them vertically. Here is how the code works,…
Read more
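A minimal sketch of this ordering idea, assuming made-up groups and values (this is not the code from the original post): the groups are sorted by value, stacked along the vertical axis, and drawn with horizontal stems.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Toy data, made up for illustration
groups = list("ABCDEFGH")
values = np.array([5, 12, 3, 9, 7, 15, 1, 10])

# Sort ascending so the largest value ends up at the top of the y axis,
# which reads as decreasing order from top to bottom
order = np.argsort(values)
ordered_values = values[order]
ordered_groups = [groups[i] for i in order]
positions = np.arange(len(groups))

fig, ax = plt.subplots()
ax.hlines(y=positions, xmin=0, xmax=ordered_values, color="skyblue")  # the stems
ax.plot(ordered_values, positions, "o")  # the markers
ax.set_yticks(positions)
ax.set_yticklabels(ordered_groups)
```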
#181 Custom lollipop plot

Graph #180 explains how to make a lollipop plot with Matplotlib, whatever the shape of your data. This page describes the customization you can apply to the 3 main parts: the stem, the markers and the baseline. Note…
Read more
#180 Basic lollipop plot

This page explains the basic tricks for making a lollipop plot with matplotlib. Here is a first example with 2 numerical variables, one for each axis. A lollipop plot can be created 1: using…
Read more
#160 Basic donut plot

Here is a basic donut (or doughnut) plot made using the matplotlib library. The trick here is to make a pie plot and add a white circle in the middle of it. Note that another option would be…
Read more
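A minimal sketch of the trick described in this teaser: draw a pie plot, then cover its centre with a white circle. The values, labels and the 0.7 radius are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt

# Toy data, made up for illustration
sizes = [12, 11, 3, 30]
labels = ["A", "B", "C", "D"]

fig, ax = plt.subplots()
wedges = ax.pie(sizes, labels=labels)[0]  # first returned element holds the wedge patches

# White circle at the centre of the pie; the pie itself has radius 1 by default
centre = plt.Circle((0, 0), 0.7, color="white")
ax.add_artist(centre)
```

A larger circle radius gives a thinner ring, and a smaller one gives a thicker ring.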
#122 Multiple lines chart

Graphs #120 and #121 show you how to create a basic line chart and how to apply basic customization. This post explains how to make a line chart with several lines. Each line represents a set of values, for example…
Read more
#54 Grouped violinplot

If you have one numerical variable, several groups, and subgroups, you probably need to make a grouped violinplot. Note that you can use faceting as well to handle this kind of dataset.…
Read more
#91 Customize seaborn heatmap

Graph #90 explains how to make a heatmap from 3 different input formats. In this post, I describe how to customize the appearance of these heatmaps. These 4 examples start by importing libraries and making a data frame:
Read more
#41 Control marker features

Once you understand how to plot a basic scatterplot with seaborn, you will probably want to customize the appearance of your markers. You can customize color, transparency, shape and size. Here is how to do it: Control shape List of available…
Read more
#34 Grouped Boxplot

Grouped boxplots are used when you have a numerical variable, several groups and subgroups. It is easy to make one using seaborn. y is your numerical variable, x is the group column, and hue is the subgroup…
Read more
#20 Basic histogram | Seaborn

With Seaborn, histograms are made using the distplot function. You can call the function with default values (left), which already gives a nice chart. Do not forget to play with the number of bins using the 'bins' argument. It is…
Read more