The boxplot is probably one of the most common types of graphic. It gives a nice summary of one or several numeric variables. The line that divides the box into two parts represents the median of the data. The ends of the box show the upper and lower quartiles. The extreme lines show the highest and lowest values excluding outliers. Note that a boxplot hides the number of values in each group.
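To make those elements concrete, here is a minimal sketch that computes them by hand using only the Python standard library. It assumes the common convention that an outlier is any point further than 1.5 times the interquartile range from the box; plotting libraries may use a different rule or a different quantile method. The function name boxplot_stats and the sample data are made up for illustration.

```python
import statistics

def boxplot_stats(values, whis=1.5):
    """Return the numbers a boxplot draws (hypothetical helper).

    Assumes the 1.5 * IQR convention for outliers; this is a common
    default, not the only possible rule.
    """
    data = sorted(values)
    # "inclusive" uses linear interpolation between data points
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whis * iqr
    high_fence = q3 + whis * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "median": median,             # line inside the box
        "q1": q1,                     # lower end of the box
        "q3": q3,                     # upper end of the box
        "whisker_low": min(inside),   # lowest value excluding outliers
        "whisker_high": max(inside),  # highest value excluding outliers
        "outliers": outliers,
        "n": len(data),               # not visible on the plot itself
    }

stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(stats)
```

With this sample, the value 30 falls outside the upper fence, so the upper whisker stops at 9 and 30 would be drawn as a separate point.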
Format 1: one numerical variable (for the Y axis) plus one categorical variable (which defines the groups). This is the ‘long’ or ‘tidy’ format.
Format 2: several numerical variables, one per group. This is the ‘wide’ format.
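The two formats hold the same information and can be converted into each other. The sketch below assumes pandas is available and uses made-up column names (group, value, obs) and made-up numbers, purely for illustration.

```python
import pandas as pd

# Format 1 (long / tidy): one numeric column plus one categorical column
long_df = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B"],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Number the observations within each group so they can become rows
long_df["obs"] = long_df.groupby("group").cumcount()

# Format 2 (wide): one numeric column per group
wide_df = long_df.pivot(index="obs", columns="group", values="value")

# And back again: stack the group columns into a single value column
back_to_long = wide_df.melt(var_name="group", value_name="value").dropna()

print(wide_df)
print(back_to_long)
```

The dropna() call matters when groups have different sizes, since the wide format pads the shorter columns with missing values.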
Boxplot and hidden data
A boxplot summarizes the distribution of a numerical variable for one or several groups. In doing so, it hides both the underlying distribution and the number of points in each group, which makes this chart dangerous. This post gives an example of a possible mistake and three solutions to fix it.
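As a small illustration of the problem, the sketch below builds three made-up groups that differ widely in size and shape yet share exactly the same minimum, quartiles, median and maximum, so their boxes would look identical. The data and the helper name five_number_summary are hypothetical, and the quartiles use the standard library's "inclusive" method; other quantile methods may give slightly different values.

```python
import statistics

def five_number_summary(values):
    """Min, lower quartile, median, upper quartile, max (hypothetical helper)."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (data[0], q1, median, q3, data[-1])

groups = {
    # only 5 observations
    "tiny": [0, 25, 50, 75, 100],
    # 101 observations spread evenly from 0 to 100
    "uniform": list(range(101)),
    # 83 observations piled up in two clusters, at 25 and at 75
    "bimodal": [0] + [25] * 40 + [50] + [75] * 40 + [100],
}

summaries = {name: five_number_summary(vals) for name, vals in groups.items()}

for name, vals in groups.items():
    print(name, "n =", len(vals), "summary =", summaries[name])
```

All three summaries come out the same, even though one group has 5 points, one is evenly spread, and one has two distinct peaks. None of that is visible from the boxes alone.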