**Boxplot** is probably one of the most common types of graphic. It gives a nice **summary** of one or several **numeric variables**. The line that divides the box into 2 parts represents the **median** of the data. The ends of the box show the upper and lower **quartiles**. The extreme lines (whiskers) show the highest and lowest values, excluding **outliers**. Note that a boxplot hides the number of values behind each group. Thus, it is highly advised to print the number of observations, add the individual observations with jitter, or use a violin plot if you have many observations.
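To make the parts of the box concrete, here is a minimal sketch in plain Python of the numbers a boxplot draws. It assumes the common 1.5 × IQR whisker rule (matplotlib's default `whis=1.5`, which seaborn relies on) and quartiles computed by linear interpolation; other tools may use slightly different conventions. The function name `box_stats` is hypothetical.

```python
import statistics

def box_stats(values, whis=1.5):
    """Return the quantities a boxplot draws, assuming the 1.5 * IQR whisker rule."""
    data = sorted(values)
    # 'inclusive' quantiles interpolate linearly between data points (like NumPy's default)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    # Whiskers stop at the most extreme points still inside the fences
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": outliers, "n": len(data),
    }

stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 40])
print(stats)  # q1=3.25, median=5.5, q3=7.75, whiskers 1 to 9, outliers [40], n=10
```

Note that `n` is computed here but never drawn by a boxplot, which is exactly the information the chart hides.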

**Input format**

*Format 1*: 1 numerical variable (for the Y axis) + 1 categorical variable (giving the groups). This is the ‘**long**’ or ‘**tidy**’ format.

*Format 2*: several numerical variables, one per group. This is the ‘**wide**’ format.
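The two input formats hold the same information. As a minimal sketch, assuming the wide data is a plain dict mapping each group to its list of values, converting it to long (tidy) rows looks like this; the helper name `wide_to_long` is hypothetical. With a pandas DataFrame, `DataFrame.melt` performs the equivalent reshaping.

```python
def wide_to_long(wide):
    """Turn {group: [values, ...]} (wide format) into (group, value) rows (long format)."""
    rows = []
    for group, values in wide.items():
        # One row per observation, tagged with the group it belongs to
        for value in values:
            rows.append((group, value))
    return rows

wide = {"A": [1.2, 3.4, 2.2], "B": [5.0, 4.1]}
long_rows = wide_to_long(wide)
print(long_rows)  # [('A', 1.2), ('A', 3.4), ('A', 2.2), ('B', 5.0), ('B', 4.1)]
```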

**Seaborn**

**Boxplot and hidden data**

A boxplot **summarizes** the distribution of a numerical variable for one or several groups. Thus, it hides the underlying distribution and the number of points in each group, which can make the chart misleading. This post gives an example of a possible mistake and 3 solutions to fix it.
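Two of the remedies mentioned earlier, printing the number of observations and adding jittered points, boil down to simple computations on long-format data. The sketch below assumes (group, value) rows; the helper names `labels_with_counts` and `jittered_points` are hypothetical. In an actual seaborn chart, such labels could be passed to matplotlib's `ax.set_xticklabels`, and `sns.stripplot(..., jitter=True)` draws jittered points for you.

```python
import random
from collections import Counter

def labels_with_counts(rows):
    """Return the group order and one 'group (n=count)' axis label per group."""
    counts = Counter(group for group, _ in rows)
    # dict.fromkeys keeps the first-seen order of the groups
    order = list(dict.fromkeys(group for group, _ in rows))
    return order, [f"{g}\n(n={counts[g]})" for g in order]

def jittered_points(rows, order, width=0.2, seed=0):
    """Spread each point horizontally around its group position so overlaps stay visible."""
    rng = random.Random(seed)  # fixed seed: the same jitter on every run
    position = {g: i for i, g in enumerate(order)}
    return [(position[g] + rng.uniform(-width / 2, width / 2), value) for g, value in rows]

rows = [("A", 1.2), ("A", 3.4), ("A", 2.2), ("B", 5.0), ("B", 4.1)]
order, labels = labels_with_counts(rows)
points = jittered_points(rows, order)
print(labels)  # ['A\n(n=3)', 'B\n(n=2)']
```

Only the x position is jittered; the y values are left untouched, so the points still show the real data.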
