Anscombe’s quartet

Karthik
2 min read · Jul 5, 2020

Anscombe’s Quartet was developed by the statistician Francis Anscombe. It consists of four data sets that have nearly identical descriptive statistics, yet have very different distributions and look very different when visualized. The quartet was developed to highlight the importance of data visualization. Each data set consists of 11 (x, y) points, as shown below.

Data Sets
Descriptive Statistics for the datasets
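
To make the "nearly identical descriptive statistics" concrete, here is a minimal sketch that recomputes them in plain Python. The numbers are the commonly published values of the quartet, typed in here as an assumption since the original table is not reproduced in the text; the names `QUARTET`, `summarize` and `SUMMARY` are made up for this illustration.

```python
from statistics import mean, variance

# Commonly published values of Anscombe's quartet (assumed to match the
# article's table). Data sets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return means, sample variances, Pearson r and the least-squares line."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "var_x": variance(x),
        "mean_y": my,
        "var_y": variance(y),
        "r": sxy / (sxx * syy) ** 0.5,
        "slope": slope,
        "intercept": my - slope * mx,
    }


SUMMARY = {name: summarize(x, y) for name, (x, y) in QUARTET.items()}
for name, stats in SUMMARY.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Rounded to two decimals, the four rows printed by this sketch should be almost indistinguishable, which is exactly why the statistics alone cannot tell the data sets apart.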

Let’s now see what the data sets above look like when plotted.

The first scatter plot shows the first data set, which appears to have a simple linear relationship.

The second scatter plot shows the second data set. The relationship is clearly not linear; a polynomial function would probably be a better fit.
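
The suggestion of a polynomial fit can be checked with a short sketch. It assumes the commonly published values of the second data set and uses a hypothetical helper, `fit_polynomial`, that solves the least-squares normal equations in plain Python; it illustrates the idea and is not the method used to produce the article's plots.

```python
# Commonly published values of data set II (assumed to match the article).
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def fit_polynomial(x, y, degree):
    """Least-squares fit; returns [c0, c1, ...] for c0 + c1*x + c2*x**2 + ..."""
    n = degree + 1
    # Augmented normal-equation matrix [A^T A | A^T y].
    rows = [
        [sum(a ** (i + j) for a in x) for j in range(n)]
        + [sum(b * a ** i for a, b in zip(x, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(n):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    return [rows[i][n] / rows[i][i] for i in range(n)]


def max_residual(coeffs, x, y):
    """Largest absolute gap between the data and the fitted polynomial."""
    return max(
        abs(b - sum(c * a ** k for k, c in enumerate(coeffs)))
        for a, b in zip(x, y)
    )


LINEAR = fit_polynomial(X, Y2, 1)
QUADRATIC = fit_polynomial(X, Y2, 2)
print("linear fit, worst residual:   ", round(max_residual(LINEAR, X, Y2), 3))
print("quadratic fit, worst residual:", round(max_residual(QUADRATIC, X, Y2), 3))
```

If the quadratic's worst residual comes out far smaller than the straight line's, that supports the reading that the second data set follows a smooth curve and not a line.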

The third plot shows the third data set. The points follow an almost perfect straight line, but a single outlier pulls the calculated regression line away from it.

In the fourth plot, the variables don’t seem to have any relationship. However, the regression line is heavily influenced by a single data point, which alone is enough to produce a high correlation coefficient.
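
The influence of a single point in the third and fourth data sets can be tested by refitting with that point removed. The sketch below assumes the commonly published values of both data sets; `fit_line` and `drop` are hypothetical helpers written for this illustration.

```python
from statistics import mean

# Commonly published values of data sets III and IV (assumed to match the article).
X3 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
X4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
Y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]


def fit_line(x, y):
    """Return (slope, intercept, r), or None when x or y has no spread."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # slope and correlation are undefined
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / (sxx * syy) ** 0.5


def drop(values, index):
    """Copy of values without the element at index."""
    return values[:index] + values[index + 1:]


# Data set III: the outlier is the point with the largest y.
i3 = Y3.index(max(Y3))
FULL_3 = fit_line(X3, Y3)
TRIMMED_3 = fit_line(drop(X3, i3), drop(Y3, i3))

# Data set IV: the influential point is the only one with a different x.
i4 = X4.index(max(X4))
FULL_4 = fit_line(X4, Y4)
TRIMMED_4 = fit_line(drop(X4, i4), drop(Y4, i4))

print("III, all points:     ", FULL_3)
print("III, outlier removed:", TRIMMED_3)
print("IV, all points:      ", FULL_4)
print("IV, point removed:   ", TRIMMED_4)
```

Without its outlier, the third data set should give a noticeably flatter line with a correlation very close to 1. In the fourth data set, removing the one distinct point leaves every x equal to 8, so no regression line or correlation can be computed at all, which is why `fit_line` returns `None` there.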

This quartet accentuates the importance of visualization in data analysis. A visual representation reveals structure, such as curvature and outliers, that summary statistics alone can hide.
