Normality makes sense

Editorial Staff

28 February 2019

It is one of the first concepts that you will learn in almost any introductory statistics course: the normal distribution. In more advanced courses, the concept of normality becomes so “normal” that the distribution itself is rarely looked at in depth. Why do so many things follow a normal distribution?

Properties of the normal distribution

The normal distribution is a mathematical creature also referred to as a “bell curve” because of its shape. Any normally distributed data can be converted to a standard normal distribution, represented by the picture below, by subtracting the mean and dividing by the standard deviation. The x-axis gives the values that the variable can take, and the y-axis gives the probability density at each value, that is, how likely the variable is to fall close to that value. The mean value or average is in the middle. To the left and to the right are values that deviate from this average, expressed in standard deviations.
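
As an illustration of that conversion, here is a minimal sketch in Python. The function name standardize and the height values are invented for this example and do not come from a real dataset; the sketch also assumes the population standard deviation is the one wanted.

```python
import statistics

def standardize(values):
    """Convert raw values to z-scores: (x - mean) / standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population standard deviation (an assumption)
    return [(x - mu) / sigma for x in values]

# Hypothetical heights in centimetres, made up purely for illustration.
heights_cm = [158.0, 164.5, 170.0, 171.5, 176.0, 183.0, 190.5]
z_scores = standardize(heights_cm)
print([round(z, 2) for z in z_scores])
```

After the conversion the values have mean 0 and standard deviation 1, so each number reads directly as “how many standard deviations away from the average”.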

Normally distributed datasets have two nice properties. Firstly, they are easily predictable: as the picture shows, about 68% of all values (roughly two thirds) fall within one standard deviation of the mean, about 95% fall within two standard deviations, and about 99.7% fall within three standard deviations. The second nice property of normal datasets is that they are symmetrical around the mean. Many situations in the real world can be modelled by a normal distribution, or at least come very close to one. In fact, it tends to be the “go-to” distribution for most purposes. Some examples are the heights of a random population of people, an IQ distribution or the pattern of misses that a shooter makes around a bullseye.
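
These percentages can be checked with a few lines of Python. The sketch below relies on the standard identity that, for a normal variable, the probability of falling within k standard deviations of the mean equals erf(k / √2); the helper name prob_within is made up for this example.

```python
import math

def prob_within(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4f}")
# prints 0.6827, 0.9545 and 0.9973 for k = 1, 2, 3
```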

Getting back to the original question, why is it that so many real-world data distributions take this form? The usual explanation is given by another name for the normal distribution, which is the “error distribution”. The idea is that errors are generally random, so that they are as likely to go in one direction as in the other. For the example with the bullseye, the shooter is as likely to shoot a bit to the left as a bit to the right, or a bit high as a bit low. Thus, a graph of how far the shots land to either side of the bullseye will reflect this random tendency, and be symmetrical around the mean. The same goes for height and intelligence: many genes contribute to these outcomes, as do a great number of environmental factors, such as nutrition, illnesses, low income and so forth. Each factor pushes the outcome slightly up or down, and the sum of many such small, largely independent pushes tends towards a normal distribution.

As for the “bell shape” of the curve, a little experiment helps to explain it: simulate tossing a coin 16 times in Excel, count the number of heads, and repeat this many times.
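
The same experiment can be run outside Excel. The following is only a sketch in Python, not the spreadsheet used for this article: the function names, the seed and the text-based histogram are choices made for this illustration.

```python
import random
from collections import Counter

def count_heads(tosses=16):
    """Toss a fair coin `tosses` times and return the number of heads."""
    return sum(random.random() < 0.5 for _ in range(tosses))

def simulate(trials, tosses=16):
    """Repeat the experiment `trials` times and tally each head count."""
    return Counter(count_heads(tosses) for _ in range(trials))

def print_histogram(tally, tosses=16, width=50):
    """Draw a text histogram, scaled so the tallest bar is `width` characters."""
    peak = max(tally.values())
    for heads in range(tosses + 1):
        bar = "#" * round(width * tally[heads] / peak)
        print(f"{heads:2d} | {bar}")

random.seed(2019)  # arbitrary seed, only so that the run is repeatable
for trials in (40, 4000):
    print(f"\n{trials} trials of 16 tosses")
    print_histogram(simulate(trials))
```

With 40 trials the bars are typically ragged; with 4000 trials they should pile up around 8 heads and thin out evenly towards both ends.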

You can see how the graph becomes more and more like the classic bell-shaped curve as the number of simulated trials goes from 40 to 4000, approximating the normal distribution. The more trials there are, the less the histogram is disturbed by chance, and the underlying shape emerges. That shape is explained by the Central Limit Theorem: each count of heads is the sum of 16 independent tosses. The Central Limit Theorem states that the sampling distribution of the sample mean of independent observations with finite variance approaches a normal distribution as the sample size gets larger. This holds even if the original variables themselves are not normally distributed. That is why, in many real-world problems, the assumption of normality does indeed make sense.
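
To see that the starting distribution really does not have to be normal, here is a rough sketch in Python that averages draws from an exponential distribution, which is strongly skewed. The choice of the exponential distribution, the sample sizes and the function names are assumptions made for this illustration. As a simple yardstick it uses the share of sample means within one standard deviation of their own mean, which is about 0.68 for a normal distribution.

```python
import random
import statistics

def sample_means(sample_size, n_samples=5000):
    """Means of `n_samples` samples, each made of `sample_size` exponential draws."""
    return [
        sum(random.expovariate(1.0) for _ in range(sample_size)) / sample_size
        for _ in range(n_samples)
    ]

def share_within_one_sd(values):
    """Fraction of values lying within one standard deviation of their mean."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return sum(abs(v - mu) <= sigma for v in values) / len(values)

random.seed(2019)  # arbitrary seed, only so that the run is repeatable
for n in (1, 5, 50):
    share = share_within_one_sd(sample_means(n))
    print(f"sample size {n:2d}: {share:.3f} of the means within one standard deviation")
```

For a sample size of 1 the share is well above 0.68, because the exponential distribution is lopsided. As the sample size grows, the share should drift towards the normal value.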

This article was written by Sjors Keet.