The normal distribution, also called the Gaussian distribution, is a probability distribution commonly used to model phenomena such as physical characteristics (e.g. height, weight, etc.) and test scores. Due to its shape, it is often referred to as the bell curve:

Figure: The graph of a normal distribution with mean \(0\) and standard deviation \(1\)

Owing largely to the central limit theorem, the normal distribution is often an appropriate approximation even when the underlying distribution is known not to be normal. This is convenient because estimates are easy to obtain from the normal distribution: the empirical rule states that 68% of the data modeled by a normal distribution falls within 1 standard deviation of the mean, 95% within 2 standard deviations, and 99.7% within 3 standard deviations. For this reason, the empirical rule is also occasionally known as the 68-95-99.7 rule.

In addition, the normal distribution exhibits a number of nice simplifying characteristics, many of which may be observed from the above plot. It is symmetric and single-peaked, implying that its mean, median, and mode are all equal. It additionally has "skinny tails", which intuitively means that it "tapers off" quickly and formally means that it has an excess kurtosis of 0 (equivalently, a kurtosis of 3).

In this context, fat-tailed would mean having large skewness or kurtosis.


Normality and the Central Limit Theorem

Many physical phenomena, like height and weight, closely follow a normal distribution. This is somewhat counterintuitive at first glance: the normal density is positive everywhere, yet it is clearly impossible to have a negative height. However, normal distributions have skinny enough tails that these probabilities are negligible.

Intuitively, the normal distribution is "nice" enough that we expect it to occur naturally unless there is a good reason to believe otherwise. This intuition is formalized by the central limit theorem, which states the following:

The probability distribution of the average of \(n\) independent, identically distributed (iid) random variables with finite variance converges, after centering and scaling, to the normal distribution for large \(n.\)

In fact, \(n = 30\) is typically enough to observe convergence. Intuitively, this means that characteristics that can be represented as combinations of independent factors are well-represented by a normal distribution. For instance, if we flip a coin many times, the number of heads can be viewed as the sum of many iid random variables and thus would be well-represented by a bell curve:

Figure: The binomial distribution with 30 coinflips. This already looks a lot like a bell curve!
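As a rough check of the coin-flip picture, the sketch below compares the exact probabilities for the number of heads in 30 fair flips with a normal density of the same mean and variance. The function names are illustrative, and matching the mean \(np\) and variance \(np(1-p)\) is the usual normal approximation to the binomial, which is an assumption added here rather than something stated in the text.

```python
import math

def binomial_pmf(n, k, p=0.5):
    """Exact probability of k heads in n flips of a coin with P(heads) = p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

n, p = 30, 0.5
mu = n * p                          # mean of the binomial: 15
sigma = math.sqrt(n * p * (1 - p))  # standard deviation: about 2.74

# Largest gap between the exact probabilities and the bell-curve values.
max_gap = max(
    abs(binomial_pmf(n, k, p) - normal_pdf(k, mu, sigma)) for k in range(n + 1)
)

for k in range(10, 21):
    print(k, round(binomial_pmf(n, k, p), 4), round(normal_pdf(k, mu, sigma), 4))
print("largest gap:", round(max_gap, 4))
```

The two columns should agree to about two decimal places, which is what "looks a lot like a bell curve" amounts to numerically.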

Many natural phenomena may also be modeled in this way. For example, the accuracy of measurement instruments (e.g. telescopes) may be viewed as a combination of the manufacturing efficacy of many independent parts, and thus is a good candidate for being modeled via a normal distribution.

The normal distribution is particularly useful in sampling, as the central limit theorem also implies that the distribution of averages of large simple random samples is approximately normal. For instance, if we polled many voters on whether they liked (value of 1) or disliked (value of 0) a politician, then so long as the voters are independent, the politician's measured approval rating would be approximately normally distributed regardless of the voters' opinion of them (their opinion would influence the mean and variance of the distribution, but not its shape). This is useful for pollsters, as calculating "margins of error" can be done relatively easily using the empirical rule in the next section.

It is worth noting that not all phenomena are well-modeled by a normal distribution. Even if a phenomenon may be represented as the combination of many factors, if one of those factors outweighs the others, then the distribution will often not be normal.

Student scores on history quizzes are likely to be non-normal since their performance is dominated by whether or not they read the material before class. The distribution is likely to be left-skewed.

Similarly, if the factors are not independent (e.g. if the voters in the above example could hear each other's responses before answering), then normality often breaks down as well.

The 2008 financial crisis was arguably caused by long-term adherence to the assumption that stock prices are normal when, in fact, there is often a herd mentality contributing to swift rises/falls in price. Dependencies among contributing factors lead to distributions with fatter tails than the normal distribution.

In general, these are good rules of thumb to determine whether the normality assumption is appropriate:

Normality assumption is reasonable | Normality assumption is doubtful
Amalgamation of similar distributions | Dominated by one (or few) particular distribution
Contributing factors are independent | Dependencies among contributing factors
Sample selection is uniformly random | Sample selection is correlated to previous selection

More formally, there are several statistical tests, most notably Pearson's chi-squared test, to determine whether the normality assumption is valid.

Empirical Rule

The empirical rule, or the 68-95-99.7 rule, states that 68% of the data modeled by a normal distribution falls within 1 standard deviation of the mean, 95% within 2 standard deviations, and 99.7% within 3 standard deviations. For example, IQ is designed to have a mean of 100 and a standard deviation of 15, meaning that 68% of people have IQs between \(100 - 15 = 85\) and \(100 + 15 = 115\), 95% of people have IQs between 70 and 130, and 99.7% of people have IQs between 55 and 145.
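The percentages in the empirical rule can be reproduced numerically. The sketch below assumes the standard identity \(P(|Z| < k) = \operatorname{erf}\big(k/\sqrt{2}\big)\) for a standard normal variable \(Z\), which is not derived in the text; the helper name `prob_within` is made up for illustration.

```python
import math

def prob_within(k):
    """P(|Z| < k) for a standard normal Z: the mass within k standard deviations."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4f}")

# IQ example from the text: mean 100, standard deviation 15.
mean, sd = 100, 15
iq_ranges = {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}
print(iq_ranges)
```

The printed probabilities round to 68%, 95%, and 99.7%, and the ranges match the IQ intervals quoted above.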

This makes the normal distribution easy to obtain quick estimates from, which is especially useful for polling purposes as the margin of error may simply be reported as \(\pm 2\) standard deviations (so, for instance, a candidate's approval rating might be 70% \(\pm\) 3%). For more exact and general calculations, we utilize a \(z\)-score:
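To make the polling remark concrete, here is a minimal sketch of a \(\pm 2\) standard deviation margin of error for a sample proportion. It assumes a simple random sample and uses the standard error \(\sqrt{\hat{p}(1-\hat{p})/n}\), a standard result that the text does not state; the poll size of 933 is hypothetical and chosen only for illustration.

```python
import math

def margin_of_error(p_hat, n, z=2.0):
    """Approximate margin of error: z standard errors of a sample proportion."""
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return z * standard_error

# Hypothetical poll: 70% approval among 933 respondents.
moe = margin_of_error(0.70, 933)
print(f"approval: 70% +/- {100 * moe:.1f}%")
```

Because the margin shrinks like \(1/\sqrt{n}\), halving it requires roughly four times as many respondents.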

The \(z\)-score of an observation is the number of standard deviations away from the mean it is. Formally, if \(\sigma\) is the standard deviation of the distribution, \(\mu\) is the mean of the distribution, and \(x\) is the value, then

\[z = \frac{x - \mu}{\sigma}.\]

For instance, the \(z\)-score of a 121 IQ score is \(\frac{121 - 100}{15} = 1.4\). This value is used in many tests in statistics, most commonly the \(z\)-test. The area under the standard bell curve to the left of a \(z\)-score gives the probability that a random variable with this distribution takes a value less than the corresponding observation.
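The IQ example can be sketched in a few lines. The function names are illustrative, and the area to the left of \(z\) is computed with the error function, which is an assumption about how one might evaluate it rather than a method given in the text.

```python
import math

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies away from the mean mu."""
    return (x - mu) / sigma

def standard_normal_cdf(z):
    """Area under the standard bell curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

z = z_score(121, mu=100, sigma=15)
below = standard_normal_cdf(z)
print(f"z = {z:.2f}, share of IQ scores below 121 = {below:.4f}")
```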

Figure: A visual representation of the \(z\)-score \(-0.68\)

A \(z\)-score table lists these probabilities. It is usually laid out so that the column determines the hundredths digit of the \(z\)-score and the row determines the units and tenths digits.


Note that the \(z\)-table aligns with the empirical rule. Reading off the table, about \(0.1587\) of the data falls below -1 standard deviation from the mean, and about \(0.8413\) of the data falls below 1 standard deviation from the mean. As a result, about \(0.8413 - 0.1587 = 0.6826 \approx 68\%\) of the data falls between -1 and 1 standard deviations.

Consider a population with a normal distribution that has mean \(3\) and standard deviation \(4\). What is the probability that a value selected at random will be negative? What about positive?

A negative number is any number less than \(0\), so the first step is to find the \(z\)-score associated to \(0\). That is \(\frac{0 - 3}{4} = -0.75\). By finding the row with the first two digits \((-0.7)\) of the \(z\)-score and choosing the column with the next digit \((5),\) we find that the value in the table associated to a value of \(-0.75\) is \(0.2266\), so there is a 22.66% probability that the value will be negative. There is a \(1 - 0.2266 = 0.7734\) or 77.34% probability of it being positive. \(_\square\)
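The table lookup in this example can be cross-checked in software. The sketch below uses `NormalDist` from Python's standard `statistics` module (available from Python 3.8); choosing that class is a convenience assumed here and is not part of the original solution.

```python
from statistics import NormalDist

# Population from the example: mean 3, standard deviation 4.
population = NormalDist(mu=3, sigma=4)

p_negative = population.cdf(0)   # probability of a value below 0
p_positive = 1 - p_negative      # probability of a value above 0

print(f"P(negative) = {p_negative:.4f}")
print(f"P(positive) = {p_positive:.4f}")
```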

Note that the area under the curve can be computed using integral calculus, so long as the probability density function is known. In particular, if this function is \(f(x)\) and we look at a "standard" normal distribution (i.e. mean 0 and standard deviation 1), then the \(z\)-table entry for a \(z\)-score of \(z\) can be expressed as \(\int_{-\infty}^{z}f(x)\,dx\). For instance, the empirical rule can be summarized by

\[\int_{-1}^1 f(x)\,dx \approx 68\%,\quad \int_{-2}^2 f(x)\,dx \approx 95\%,\quad \int_{-3}^3 f(x)\,dx \approx 99.7\%.\]

We will see how to determine \(f(x)\) later.
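These three integrals can be approximated numerically. The sketch below takes \(f\) to be the standard normal density \(\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\), i.e. the formula from the formal definition with mean 0 and standard deviation 1, and integrates it with composite Simpson's rule. Simpson's rule and the step count are choices made for this illustration only.

```python
import math

def f(x):
    """Standard normal density (mean 0, standard deviation 1)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def simpson(func, a, b, steps=1000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = func(a) + func(b)
    for i in range(1, steps):
        # Interior points alternate between weights 4 (odd i) and 2 (even i).
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

areas = {k: simpson(f, -k, k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"integral of f from {-k} to {k}: {area:.4f}")
```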


The normal distribution has two important properties that make it special as a probability distribution.

The average of \(n\) independent normally distributed random variables is itself normally distributed, regardless of \(n\).

There exist other distributions that have this property, and they are called stable distributions. However, the normal distribution is the only stable distribution that has finite variance. A related notion is the multivariate normal distribution, a distribution of random vectors in which every linear combination of the components is normally distributed.

Given a simple random sample from a random variable with a normal distribution, the sample mean and sample variance are independent.

This property is unique (among all probability distributions) to the normal distribution. It emphasizes the overall symmetry and "balance" of the bell curve.
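A small simulation can make this property plausible. The sketch below draws many samples of size 5, records each sample's mean and variance, and measures their correlation, once for normal data and once for exponential data. Zero correlation is weaker than independence, so this is only a sanity check and not a proof; the sample size, repeat count, seed, and the exponential comparison are all choices made for this illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def mean_variance_correlation(draw, sample_size=5, repeats=20000):
    """Correlation between the sample mean and the sample variance over many samples."""
    means, variances = [], []
    for _ in range(repeats):
        sample = [draw() for _ in range(sample_size)]
        m = sum(sample) / sample_size
        means.append(m)
        variances.append(sum((x - m) ** 2 for x in sample) / (sample_size - 1))
    return pearson(means, variances)

rng = random.Random(0)  # fixed seed so the run is repeatable
corr_normal = mean_variance_correlation(lambda: rng.gauss(0, 1))
corr_exponential = mean_variance_correlation(lambda: rng.expovariate(1.0))

print(f"normal data:      correlation = {corr_normal:.3f}")
print(f"exponential data: correlation = {corr_exponential:.3f}")
```

For normal data the correlation should be close to zero, while for the skewed exponential data a large sample mean tends to come with a large sample variance.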

Histograms show how samples of a normally distributed random variable approach a bell curve as the sample size increases. The following graphs are of samplings of a random variable with normal distribution of mean \(0\) and standard deviation \(1\).

Figures: histograms of samples of size \(n = 10\), \(n = 100\), \(n = 1000\), \(n = 10000\), \(n = 100000\), and \(n = 1000000\)

Note how the graphs become more and more symmetric as \(n\) increases. The proportion of numbers in a certain region also begins to have a fixed ratio. For instance, as the empirical rule suggests, \(68\%\) of the numbers in the last graph appear between \(-1\) and \(1\). In fact, all normal distributions have these same ratios, and tables of \(z\)-scores are used to determine the exact proportions.
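The stabilizing proportions can be observed without plotting anything. The sketch below draws standard normal samples of increasing size and reports the fraction of values between \(-1\) and \(1\). The seed and the particular sample sizes are arbitrary choices for illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

fractions = {}
for n in (10, 100, 1000, 10000, 100000):
    sample = [rng.gauss(0, 1) for _ in range(n)]
    # Share of the sample within one standard deviation of the mean.
    fractions[n] = sum(1 for x in sample if -1 < x < 1) / n

for n, share in fractions.items():
    print(f"n = {n:>6}: fraction within 1 standard deviation = {share:.4f}")
```

Small samples can land well away from 68%, while the largest samples should settle close to it.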

A new product was released and a survey asked customers to give the product a score between 1 and 100. At first, when the number of subjects \((n)\) was still relatively low, the company couldn't pull much information from the surveys. For example, after four people had taken the survey, one person rated it a 92, one rated it a 72, one rated it a 63, and the last one rated it a 34. However, as more customers took the survey, the company was able to create a histogram showing the results. Once 5,000 surveys had been taken, the company found that the average person rated the product a 67 out of 100, and the rest of the scores were normally distributed in a bell-curve out from there (with a standard deviation of 9). Based on this, the company decided that its product was not meeting customers' desires.

Formal Definition and Derivation

The normal distribution with mean \(\mu\) and variance \(\sigma^2\) is denoted \(\mathcal{N}\big(\mu, \sigma^2\big)\). Its probability density function is

\[p_{\mu, \sigma^2} (x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}.\]

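There is no closed-form expression for the cumulative distribution function in terms of elementary functions; in practice it is evaluated numerically or read from a \(z\)-table.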
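Although no elementary closed form exists, the cumulative distribution function is commonly written in terms of the error function. The symbol \(\Phi_{\mu, \sigma^2}\) below is notation introduced here for illustration and is not used elsewhere in this article:

\[\Phi_{\mu, \sigma^2}(x) = \int_{-\infty}^{x} p_{\mu, \sigma^2}(t)\,dt = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x - \mu}{\sigma\sqrt{2}}\right)\right], \qquad \operatorname{erf}(u) = \frac{2}{\sqrt{\pi}}\int_{0}^{u} e^{-t^2}\,dt.\]

With \(\mu = 0\) and \(\sigma = 1\), this is the quantity tabulated in a \(z\)-table.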

If \(X_1\) and \(X_2\) are independent normal random variables, with \(X_1 \sim \mathcal{N}\big(\mu_1, \sigma_1^2\big)\) and \(X_2 \sim \mathcal{N}\big(\mu_2, \sigma_2^2\big)\), then \(aX_1 \pm bX_2 \sim \mathcal{N}\big(a\mu_1 \pm b\mu_2, a^2\sigma_1^2 + b^2\sigma_2^2\big)\).
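The mean and variance in this rule can be checked by simulation. The sketch below uses the difference \(aX_1 - bX_2\) with arbitrarily chosen parameters; all numeric values and the seed are assumptions made for the example. It confirms only the first two moments, not normality itself.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

a, b = 2.0, 3.0
mu1, sigma1 = 1.0, 2.0
mu2, sigma2 = -4.0, 0.5

# Draw independent X1 and X2 and form the combination a*X1 - b*X2.
count = 100000
samples = [a * rng.gauss(mu1, sigma1) - b * rng.gauss(mu2, sigma2) for _ in range(count)]

predicted_mean = a * mu1 - b * mu2                   # 2*1 - 3*(-4) = 14
predicted_var = a**2 * sigma1**2 + b**2 * sigma2**2  # 4*4 + 9*0.25 = 18.25

observed_mean = sum(samples) / count
observed_var = sum((s - observed_mean) ** 2 for s in samples) / count

print(f"mean:     predicted {predicted_mean}, observed {observed_mean:.3f}")
print(f"variance: predicted {predicted_var}, observed {observed_var:.3f}")
```

Note that the variances add even though the means are subtracted, which is why the rule has a plus sign in the variance for both cases.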

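The bell curve arises as the limiting probability distribution of a binary system, such as \(n\) flips of a fair coin. The probability of a displacement \(k\) from the median, i.e. of observing \(\frac{1}{2}n + k\) heads, is

\[P(n, k) = \left( \begin{matrix} n \\ \frac{1}{2}n + k \end{matrix} \right) {2}^{-n}= \frac{n!}{\big(\frac{1}{2}n + k\big)!\, \big(\frac{1}{2}n - k\big)!\, {2}^{n}}.\]

Using Stirling's approximation and writing \(k = \frac{\sigma}{2}\) (so that \(\sigma\) here denotes twice the displacement rather than a standard deviation), we have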

\[P(n, \sigma) \sim {\left(\frac{n}{2\pi} \right)}^{\frac{1}{2}} {\left(\frac{n}{2}\right)}^{n} {\left(\frac{{n}^{2} - {\sigma}^{2}}{4}\right)}^{-\frac{1}{2}(n+1)}{\left(\frac{n + \sigma}{n-\sigma}\right)}^{\frac{-\sigma}{2}}.\]

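For \(n\gg \sigma \), \(\frac{n + \sigma}{n-\sigma} \sim 1+\frac{2\sigma}{n}\). Moreover, \({\left(\frac{n}{2}\right)}^{n} {\left(\frac{{n}^{2} - {\sigma}^{2}}{4}\right)}^{-\frac{1}{2}(n+1)} = \frac{2}{n} {\left(1- \frac{{\sigma}^{2}}{{n}^{2}}\right)}^{-\frac{1}{2}(n+1)}\) and \(\frac{2}{n}{\left(\frac{n}{2\pi} \right)}^{\frac{1}{2}} = {\left(\frac{2}{\pi n} \right)}^{\frac{1}{2}}\). Hence, for large \(n\),

\[P(n, \sigma) \sim {\left(\frac{2}{\pi n} \right)}^{\frac{1}{2}} {\left(1- \frac{{\sigma}^{2}}{{n}^{2}}\right)}^{-\frac{1}{2}(n+1)}{\left(1+\frac{2\sigma}{n}\right)}^{\frac{-\sigma}{2}}.\]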

Taking the logarithm yields

\[\ln\big(P(n,\sigma)\big) \sim \frac{1}{2}\ln \left (\frac{2}{\pi n}\right) - \frac{1}{2}(n+1)\ln \left (1- \frac{{\sigma}^{2}}{{n}^{2}}\right) - \frac{\sigma}{2}\ln \left (1+\frac{2\sigma}{n}\right).\]

For small \(x\), \(\ln(1+x) \approx x\); subsequently,

\[\ln\big(P(n,\sigma)\big) \sim \frac{1}{2}\ln \left (\frac{2}{\pi n}\right) - \frac{1}{2}(n+1) \left (-\frac{{\sigma}^{2}}{{n}^{2}}\right) - \frac{\sigma}{2} \left (\frac{2\sigma}{n}\right),\]

which, since \(\frac{(n+1){\sigma}^{2}}{2{n}^{2}} - \frac{{\sigma}^{2}}{n} = \frac{{\sigma}^{2}}{2{n}^{2}} - \frac{{\sigma}^{2}}{2n}\), simplifies to

\[\ln\big(P(n,\sigma)\big) \sim \frac{1}{2}\ln \left (\frac{2}{\pi n}\right) + \frac{{\sigma}^{2}}{2{n}^{2}} - \frac{{\sigma}^{2}}{2n}.\]

Since \(\frac{{\sigma}^{2}}{2{n}^{2}}\) vanishes faster than \(\frac{{\sigma}^{2}}{2n}\) for very large \(n\), we arrive at the result

\[P(n, \sigma) \approx {\left(\frac{2}{\pi n} \right)}^{\frac{1}{2}} {e}^{\frac{-{\sigma}^{2}}{2n}}.\]
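The final formula can be compared against exact binomial probabilities. The sketch below assumes \(n\) is even and uses \(\sigma = 2k\) as in the derivation; the function names and the test values of \(n\) and \(k\) are illustrative choices.

```python
import math

def exact_probability(n, k):
    """Probability of exactly n/2 + k heads in n fair coin flips (n even)."""
    return math.comb(n, n // 2 + k) / 2**n

def gaussian_approximation(n, k):
    """The derived approximation, with sigma equal to twice the displacement k."""
    sigma = 2 * k
    return math.sqrt(2 / (math.pi * n)) * math.exp(-(sigma**2) / (2 * n))

for n, k in [(100, 0), (100, 5), (1000, 10), (1000, 30)]:
    exact = exact_probability(n, k)
    approx = gaussian_approximation(n, k)
    print(f"n = {n:>4}, k = {k:>2}: exact = {exact:.6f}, approximation = {approx:.6f}")
```

In terms of \(k\) the exponent is \(-2k^2/n\), which corresponds to a normal density with variance \(n/4\), the variance of the number of heads in \(n\) fair flips.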




