Appendix B — Some Useful Probability Distributions

B.1 Introduction

The following is a short overview of some of the most common probability distributions encountered in the context of linear regression.

B.2 Normal distribution

The normal or Gaussian distribution is of principal importance in probability and statistics. In its univariate form, we say that \(X\sim\mathcal{N}\left(\mu,\sigma^2\right)\) with mean \(\mu\in\mathbb{R}\) and variance \(\sigma^2\in\mathbb{R}^+\) if it has the following probability density function (pdf): \[ f_{\mathcal{N}\left(\mu,\sigma^2\right)}(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left( -\frac{1}{2\sigma^2}(x-\mu)^2 \right). \] Such a random variable \(X\) can be centred and scaled into a standardized form as \((X-\mu)/\sigma\sim\mathcal{N}\left(0,1\right)\). Furthermore, the normal family is very stable in the sense that the sum of independent normal random variables is again normal, as is any real-valued scalar multiple of a normal random variable.
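
The following is a minimal Monte Carlo sketch of these two facts, assuming NumPy and SciPy are available; the specific parameter values are illustrative only. Standardizing draws from \(\mathcal{N}(2,9)\) should yield approximately mean 0 and variance 1, and a linear combination of independent normals should match its implied normal law.

```python
# Sketch: standardization and closure of the normal family under
# sums and scalar multiples (parameter choices are arbitrary).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, sigma = 2.0, 3.0
x = rng.normal(mu, sigma, size=100_000)

# Standardization: (X - mu)/sigma should look N(0, 1).
z = (x - mu) / sigma
print(z.mean(), z.var())   # ~0 and ~1

# Closure: a*X + Y with X ~ N(2, 9), Y ~ N(-1, 4) independent
# should be N(a*2 - 1, a^2 * 9 + 4).
y = rng.normal(-1.0, 2.0, size=100_000)
a = 0.5
s = a * x + y
# A Kolmogorov-Smirnov test against the implied normal should not reject.
print(stats.kstest(s, stats.norm(a * mu - 1, np.sqrt(a**2 * sigma**2 + 4)).cdf))
```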

Part of its importance stems from the central limit theorem, which in its simplest form states that the standardized sum of a collection of independent random variables converges in distribution to a standard normal random variable. However, it is worth noting that many other versions of the central limit theorem exist.

Theorem B.1 (Central Limit Theorem) Let \(Y_1,\ldots,Y_n\) be a sample of \(n\) iid random variables with mean \(\mu\in\mathbb{R}\) and finite variance \(\sigma^2<\infty\), and let \(\bar{Y} = n^{-1}\sum_{i=1}^n Y_i\) be the sample mean. Then \[ \sqrt{n}( \bar{Y}-\mu ) \xrightarrow{\text{d}} \mathcal{N}\left(0,\sigma^2\right), \] where \( \xrightarrow{\text{d}} \) denotes convergence in distribution.
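
A small simulation sketch of Theorem B.1 (assuming NumPy/SciPy; the Exponential(1) choice is just one convenient non-normal example with \(\mu=1\) and \(\sigma^2=1\)):

```python
# Standardized means of iid Exponential(1) draws should be
# approximately N(0, 1) once n is large.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps = 500, 20_000
y = rng.exponential(scale=1.0, size=(reps, n))   # mean 1, variance 1
t = np.sqrt(n) * (y.mean(axis=1) - 1.0)          # sqrt(n) * (Ybar - mu)

# Compare with the N(0, sigma^2) limit; the KS statistic should be small.
print(stats.kstest(t, stats.norm(0, 1).cdf))
```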

The normal distribution can be extended to the multivariate normal distribution for a vector \(X\in\mathbb{R}^p\), in which case we write \(X\sim\mathcal{N}\left({\bf \mu},\Sigma\right)\). Here, \({\bf\mu}\in\mathbb{R}^p\) is the mean vector while \(\Sigma\in\mathbb{R}^{p\times p}\) is a symmetric and positive semi-definite \(p\times p\) covariance matrix. This distribution also has elliptical symmetry. Assuming \(\Sigma\) is positive definite, the pdf is \[ f_{\mathcal{N}\left(\mu,\Sigma\right)}(x) = \frac{1}{(2\pi)^{p/2}\det(\Sigma)^{1/2}}\exp\left( -\frac{1}{2}{(x-\mu)}^\mathrm{T}\Sigma^{-1}(x-\mu) \right). \] Otherwise, when \(\Sigma\) is singular, \(X\) has a degenerate normal distribution, which is still normal but supported on a subspace of dimension less than \(p\).
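
As a quick check of the density formula above, the following sketch (assuming SciPy; the mean and covariance values are arbitrary) evaluates it directly and compares against scipy.stats.multivariate_normal.

```python
# Direct evaluation of the MVN pdf for a positive definite Sigma.
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])   # symmetric positive definite

def mvn_pdf(x, mu, Sigma):
    """Evaluate the density formula from the text."""
    p = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)   # (x-mu)^T Sigma^{-1} (x-mu)
    norm_const = (2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm_const

x = np.array([0.5, 0.0])
print(mvn_pdf(x, mu, Sigma))
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))   # should agree
```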

The multivariate normal (MVN) distribution has some very nice characterizations. A vector \(X=(X_1,\ldots,X_p)\) is MVN if and only if every linear combination of the \(X_i\) is univariate normal. That is, for all \(a\in\mathbb{R}^p\), \(a\cdot X \sim\mathcal{N}\left(a\cdot\mu,\,a^\mathrm{T}\Sigma a\right)\). It is worth emphasizing, then, that marginal normality is not enough: if every component \(X_i\) of \(X\) is univariate normal, the joint distribution is not necessarily multivariate normal.
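
A standard counterexample (not from the text, added here for illustration): take \(X\sim\mathcal{N}(0,1)\) and \(Y=WX\) with \(W\) an independent random sign. Both margins are \(\mathcal{N}(0,1)\), but \(X+Y\) equals 0 with probability 1/2, so the linear-combination characterization fails and \((X,Y)\) cannot be MVN.

```python
# Marginally normal but not jointly normal: Y = W*X with W a Rademacher sign.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
x = rng.normal(size=n)
w = rng.choice([-1.0, 1.0], size=n)   # independent random signs
y = w * x                             # marginally N(0,1) as well

s = x + y
print((s == 0).mean())   # ~0.5: an atom at 0, impossible for a normal variable
```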

In general, if two random variables \(X,Y\in\mathbb{R}\) are independent, then \(\mathrm{cov}\left(X,Y\right)=0\); however, the reverse implication is not necessarily true. An important exception is the MVN case: if \((X,Y)\) is MVN and \(\mathrm{cov}\left(X,Y\right)=0\), then \(X\) and \(Y\) are independent.
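
To see that zero covariance does not imply independence outside the MVN setting, a classic example (again added for illustration) is \(X\sim\mathcal{N}(0,1)\) and \(Y=X^2\): \(\mathrm{cov}(X,X^2)=\mathrm{E}X^3=0\), yet \(Y\) is a deterministic function of \(X\).

```python
# Uncorrelated but strongly dependent: X ~ N(0,1) and Y = X^2.
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=200_000)
y = x**2

print(np.cov(x, y)[0, 1])   # ~0, since E[X^3] = 0
# Dependence shows up through conditional behaviour:
print(y[np.abs(x) > 2].mean(), y[np.abs(x) < 0.5].mean())   # very different
```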

B.3 Chi-Squared distribution

The chi-squared distribution arises throughout the field of statistics, usually in the context of goodness-of-fit testing. Let \(Z\sim\mathcal{N}\left(0,1\right)\); then \(Z^2\sim\chi^2\left(1\right)\), which is said to be chi-squared with one degree of freedom. Furthermore, chi-squared random variables are additive in the sense that if \(X\sim\chi^2\left(\nu\right)\) and \(Y\sim\chi^2\left(\eta\right)\) are independent, then \(X+Y\sim\chi^2\left(\nu+\eta\right)\).
A chi-squared random variable is supported on the positive real line, and the pdf for \(X\sim\chi^2\left(\nu\right)\) with \(\nu>0\) is \[ f_{\chi^2\left(\nu\right)}(x) = \frac{1}{2^{\nu/2}\Gamma(\nu/2)}x^{\nu/2-1}\mathrm{e}^{-x/2}, \quad x>0. \] Its mean and variance are \(\nu\) and \(2\nu\), respectively. Note that while the degrees of freedom parameter is very often a positive integer, any positive real value still yields a valid distribution. In fact, the chi-squared distribution is a special case of the more general gamma distribution, which will not be discussed in these notes.
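
A brief simulation check (assuming NumPy/SciPy) that the sum of \(\nu\) squared standard normals is \(\chi^2(\nu)\), with sample mean near \(\nu\) and variance near \(2\nu\):

```python
# Build chi^2(5) from squared standard normals via additivity.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
nu = 5
x = (rng.normal(size=(100_000, nu)) ** 2).sum(axis=1)

print(x.mean(), x.var())                     # ~5 and ~10
print(stats.kstest(x, stats.chi2(nu).cdf))   # should not reject
```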

This random variable is quite useful in the context of linear regression because it leads to the t and F distributions discussed in the following subsections. However, it also arises in many other areas of statistics, including Wilks’ Theorem regarding the asymptotic distribution of the log likelihood ratio, goodness-of-fit in multinomial hypothesis testing, and testing for independence in a contingency table.

B.4 t distribution

The t distribution is sometimes referred to as Student’s t distribution in recognition of its progenitor, William Sealy Gosset, who published anonymously under the pseudonym “Student” while working at the Guinness brewery. It most notably arises in the context of estimating the mean of a normal distribution when the variance is unknown, as occurs frequently in these lecture notes.

Let \(Z\sim\mathcal{N}\left(0,1\right)\) and \(V \sim\chi^2\left(\nu\right)\) be independent random variables. Then, we say that \[ T = \frac{Z}{\sqrt{V/\nu}} \sim t\left(\nu\right) \] has a t distribution with \(\nu\) degrees of freedom. Such a distribution can be thought of as a heavier-tailed version of the standard normal distribution. In fact, \(t\left(1\right)\) is the Cauchy distribution with pdf \[ f_{t\left(1\right)}(x) = (\pi(1+x^2))^{-1}, \] while \(T\sim t\left(\nu\right)\) converges in distribution to a standard normal as \(\nu\rightarrow\infty\).
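
A minimal simulation sketch (assuming NumPy/SciPy; \(\nu=4\) is an arbitrary choice) of the \(Z/\sqrt{V/\nu}\) construction, with a quick look at the heavier tails:

```python
# Construct t(4) variates from a standard normal and a chi-squared.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
nu, n = 4, 100_000
z = rng.normal(size=n)
v = rng.chisquare(nu, size=n)
t_sample = z / np.sqrt(v / nu)

print(stats.kstest(t_sample, stats.t(nu).cdf))   # should not reject
# Heavier tails than N(0,1): compare two-sided tail probabilities at 3.
print((np.abs(t_sample) > 3).mean(), 2 * stats.norm.sf(3))
```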

A noteworthy property of a t distributed random variable is that it has moments only up to, but not including, order \(\nu\). That is, for \(T\sim t\left(\nu\right)\), \(\mathrm{E}|T|^k < \infty\) for \(0<k<\nu\), while for \(k\ge\nu\) the moments do not exist.
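
A hedged Monte Carlo illustration of this (assuming NumPy/SciPy): for \(T\sim t\left(3\right)\), the second moment \(\nu/(\nu-2)=3\) is finite since \(2<\nu\), while the fourth moment does not exist, so the empirical fourth moment never settles down as the sample grows.

```python
# Second sample moment stabilizes; fourth sample moment does not.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
t = stats.t(3).rvs(size=1_000_000, random_state=rng)

for n in (10_000, 100_000, 1_000_000):
    print(n, np.mean(t[:n] ** 2), np.mean(t[:n] ** 4))
# First column of moments hovers near 3; the second keeps drifting.
```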

B.5 F distribution

The F distribution arises often in the context of linear regression when a comparison is made between two sources of variation. Let \(X\sim\chi^2\left(\nu\right)\) and \(Y\sim\chi^2\left(\eta\right)\) be independent random variables; then we write that \[ F = \frac{X/\nu}{Y/\eta} \sim F\left(\nu,\eta\right) \] has an F distribution with \(\nu\) and \(\eta\) degrees of freedom. From this construction, if \(F\sim F\left(\nu,\eta\right)\), then \(F^{-1}\sim F\left(\eta,\nu\right)\). The F distribution is supported on the positive real line. The F and t distributions are related by the fact that if \(T\sim t\left(\nu\right)\), then \(T^2\sim F\left(1,\nu\right)\).
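
A short sketch (assuming NumPy/SciPy; the degrees of freedom are arbitrary) of the ratio construction, the reciprocal property, and the \(T^2\sim F\left(1,\nu\right)\) relation:

```python
# Build F(3, 7) from two independent chi-squared variates and check
# the reciprocal and T^2 relations with KS tests.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
nu, eta, n = 3, 7, 100_000
f = (rng.chisquare(nu, size=n) / nu) / (rng.chisquare(eta, size=n) / eta)

print(stats.kstest(f, stats.f(nu, eta).cdf))       # F(nu, eta): no rejection
print(stats.kstest(1 / f, stats.f(eta, nu).cdf))   # reciprocal: F(eta, nu)

t = stats.t(nu).rvs(size=n, random_state=rng)
print(stats.kstest(t**2, stats.f(1, nu).cdf))      # T^2 ~ F(1, nu)
```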