# Regression analysis: sampling distribution of $F$ and of $t$

Definition of the sampling distribution of the $F$ statistic and $t$ statistic in regression analysis

## Sampling distribution of $F$:

When we perform an OLS regression analysis, we compute the $F$ statistic $$F = \dfrac{\mbox{mean square model}}{\mbox{mean square error}}$$ based on our sample data. Now suppose that we drew many more samples. Specifically, suppose that we repeated our study an infinite number of times. In each study, we could perform the same regression analysis and compute the $F$ statistic from the sampled data. Different studies would be based on different samples, resulting in different $F$ values. The distribution of all these $F$ values is the sampling distribution of $F$. Note that this sampling distribution is purely hypothetical: we will never really repeat our study an infinite number of times, but hypothetically, we could.
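The thought experiment can be approximated on a computer by repeating the study a large but finite number of times. The following is a minimal sketch, not a definitive implementation. It assumes a simple regression with one predictor ($K = 1$), a made-up population with intercept 1, slope 0.5 and standard normal errors, and 2000 replications of $N = 30$ observations. The function names `f_statistic` and `draw_sample` are hypothetical helpers introduced only for this illustration.

```python
import random


def f_statistic(x, y):
    """Return F = mean square model / mean square error for one predictor."""
    n = len(x)
    k = 1  # number of predictors in this sketch
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx              # OLS slope
    b0 = mean_y - b1 * mean_x   # OLS intercept
    ss_error = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_model = sum((b0 + b1 * xi - mean_y) ** 2 for xi in x)
    return (ss_model / k) / (ss_error / (n - k - 1))


def draw_sample(rng, n, slope):
    """Draw one sample from the assumed population (illustrative values)."""
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [1.0 + slope * xi + rng.gauss(0, 1) for xi in x]
    return x, y


rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Each replication plays the role of one hypothetical repetition of the study.
f_values = [f_statistic(*draw_sample(rng, n=30, slope=0.5)) for _ in range(2000)]

print("smallest F:", min(f_values))
print("largest F: ", max(f_values))
```

The list `f_values` is a finite approximation of the sampling distribution of $F$ for this assumed population: every replication uses a different sample and therefore yields a different $F$ value.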

## Sampling distribution of $F$ if H0 were true:

Suppose that the assumptions of the regression analysis hold, and that the null hypothesis $\beta_1 = \beta_2 = \ldots = \beta_K = 0$ is true, where $K$ is the number of predictors and $N$ is the sample size. Then the sampling distribution of $F$ is the $F$ distribution with $K$ and $N - K - 1$ degrees of freedom. That is, most of the time we would find relatively small $F$ values, and only occasionally would we find large $F$ values. If we find a very large $F$ value in our actual study, this would be a rare event if the null hypothesis were true, and it is therefore considered evidence against the null hypothesis ($F$ value in rejection region, small $p$ value).
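The claim that large $F$ values are rare under the null hypothesis can be checked by simulation. The sketch below assumes one predictor ($K = 1$), $N = 20$ and normally distributed errors, so the relevant distribution is $F$ with 1 and 18 degrees of freedom. The value 4.41 is the 5% critical value for these degrees of freedom as found in standard $F$ tables. It is hard-coded here as an assumption rather than computed. The helper `f_statistic` is again a hypothetical name.

```python
import random

N, K = 20, 1
F_CRIT = 4.41  # assumed: tabulated 5% critical value of F with 1 and 18 df


def f_statistic(x, y):
    """Return F = mean square model / mean square error for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    ss_error = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_model = sum((b0 + b1 * xi - mean_y) ** 2 for xi in x)
    return (ss_model / K) / (ss_error / (n - K - 1))


rng = random.Random(2)  # fixed seed so the sketch is reproducible
f_values = []
for _ in range(5000):
    x = [rng.gauss(0, 1) for _ in range(N)]
    y = [rng.gauss(0, 1) for _ in range(N)]  # H0 is true: y is unrelated to x
    f_values.append(f_statistic(x, y))

share_small = sum(f < 1 for f in f_values) / len(f_values)
share_rejected = sum(f > F_CRIT for f in f_values) / len(f_values)

print("share of F values below 1:          ", share_small)
print("share of F values in rejection region:", share_rejected)
```

If the simulation behaves as the theory predicts, the share of $F$ values in the rejection region should be close to 0.05, while the majority of $F$ values should be small.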

## Sampling distribution of $t$:

In addition to the $F$ test for the complete regression model, we can also perform a $t$ test for an individual regression coefficient. For this test we compute the $t$ statistic $$t = \dfrac{b_k}{SE_{b_k}}$$ based on our sample data. Again, suppose that we drew many more samples. Specifically, suppose that we repeated our study an infinite number of times. In each study, we could perform the same regression analysis and compute the $t$ statistic from the sampled data. Different studies would be based on different samples, resulting in different $t$ values. The distribution of all these $t$ values is the sampling distribution of $t$. Note that this sampling distribution is purely hypothetical: we will never really repeat our study an infinite number of times, but hypothetically, we could.
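As an illustration, the repeated-study idea can be sketched for the $t$ statistic of a slope. The example assumes a simple regression with one predictor, in which the standard error of the slope is $\sqrt{\text{mean square error} / \sum (x_i - \bar{x})^2}$. The population values (intercept 1, slope 0.5, standard normal errors), the sample size of 30 and the 2000 replications are arbitrary choices, and `t_statistic` is a hypothetical helper name.

```python
import math
import random


def t_statistic(x, y):
    """Return t = b1 / SE(b1) for the slope of a one-predictor regression."""
    n = len(x)
    k = 1  # number of predictors in this sketch
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx              # OLS slope
    b0 = mean_y - b1 * mean_x   # OLS intercept
    ss_error = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mean_square_error = ss_error / (n - k - 1)
    se_b1 = math.sqrt(mean_square_error / sxx)
    return b1 / se_b1


rng = random.Random(3)  # fixed seed so the sketch is reproducible
t_values = []
for _ in range(2000):  # each pass is one hypothetical repetition of the study
    x = [rng.gauss(0, 1) for _ in range(30)]
    y = [1.0 + 0.5 * xi + rng.gauss(0, 1) for xi in x]  # assumed true slope 0.5
    t_values.append(t_statistic(x, y))

print("smallest t:", min(t_values))
print("largest t: ", max(t_values))
print("mean t:    ", sum(t_values) / len(t_values))
```

Because the assumed true slope is positive, the simulated $t$ values vary from sample to sample but are centred above 0.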

## Sampling distribution of $t$ if H0 were true:

Suppose that the assumptions of the regression analysis hold, and that the null hypothesis $\beta_k = 0$ is true. Then the sampling distribution of $t$ is the $t$ distribution with $N - K - 1$ degrees of freedom, where $N$ is the sample size and $K$ is the number of predictors. That is, most of the time we would find $t$ values close to 0, and only occasionally would we find $t$ values far away from 0. If we find a $t$ value in our actual study that is far away from 0, this would be a rare event if the null hypothesis were true, and it is therefore considered evidence against the null hypothesis ($t$ value in rejection region, small $p$ value).
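The same kind of check can be sketched for the $t$ statistic. The example below assumes one predictor, $N = 20$ (so 18 degrees of freedom), normally distributed errors and a two-sided test at the 5% level. The critical value 2.101 is taken from standard $t$ tables and hard-coded as an assumption, and `t_statistic` is a hypothetical helper name.

```python
import math
import random

N, K = 20, 1
T_CRIT = 2.101  # assumed: tabulated two-sided 5% critical value of t with 18 df


def t_statistic(x, y):
    """Return t = b1 / SE(b1) for the slope of a one-predictor regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    ss_error = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mean_square_error = ss_error / (n - K - 1)
    return b1 / math.sqrt(mean_square_error / sxx)


rng = random.Random(4)  # fixed seed so the sketch is reproducible
t_values = []
for _ in range(5000):
    x = [rng.gauss(0, 1) for _ in range(N)]
    y = [rng.gauss(0, 1) for _ in range(N)]  # H0 is true: the slope is 0
    t_values.append(t_statistic(x, y))

mean_t = sum(t_values) / len(t_values)
share_near_zero = sum(abs(t) < 1 for t in t_values) / len(t_values)
share_rejected = sum(abs(t) > T_CRIT for t in t_values) / len(t_values)

print("mean t:                               ", mean_t)
print("share of t values between -1 and 1:   ", share_near_zero)
print("share of t values in rejection region:", share_rejected)
```

If the theory holds, the simulated $t$ values should be centred near 0, and about 5% of them should fall in the rejection region.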