One-way ANOVA: sampling distribution of $F$ and of $t$

Definition of the sampling distribution of the $F$ statistic in one-way ANOVA, and of the $t$ statistic computed in follow-up tests (contrasts/multiple comparisons)

Sampling distribution of $F$:

As you may know, when we perform a one-way ANOVA, we compute the $F$ statistic $$F = \dfrac{\mbox{mean square between}}{\mbox{mean square error}}$$ based on our samples from the $I$ populations. Now suppose that we would draw many more samples. Specifically, suppose that we would repeat our study an infinite number of times. In each of the studies, we could compute the $F$ statistic $F = \frac{\mbox{mean square between}}{\mbox{mean square error}}$ based on the sampled data. Different studies would be based on different samples, resulting in different $F$ values. The distribution of all these $F$ values is the sampling distribution of $F$. Note that this sampling distribution is purely hypothetical. We will never really repeat our study an infinite number of times, but hypothetically, we could.
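The repeated-sampling idea can be mimicked on a computer. Below is a minimal simulation sketch (not part of the original text): it assumes $I = 3$ normal populations with equal means and equal group sizes, draws many samples, and computes $F$ for each; the histogram of the resulting values approximates the sampling distribution of $F$.

```python
import numpy as np

rng = np.random.default_rng(0)
I, n_per_group = 3, 20          # number of groups and group size (illustrative assumptions)
N = I * n_per_group
n_studies = 10_000              # stand-in for the hypothetical infinite number of studies

f_values = np.empty(n_studies)
for s in range(n_studies):
    # draw a fresh sample from I identical normal populations (so the null hypothesis holds)
    groups = [rng.normal(loc=0.0, scale=1.0, size=n_per_group) for _ in range(I)]
    grand_mean = np.concatenate(groups).mean()
    ms_between = sum(n_per_group * (g.mean() - grand_mean) ** 2 for g in groups) / (I - 1)
    ms_error = sum(((g - g.mean()) ** 2).sum() for g in groups) / (N - I)
    f_values[s] = ms_between / ms_error

# the histogram of f_values approximates the sampling distribution of F
```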

Sampling distribution of $F$ if $H_0$ were true:

Suppose that the assumptions of the ANOVA hold, and that the null hypothesis that $\mu_1 = \mu_2 = \ldots = \mu_I$ is true. Then the sampling distribution of $F$ is the $F$ distribution with $I - 1$ and $N - I$ degrees of freedom. That is, most of the time we would find relatively small $F$ values, and only sometimes would we find large $F$ values. If we find a very large $F$ value in our actual study, this would be a rare event if the null hypothesis were true, and it is therefore considered evidence against the null hypothesis ($F$ value in the rejection region, small $p$ value).

[Figure: the $F$ distribution with $I - 1$ and $N - I$ degrees of freedom]
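As a short illustration (with made-up numbers, not values from the original text), the critical value and the $p$ value can be obtained from this $F$ distribution, for instance with scipy:

```python
from scipy import stats

I, N = 3, 60                                  # illustrative numbers
df_between, df_error = I - 1, N - I
f_observed = 4.2                              # hypothetical F value from the actual study

critical_value = stats.f.ppf(0.95, df_between, df_error)   # rejection region at alpha = .05: F > critical_value
p_value = stats.f.sf(f_observed, df_between, df_error)     # P(F >= f_observed) if H0 were true
print(critical_value, p_value)
```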

Sampling distribution of $t$:

In addition to the ANOVA $F$ test, we may also want to perform $t$ tests for contrasts or multiple comparisons:

$t$ statistic for contrast:
  • $t = \dfrac{c}{s_p\sqrt{\sum \dfrac{a^2_i}{n_i}}}$, where $c = \sum a_i \bar{y}_i$ is the sample contrast and $s_p = \sqrt{\mbox{mean square error}}$ is the pooled standard deviation
$t$ statistic multiple comparisons:
  • $t = \dfrac{\bar{y}_g - \bar{y}_h}{s_p\sqrt{\dfrac{1}{n_g} + \dfrac{1}{n_h}}}$
based on our samples from the $I$ populations (a small numerical sketch of both statistics is given below). Again, suppose that we would draw many more samples. Specifically, suppose that we would repeat our study an infinite number of times. In each of the studies, we could compute the $t$ statistic based on the sampled data. Different studies would be based on different samples, resulting in different $t$ values. The distribution of all these $t$ values is the sampling distribution of $t$. Note that this sampling distribution is purely hypothetical. We will never really repeat our study an infinite number of times, but hypothetically, we could.
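Here is a minimal sketch of how both $t$ statistics could be computed, using illustrative data for $I = 3$ groups (the numbers and the contrast weights are assumptions, not values from the original text):

```python
import numpy as np

# illustrative data for I = 3 groups (assumed, not from the original text)
groups = [np.array([4.1, 5.0, 3.8, 4.6]),
          np.array([5.9, 6.3, 5.4, 6.1]),
          np.array([4.9, 5.2, 5.5, 4.7])]
I = len(groups)
n = np.array([len(g) for g in groups])
N = n.sum()
means = np.array([g.mean() for g in groups])

# pooled standard deviation s_p = sqrt(mean square error)
sse = sum(((g - g.mean()) ** 2).sum() for g in groups)
s_p = np.sqrt(sse / (N - I))

# t statistic for a contrast: weights a_i, here group 1 versus the average of groups 2 and 3
a = np.array([1.0, -0.5, -0.5])
c = (a * means).sum()                                        # sample contrast c = sum of a_i * ybar_i
t_contrast = c / (s_p * np.sqrt((a ** 2 / n).sum()))

# t statistic for a multiple comparison of groups g and h, here groups 1 and 2
g, h = 0, 1
t_pairwise = (means[g] - means[h]) / (s_p * np.sqrt(1 / n[g] + 1 / n[h]))
print(t_contrast, t_pairwise)
```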

Sampling distribution of $t$ if $H_0$ were true:

Suppose that the assumptions of the ANOVA hold, and that the null hypothesis tested by the $t$ test is true (the population contrast $\Psi = 0$, or $\mu_g = \mu_h$). Then the sampling distribution of $t$ is the $t$ distribution with $N - I$ degrees of freedom. That is, most of the time we would find $t$ values close to 0, and only sometimes would we find $t$ values further away from 0. If we find a $t$ value in our actual study that is far away from 0, this would be a rare event if the null hypothesis were true, and it is therefore considered evidence against the null hypothesis ($t$ value in the rejection region, small $p$ value).

[Figure: the $t$ distribution with $N - I$ degrees of freedom]
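Again as a short illustration (with made-up numbers, not values from the original text), a two-sided $p$ value can be obtained from this $t$ distribution with scipy:

```python
from scipy import stats

N, I = 12, 3                                  # illustrative numbers
df_error = N - I
t_observed = 2.5                              # hypothetical t value from the actual study

p_value = 2 * stats.t.sf(abs(t_observed), df_error)   # two-sided: P(|t| >= |t_observed|) if H0 were true
print(p_value)
```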