# Two sample $t$ test: sampling distribution of the $t$ statistic

Definition of the sampling distribution of the $t$ statistic

## Sampling distribution of $t$:

As you may know, when we perform a two sample $t$ test (not assuming equal population variances), we compute the $t$ statistic $$t = \dfrac{\bar{y}_1 - \bar{y}_2}{\sqrt{\dfrac{s^2_1}{n_1} + \dfrac{s^2_2}{n_2}}}$$ based on our group 1 and group 2 samples, where $\bar{y}_1$ and $\bar{y}_2$ are the sample means, $s^2_1$ and $s^2_2$ the sample variances, and $n_1$ and $n_2$ the sample sizes of the two groups. Now suppose that we drew many more samples. Specifically, suppose that we drew an infinite number of group 1 and group 2 samples, each time of sizes $n_1$ and $n_2$, respectively. For each pair of a group 1 and a group 2 sample, we could compute the $t$ statistic $t = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{s^2_1}{n_1} + \frac{s^2_2}{n_2}}}$. Different samples would give different $t$ values. The distribution of all these $t$ values is the sampling distribution of $t$. Note that this sampling distribution is purely hypothetical: we will never actually draw an infinite number of samples, but hypothetically, we could.
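The idea of repeated sampling can be approximated by simulation. The sketch below draws a large but finite number of sample pairs and computes the $t$ statistic for each pair. The function names `t_statistic` and `simulate_t_values`, the choice of normally distributed populations, and all parameter values are assumptions made purely for illustration.

```python
import numpy as np


def t_statistic(y1, y2):
    """Two sample t statistic, not assuming equal population variances."""
    n1, n2 = len(y1), len(y2)
    s2_1 = np.var(y1, ddof=1)  # sample variance of group 1 (divides by n1 - 1)
    s2_2 = np.var(y2, ddof=1)  # sample variance of group 2 (divides by n2 - 1)
    return (np.mean(y1) - np.mean(y2)) / np.sqrt(s2_1 / n1 + s2_2 / n2)


def simulate_t_values(n1, n2, mu1, mu2, sigma1, sigma2, n_draws=10_000, seed=0):
    """Draw n_draws pairs of samples and return the t value of each pair.

    Assumption: both populations are normal. A finite n_draws only
    approximates the hypothetical infinite number of samples.
    """
    rng = np.random.default_rng(seed)
    t_values = np.empty(n_draws)
    for i in range(n_draws):
        y1 = rng.normal(mu1, sigma1, size=n1)  # new group 1 sample
        y2 = rng.normal(mu2, sigma2, size=n2)  # new group 2 sample
        t_values[i] = t_statistic(y1, y2)
    return t_values


# Hypothetical populations and sample sizes, chosen only for this example
t_values = simulate_t_values(n1=15, n2=20, mu1=5.0, mu2=5.0, sigma1=1.0, sigma2=2.0)
print(t_values[:5])  # different samples give different t values
```

A histogram of `t_values` would then give an approximate picture of the sampling distribution of $t$ for these particular populations and sample sizes.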

## Sampling distribution of $t$ if H0 were true:

Suppose that the assumptions of the two sample $t$ test hold, and that the null hypothesis that $\mu_1 = \mu_2$ is true. Then the sampling distribution of $t$ is approximately the $t$ distribution with $k$ degrees of freedom (see the overview for possible values of $k$). That is, most of the time we would find $t$ values close to 0, and only occasionally would we find $t$ values far away from 0. If we find a $t$ value in our actual sample that is far away from 0, this would be a rare event if the null hypothesis were true, and it is therefore considered evidence against the null hypothesis ($t$ value in rejection region, small $p$ value).
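The behaviour under the null hypothesis can be checked with a small simulation. The sketch below assumes two normal populations with equal means but different standard deviations (all values hypothetical), and uses `scipy.stats.ttest_ind` with `equal_var=False`, which picks its own approximate degrees of freedom (the Welch-Satterthwaite formula); this may or may not match the value of $k$ you use.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n1, n2, n_draws = 15, 20, 10_000  # hypothetical sample sizes and number of draws

# H0 is true by construction: both populations have mean 5.0.
# Each row is one simulated sample.
y1 = rng.normal(loc=5.0, scale=1.0, size=(n_draws, n1))
y2 = rng.normal(loc=5.0, scale=2.0, size=(n_draws, n2))

# One t test per row, not assuming equal population variances
result = stats.ttest_ind(y1, y2, axis=1, equal_var=False)
t_values = result.statistic
p_values = result.pvalue

# Share of t values close to 0 (here: within 2 of 0, an arbitrary cutoff)
share_near_zero = np.mean(np.abs(t_values) < 2)

# Share of samples that would lead to rejection of H0 at alpha = 0.05
rejection_rate = np.mean(p_values < 0.05)

print(f"share of |t| < 2: {share_near_zero:.3f}")
print(f"share of p < 0.05: {rejection_rate:.3f}")
```

If the approximation by the $t$ distribution is good, the vast majority of simulated $t$ values should lie close to 0, and the share of small $p$ values should be close to the chosen significance level.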