$z$ test for the difference between two proportions: sampling distribution of the $z$ statistic

Definition of the sampling distribution of the $z$ statistic


Sampling distribution of $z$:

When we perform a $z$ test for the difference between two proportions, we compute the $z$ statistic $$ z = \dfrac{p_1 - p_2}{\sqrt{p(1 - p)\Bigg(\dfrac{1}{n_1} + \dfrac{1}{n_2}\Bigg)}} $$ based on our group 1 and group 2 samples. Here $p_1$ and $p_2$ are the sample proportions in group 1 and group 2, $n_1$ and $n_2$ are the two sample sizes, and $p$ is the pooled sample proportion, that is, the proportion computed from the two samples combined. Now suppose that we drew many more samples. Specifically, suppose that we drew an infinite number of group 1 and group 2 samples, each time of size $n_1$ and $n_2$, respectively. For each pair of samples, we could compute the $z$ statistic with the same formula. Different samples would give different $z$ values. The distribution of all these $z$ values is the sampling distribution of $z$. Note that this sampling distribution is purely hypothetical: we will never really draw an infinite number of samples, but hypothetically, we could.
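The idea of repeated sampling can be approximated with a small simulation. The sketch below is only an illustration: it replaces the infinite number of samples with a finite number of simulated ones, and the function names (`z_statistic`, `draw_z_values`), the population proportions, and the sample sizes are all assumptions chosen for the example.

```python
import math
import random


def z_statistic(x1, n1, x2, n2):
    """z statistic for the difference between two sample proportions.

    x1, x2 are the counts behind the sample proportions p1 = x1/n1 and
    p2 = x2/n2; p is the pooled proportion from both samples combined.
    """
    p1 = x1 / n1
    p2 = x2 / n2
    p = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


def draw_z_values(pi1, pi2, n1, n2, n_draws, seed=0):
    """Draw n_draws pairs of samples and return the z value of each pair.

    pi1 and pi2 are hypothetical population proportions (assumed values).
    Pairs whose pooled proportion is exactly 0 or 1 are skipped, because
    the z statistic is undefined for them (division by zero).
    """
    rng = random.Random(seed)
    z_values = []
    for _ in range(n_draws):
        x1 = sum(rng.random() < pi1 for _ in range(n1))
        x2 = sum(rng.random() < pi2 for _ in range(n2))
        if 0 < x1 + x2 < n1 + n2:
            z_values.append(z_statistic(x1, n1, x2, n2))
    return z_values


# Every pair of samples gives a different z value; together these values
# approximate the sampling distribution of z.
z_values = draw_z_values(pi1=0.30, pi2=0.30, n1=100, n2=100, n_draws=1000)
print(z_values[:5])
```

A histogram of `z_values` gives a finite approximation of the sampling distribution described above; the more pairs of samples are drawn, the closer the approximation.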

Sampling distribution of $z$ if H0 were true:

Suppose that the assumptions of the $z$ test for the difference between two proportions hold, and that the null hypothesis that $\pi_1 = \pi_2$ is true. Then the sampling distribution of $z$ is approximately normal with mean 0 and standard deviation 1 (standard normal). That is, most of the time we would find $z$ values close to 0, and only occasionally would we find $z$ values further away from 0. If we find a $z$ value in our actual sample that is far away from 0, this would be a rare event if the null hypothesis were true, and it is therefore considered evidence against the null hypothesis (the $z$ value falls in the rejection region, and the $p$ value is small).
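The claim that $z$ is approximately standard normal under the null hypothesis can be checked informally by simulation. The following is a minimal sketch under stated assumptions: the common population proportion (0.30), the sample sizes (100 per group), the number of simulated sample pairs, and the helper name `simulate_null_z` are all hypothetical choices for illustration, and 1.96 is used as the two-sided critical value at the 5% level.

```python
import math
import random
import statistics


def simulate_null_z(pi, n1, n2, n_draws, seed=0):
    """Simulate z values when both groups share the same proportion pi."""
    rng = random.Random(seed)
    z_values = []
    for _ in range(n_draws):
        x1 = sum(rng.random() < pi for _ in range(n1))
        x2 = sum(rng.random() < pi for _ in range(n2))
        p = (x1 + x2) / (n1 + n2)  # pooled sample proportion
        if p in (0.0, 1.0):
            continue  # z is undefined when the pooled proportion is 0 or 1
        standard_error = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
        z_values.append((x1 / n1 - x2 / n2) / standard_error)
    return z_values


z_values = simulate_null_z(pi=0.30, n1=100, n2=100, n_draws=5000)

# Under H0 the mean should be near 0 and the standard deviation near 1.
print("mean of z:", statistics.fmean(z_values))
print("standard deviation of z:", statistics.stdev(z_values))

# Share of simulated z values at least as extreme as 1.96 in absolute value;
# for a standard normal distribution this share is about 5%.
tail_share = sum(abs(z) >= 1.96 for z in z_values) / len(z_values)
print("share with |z| >= 1.96:", tail_share)

# Exact standard normal tail probability, for comparison.
print("standard normal share:", 2 * (1 - statistics.NormalDist().cdf(1.96)))
```

Because the normal approximation is only approximate and the number of simulated samples is finite, the simulated values will be close to, but not exactly equal to, the standard normal figures.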

Standard normal distribution