# Goodness of fit test - overview

This page offers structured overviews of one or more selected methods.

Goodness of fit test
Chi-squared test for the relationship between two categorical variables
$z$ test for a single proportion
## Independent variable

• Goodness of fit test: None
• Chi-squared test (independent / column variable): One categorical with $I$ independent groups ($I \geqslant 2$)
• $z$ test for a single proportion: None

## Dependent variable

• Goodness of fit test: One categorical with $J$ independent groups ($J \geqslant 2$)
• Chi-squared test (dependent / row variable): One categorical with $J$ independent groups ($J \geqslant 2$)
• $z$ test for a single proportion: One categorical with 2 independent groups
## Null hypothesis

Goodness of fit test:
• H0: the population proportions in each of the $J$ conditions are $\pi_1$, $\pi_2$, $\ldots$, $\pi_J$
or equivalently
• H0: the probability of drawing an observation from condition 1 is $\pi_1$, the probability of drawing an observation from condition 2 is $\pi_2$, $\ldots$, the probability of drawing an observation from condition $J$ is $\pi_J$

Chi-squared test for the relationship between two categorical variables:
H0: there is no association between the row and column variable

More precisely, if there are $I$ independent random samples of size $n_i$ from each of $I$ populations, defined by the independent variable:
• H0: the distribution of the dependent variable is the same in each of the $I$ populations
If there is one random sample of size $N$ from the total population:
• H0: the row and column variables are independent

$z$ test for a single proportion:
H0: $\pi = \pi_0$

Here $\pi$ is the population proportion of 'successes', and $\pi_0$ is the population proportion of successes according to the null hypothesis.
## Alternative hypothesis

Goodness of fit test:
• H1: the population proportions are not all as specified under the null hypothesis
or equivalently
• H1: the probabilities of drawing an observation from each of the conditions are not all as specified under the null hypothesis

Chi-squared test for the relationship between two categorical variables:
H1: there is an association between the row and column variable

More precisely, if there are $I$ independent random samples of size $n_i$ from each of $I$ populations, defined by the independent variable:
• H1: the distribution of the dependent variable is not the same in all of the $I$ populations
If there is one random sample of size $N$ from the total population:
• H1: the row and column variables are dependent

$z$ test for a single proportion:
H1 two sided: $\pi \neq \pi_0$
H1 right sided: $\pi > \pi_0$
H1 left sided: $\pi < \pi_0$
## Assumptions

Goodness of fit test:
• Sample size is large enough for $X^2$ to be approximately chi-squared distributed. Rule of thumb: all $J$ expected cell counts are 5 or more
• Sample is a simple random sample from the population. That is, observations are independent of one another

Chi-squared test for the relationship between two categorical variables:
• Sample size is large enough for $X^2$ to be approximately chi-squared distributed under the null hypothesis. Rule of thumb:
  • 2 $\times$ 2 table: all four expected cell counts are 5 or more
  • Larger than 2 $\times$ 2 tables: average of the expected cell counts is 5 or more, smallest expected cell count is 1 or more
• There are $I$ independent simple random samples from each of $I$ populations defined by the independent variable, or there is one simple random sample from the total population

$z$ test for a single proportion:
• Sample size is large enough for $z$ to be approximately normally distributed. Rule of thumb:
  • Significance test: $N \times \pi_0$ and $N \times (1 - \pi_0)$ are each larger than 10
  • Regular (large sample) 90%, 95%, or 99% confidence interval: number of successes and number of failures in sample are each 15 or more
  • Plus four 90%, 95%, or 99% confidence interval: total sample size is 10 or more
• Sample is a simple random sample from the population. That is, observations are independent of one another
If the sample size is too small for $z$ to be approximately normally distributed, the binomial test for a single proportion should be used.
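
The sample size rules of thumb above can be checked mechanically before running a test. The following is a minimal sketch in Python; the function names are hypothetical and chosen only for illustration, and the checks cover the sample size rules only, not the random sampling assumption.

```python
def gof_expected_counts_ok(n, null_props, minimum=5):
    """Goodness of fit: every expected count N * pi_j is 5 or more."""
    return all(n * p >= minimum for p in null_props)


def two_way_expected_counts_ok(expected):
    """Chi-squared test of association, given a table of expected counts.

    2 x 2 table: all four expected counts are 5 or more.
    Larger tables: average expected count is 5 or more, smallest is 1 or more.
    """
    cells = [e for row in expected for e in row]
    if len(expected) == 2 and len(expected[0]) == 2:
        return all(e >= 5 for e in cells)
    return sum(cells) / len(cells) >= 5 and min(cells) >= 1


def z_test_sample_size_ok(n, pi0, threshold=10):
    """z significance test: N * pi_0 and N * (1 - pi_0) are each larger than 10."""
    return n * pi0 > threshold and n * (1 - pi0) > threshold
```

For instance, with $N = 100$ and null proportions 0.2, 0.6, and 0.2, the expected counts are 20, 60, and 20, so the goodness of fit rule of thumb is met.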
## Test statistic

Goodness of fit test:
$X^2 = \sum{\frac{(\mbox{observed cell count} - \mbox{expected cell count})^2}{\mbox{expected cell count}}}$
Here the expected cell count for one cell = $N \times \pi_j$, the observed cell count is the observed sample count in that same cell, and the sum is over all $J$ cells.

Chi-squared test for the relationship between two categorical variables:
$X^2 = \sum{\frac{(\mbox{observed cell count} - \mbox{expected cell count})^2}{\mbox{expected cell count}}}$
Here for each cell, the expected cell count = $\dfrac{\mbox{row total} \times \mbox{column total}}{\mbox{total sample size}}$, the observed cell count is the observed sample count in that same cell, and the sum is over all $I \times J$ cells.

$z$ test for a single proportion:
$z = \dfrac{p - \pi_0}{\sqrt{\dfrac{\pi_0(1 - \pi_0)}{N}}}$
Here $p$ is the sample proportion of successes: $\dfrac{X}{N}$, $N$ is the sample size, and $\pi_0$ is the population proportion of successes according to the null hypothesis.
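
The three test statistics can be computed directly from their definitions. The sketch below uses plain Python; the function names are hypothetical, and any counts passed in are assumed to be raw observed frequencies.

```python
import math


def gof_statistic(observed, null_props):
    """X^2 for the goodness of fit test; expected count in cell j is N * pi_j."""
    n = sum(observed)
    expected = [n * p for p in null_props]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def association_statistic(table):
    """X^2 for an I x J table; expected count = row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            x2 += (observed - expected) ** 2 / expected
    return x2


def z_statistic(successes, n, pi0):
    """z for a single proportion, using pi_0 in the standard error."""
    p = successes / n
    return (p - pi0) / math.sqrt(pi0 * (1 - pi0) / n)
```

As a made-up illustration, observed counts of 30, 50, and 20 against null proportions 0.2, 0.6, and 0.2 give expected counts of 20, 60, and 20, so `gof_statistic([30, 50, 20], [0.2, 0.6, 0.2])` evaluates $100/20 + 100/60 + 0$.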
## Sampling distribution of the test statistic if H0 were true

• Goodness of fit test: sampling distribution of $X^2$ is approximately the chi-squared distribution with $J - 1$ degrees of freedom
• Chi-squared test for the relationship between two categorical variables: sampling distribution of $X^2$ is approximately the chi-squared distribution with $(I - 1) \times (J - 1)$ degrees of freedom
• $z$ test for a single proportion: sampling distribution of $z$ is approximately the standard normal distribution
## Significant?

Goodness of fit test:
• Check if $X^2$ observed in sample is equal to or larger than critical value $X^{2*}$ or
• Find $p$ value corresponding to observed $X^2$ and check if it is equal to or smaller than $\alpha$

Chi-squared test for the relationship between two categorical variables:
• Check if $X^2$ observed in sample is equal to or larger than critical value $X^{2*}$ or
• Find $p$ value corresponding to observed $X^2$ and check if it is equal to or smaller than $\alpha$

$z$ test for a single proportion:
Two sided:
Right sided:
Left sided:
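
A $p$ value can be obtained from the sampling distribution under H0 and compared with $\alpha$. The sketch below uses only the Python standard library. The helper names `chi2_sf` and `z_p_value` are hypothetical; `chi2_sf` relies on the closed-form expression for the chi-squared upper tail that holds for positive integer degrees of freedom, and a statistics library routine (for example `scipy.stats.chi2.sf`) would normally be used instead.

```python
import math
from statistics import NormalDist


def chi2_sf(x, df):
    """Upper-tail probability P(chi-squared_df >= x) for positive integer df."""
    if x <= 0:
        return 1.0
    y = x / 2
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{k=0}^{df/2 - 1} y^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= y / k
            total += term
        return math.exp(-y) * total
    # Odd df: erfc(sqrt(y)) + exp(-y) * sum_{k=1}^{(df-1)/2} y^(k - 1/2) / Gamma(k + 1/2)
    total = sum(
        y ** (k - 0.5) / math.gamma(k + 0.5) for k in range(1, (df - 1) // 2 + 1)
    )
    return math.erfc(math.sqrt(y)) + math.exp(-y) * total


def z_p_value(z, alternative="two-sided"):
    """p value under the standard normal for a two-, right-, or left-sided test."""
    cdf = NormalDist().cdf
    if alternative == "right":
        return 1 - cdf(z)
    if alternative == "left":
        return cdf(z)
    return 2 * (1 - cdf(abs(z)))
```

The result is called significant when the returned $p$ value is equal to or smaller than the chosen $\alpha$, for example `chi2_sf(x2, df) <= 0.05`.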
## Approximate $C\%$ confidence interval for $\pi$

• Goodness of fit test: n.a.
• Chi-squared test for the relationship between two categorical variables: n.a.

$z$ test for a single proportion:
Regular (large sample):
• $p \pm z^* \times \sqrt{\dfrac{p(1 - p)}{N}}$
where the critical value $z^*$ is the value under the normal curve with the area $C / 100$ between $-z^*$ and $z^*$ (e.g. $z^*$ = 1.96 for a 95% confidence interval)
With plus four method:
• $p_{plus} \pm z^* \times \sqrt{\dfrac{p_{plus}(1 - p_{plus})}{N + 4}}$
where $p_{plus} = \dfrac{X + 2}{N + 4}$ and the critical value $z^*$ is the value under the normal curve with the area $C / 100$ between $-z^*$ and $z^*$ (e.g. $z^*$ = 1.96 for a 95% confidence interval)
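
Both interval formulas translate directly into code. The following is a minimal sketch using the Python standard library; the function names are hypothetical, and `confidence` is assumed to be the percentage $C$ (for example 95).

```python
import math
from statistics import NormalDist


def z_star(confidence):
    """Critical value with area C / 100 between -z* and z* under the normal curve."""
    return NormalDist().inv_cdf(0.5 + confidence / 200)


def regular_ci(x, n, confidence=95):
    """Regular (large sample) interval: p +/- z* * sqrt(p(1 - p) / N)."""
    p = x / n
    margin = z_star(confidence) * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin


def plus_four_ci(x, n, confidence=95):
    """Plus four interval: add 2 successes and 2 failures before computing."""
    p_plus = (x + 2) / (n + 4)
    margin = z_star(confidence) * math.sqrt(p_plus * (1 - p_plus) / (n + 4))
    return p_plus - margin, p_plus + margin
```

The plus four interval is centered on $p_{plus}$ rather than on $p$, so the two intervals differ slightly in both location and width for the same data.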
## Equivalent to

• Goodness of fit test: n.a.
• Chi-squared test for the relationship between two categorical variables: n.a.

$z$ test for a single proportion:
• When testing two sided: goodness of fit test, with a categorical variable with 2 levels.
• When $N$ is large, the $p$ value from the $z$ test for a single proportion approaches the $p$ value from the binomial test for a single proportion. The $z$ test for a single proportion is just a large sample approximation of the binomial test for a single proportion.
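
The equivalence with the goodness of fit test can be verified numerically: with two categories, $X^2 = z^2$, and the two sided $p$ values coincide. The sketch below uses made-up data (30 successes out of 100, $\pi_0 = 0.2$) and hypothetical function names. It also computes an exact right sided binomial tail for comparison with the normal approximation.

```python
import math
from statistics import NormalDist


def z_statistic(x, n, pi0):
    """z for a single proportion."""
    return (x / n - pi0) / math.sqrt(pi0 * (1 - pi0) / n)


def gof_statistic_two_levels(x, n, pi0):
    """Goodness of fit X^2 with two cells: successes and failures."""
    observed = [x, n - x]
    expected = [n * pi0, n * (1 - pi0)]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def binomial_right_tail(x, n, pi0):
    """Exact P(X >= x) under H0, from the binomial distribution."""
    return sum(
        math.comb(n, k) * pi0 ** k * (1 - pi0) ** (n - k) for k in range(x, n + 1)
    )


# Made-up data for illustration only.
x, n, pi0 = 30, 100, 0.2
z = z_statistic(x, n, pi0)
x2 = gof_statistic_two_levels(x, n, pi0)

# Two sided z p value, and chi-squared p value with 1 degree of freedom.
p_z_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
p_x2 = math.erfc(math.sqrt(x2 / 2))

# Right sided comparison: normal approximation versus exact binomial tail.
p_z_right = 1 - NormalDist().cdf(z)
p_binomial_right = binomial_right_tail(x, n, pi0)
```

Here `z ** 2` matches `x2` and `p_z_two_sided` matches `p_x2` up to floating point error, while `p_z_right` and `p_binomial_right` agree only approximately, since the $z$ test is a large sample approximation.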
## Example context

• Goodness of fit test: Is the proportion of people with a low, moderate, and high social economic status in the population different from $\pi_{low} = 0.2,$ $\pi_{moderate} = 0.6,$ and $\pi_{high} = 0.2$?
• Chi-squared test for the relationship between two categorical variables: Is there an association between economic class and gender? Is the distribution of economic class different between men and women?
• $z$ test for a single proportion: Is the proportion of smokers amongst office workers different from $\pi_0 = 0.2$? Use the normal approximation for the sampling distribution of the test statistic.
## SPSS

Goodness of fit test:
Analyze > Nonparametric Tests > Legacy Dialogs > Chi-square...
• Put your categorical variable in the box below Test Variable List
• Fill in the population proportions / probabilities according to $H_0$ in the box below Expected Values. If $H_0$ states that they are all equal, just pick 'All categories equal' (default)

Chi-squared test for the relationship between two categorical variables:
Analyze > Descriptive Statistics > Crosstabs...
• Put one of your two categorical variables in the box below Row(s), and the other categorical variable in the box below Column(s)
• Click the Statistics... button, and click on the square in front of Chi-square
• Continue and click OK

$z$ test for a single proportion:
Analyze > Nonparametric Tests > Legacy Dialogs > Binomial...
• Put your dichotomous variable in the box below Test Variable List
• Fill in the value for $\pi_0$ in the box next to Test Proportion
If computation time allows, SPSS will give you the exact $p$ value based on the binomial distribution, rather than the approximate $p$ value based on the normal distribution
## Jamovi

Goodness of fit test:
Frequencies > N Outcomes - $\chi^2$ Goodness of fit
• Put your categorical variable in the box below Variable
• Click on Expected Proportions and fill in the population proportions / probabilities according to $H_0$ in the boxes below Ratio. If $H_0$ states that they are all equal, you can leave the ratios equal to the default values (1)

Chi-squared test for the relationship between two categorical variables:
Frequencies > Independent Samples - $\chi^2$ test of association
• Put one of your two categorical variables in the box below Rows, and the other categorical variable in the box below Columns

$z$ test for a single proportion:
Frequencies > 2 Outcomes - Binomial test
• Put your dichotomous variable in the white box at the right
• Fill in the value for $\pi_0$ in the box next to Test value
• Under Hypothesis, select your alternative hypothesis
Jamovi will give you the exact $p$ value based on the binomial distribution, rather than the approximate $p$ value based on the normal distribution