Goodness of fit test - overview


Goodness of fit test
Paired sample $t$ test
One sample Wilcoxon signed-rank test
Marginal Homogeneity test / Stuart-Maxwell test
Independent variable
  • Goodness of fit test: none
  • Paired sample $t$ test: 2 paired groups
  • One sample Wilcoxon signed-rank test: none
  • Marginal Homogeneity test / Stuart-Maxwell test: 2 paired groups
Dependent variable
  • Goodness of fit test: one categorical with $J$ independent groups ($J \geqslant 2$)
  • Paired sample $t$ test: one quantitative of interval or ratio level
  • One sample Wilcoxon signed-rank test: one of ordinal level
  • Marginal Homogeneity test / Stuart-Maxwell test: one categorical with $J$ independent groups ($J \geqslant 2$)
Null hypothesis
  • H0: the population proportions in each of the $J$ conditions are $\pi_1$, $\pi_2$, $\ldots$, $\pi_J$
or equivalently
  • H0: the probability of drawing an observation from condition 1 is $\pi_1$, the probability of drawing an observation from condition 2 is $\pi_2$, $\ldots$, the probability of drawing an observation from condition $J$ is $\pi_J$
H0: $\mu = \mu_0$

Here $\mu$ is the population mean of the difference scores, and $\mu_0$ is the population mean of the difference scores according to the null hypothesis, which is usually 0. A difference score is the difference between the first score of a pair and the second score of a pair.
H0: $m = m_0$

Here $m$ is the population median, and $m_0$ is the population median according to the null hypothesis.
H0: for each category $j$ of the dependent variable, $\pi_j$ for the first paired group = $\pi_j$ for the second paired group.

Here $\pi_j$ is the population proportion in category $j.$
Alternative hypothesis
  • H1: the population proportions are not all as specified under the null hypothesis
or equivalently
  • H1: the probabilities of drawing an observation from each of the conditions are not all as specified under the null hypothesis
H1 two sided: $\mu \neq \mu_0$
H1 right sided: $\mu > \mu_0$
H1 left sided: $\mu < \mu_0$
H1 two sided: $m \neq m_0$
H1 right sided: $m > m_0$
H1 left sided: $m < m_0$
H1: for some categories of the dependent variable, $\pi_j$ for the first paired group $\neq$ $\pi_j$ for the second paired group.
Assumptions
  • Sample size is large enough for $X^2$ to be approximately chi-squared distributed. Rule of thumb: all $J$ expected cell counts are 5 or more
  • Sample is a simple random sample from the population. That is, observations are independent of one another
  • Difference scores are normally distributed in the population
  • Sample of difference scores is a simple random sample from the population of difference scores. That is, difference scores are independent of one another
  • The population distribution of the scores is symmetric
  • Sample is a simple random sample from the population. That is, observations are independent of one another
  • Sample of pairs is a simple random sample from the population of pairs. That is, pairs are independent of one another
Test statistic
$X^2 = \sum{\frac{(\mbox{observed cell count} - \mbox{expected cell count})^2}{\mbox{expected cell count}}}$
Here the expected cell count for one cell = $N \times \pi_j$, the observed cell count is the observed sample count in that same cell, and the sum is over all $J$ cells.
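The formula above can be sketched in a few lines of Python. This is a minimal illustration, not a full implementation: the function name `goodness_of_fit_statistic` and the observed counts are hypothetical, and the proportions are the ones from the example context on this page. The critical value 5.991 is the chi-squared table value for 2 degrees of freedom at $\alpha = 0.05$.

```python
def goodness_of_fit_statistic(observed_counts, null_proportions):
    """Return (X2, expected counts, degrees of freedom)."""
    if len(observed_counts) != len(null_proportions):
        raise ValueError("need one null proportion per category")
    if abs(sum(null_proportions) - 1.0) > 1e-9:
        raise ValueError("null proportions must sum to 1")

    n = sum(observed_counts)
    # Expected cell count for category j is N * pi_j
    expected = [n * p for p in null_proportions]
    # Sum (observed - expected)^2 / expected over all J cells
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed_counts, expected))
    return x2, expected, len(observed_counts) - 1


# Hypothetical sample of N = 100: counts for low, moderate, high
observed = [30, 50, 20]
x2, expected, df = goodness_of_fit_statistic(observed, [0.2, 0.6, 0.2])

# Rule of thumb from the assumptions: all expected counts are 5 or more
large_enough = all(e >= 5 for e in expected)

critical_value = 5.991  # chi-squared table, df = 2, alpha = 0.05
significant = x2 >= critical_value
```

With these made-up counts, all expected counts exceed 5, so the chi-squared approximation would be considered reasonable under the rule of thumb.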
$t = \dfrac{\bar{y} - \mu_0}{s / \sqrt{N}}$
Here $\bar{y}$ is the sample mean of the difference scores, $\mu_0$ is the population mean of the difference scores according to the null hypothesis, $s$ is the sample standard deviation of the difference scores, and $N$ is the sample size (number of difference scores).

The denominator $s / \sqrt{N}$ is the standard error of the sampling distribution of $\bar{y}$. The $t$ value indicates how many standard errors $\bar{y}$ is removed from $\mu_0$.
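As a rough sketch of the computation just described, the snippet below forms the difference scores and applies the formula for $t$. The function name `paired_t_statistic` and the before/after scores are hypothetical and serve only as an illustration.

```python
import math
import statistics


def paired_t_statistic(first_scores, second_scores, mu_0=0.0):
    """Return (t, degrees of freedom) for a paired sample t test."""
    if len(first_scores) != len(second_scores):
        raise ValueError("each first score needs a paired second score")

    # Difference score = first score of a pair minus second score of a pair
    diffs = [a - b for a, b in zip(first_scores, second_scores)]
    n = len(diffs)
    y_bar = statistics.mean(diffs)
    s = statistics.stdev(diffs)          # sample SD, divides by N - 1
    standard_error = s / math.sqrt(n)
    return (y_bar - mu_0) / standard_error, n - 1


# Hypothetical scores before and after an intervention
before = [10, 12, 9, 14, 11]
after = [8, 11, 9, 10, 10]
t, df = paired_t_statistic(before, after)
```

The resulting `t` is then compared with the $t$ distribution with `df` degrees of freedom.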
Two different types of test statistics can be used, but both will result in the same test outcome. We will denote the first option the $W_1$ statistic (also known as the $T$ statistic), and the second option the $W_2$ statistic. In order to compute each of the test statistics, follow the steps below:
  1. For each subject, compute the sign of the difference score $\mbox{sign}_d = \mbox{sgn}(\mbox{score} - m_0)$. The sign is 1 if the difference is larger than zero, -1 if the difference is smaller than zero, and 0 if the difference is equal to zero.
  2. For each subject, compute the absolute value of the difference score $|\mbox{score} - m_0|$.
  3. Exclude subjects with a difference score of zero. This leaves us with a remaining number of difference scores equal to $N_r$.
  4. Assign ranks $R_d$ to the $N_r$ remaining absolute difference scores. The smallest absolute difference score corresponds to a rank score of 1, and the largest absolute difference score corresponds to a rank score of $N_r$. If there are ties, assign them the average of the ranks they occupy.
Then compute the test statistic:

  • $W_1 = \sum\, R_d^{+}$
    or
    $W_1 = \sum\, R_d^{-}$
    That is, sum all ranks corresponding to a positive difference or sum all ranks corresponding to a negative difference. Theoretically, both definitions will result in the same test outcome. However:
    • Tables with critical values for $W_1$ are usually based on the smaller of $\sum\, R_d^{+}$ and $\sum\, R_d^{-}$. So if you are using such a table, pick the smaller one.
    • If you are using the normal approximation to find the $p$ value, it makes things most straightforward if you use $W_1 = \sum\, R_d^{+}$ (if you use $W_1 = \sum\, R_d^{-}$, the right and left sided alternative hypotheses 'flip').
  • $W_2 = \sum\, \mbox{sign}_d \times R_d$
    That is, for each remaining difference score, multiply the rank of the absolute difference score by the sign of the difference score, and then sum all of the products.
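The four steps and both statistics can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `signed_rank_statistics` and the scores are hypothetical, and tied absolute differences receive the average of the ranks they occupy, as described in step 4.

```python
def signed_rank_statistics(scores, m_0):
    """Return (sum of positive ranks, sum of negative ranks, W2, N_r)."""
    # Steps 1-3: difference scores, excluding those equal to zero
    diffs = [score - m_0 for score in scores]
    diffs = [d for d in diffs if d != 0]

    # Step 4: rank the absolute differences; ties share the average rank
    abs_sorted = sorted(abs(d) for d in diffs)
    rank_of = {}
    start = 0
    while start < len(abs_sorted):
        end = start
        while end < len(abs_sorted) and abs_sorted[end] == abs_sorted[start]:
            end += 1
        # 1-based ranks start+1 .. end are averaged
        rank_of[abs_sorted[start]] = (start + 1 + end) / 2
        start = end

    rank_sum_pos = sum(rank_of[abs(d)] for d in diffs if d > 0)
    rank_sum_neg = sum(rank_of[abs(d)] for d in diffs if d < 0)
    w_2 = rank_sum_pos - rank_sum_neg    # same as summing sign * rank
    return rank_sum_pos, rank_sum_neg, w_2, len(diffs)


# Hypothetical scores, tested against m_0 = 50
scores = [52, 47, 58, 50, 45, 55, 61]
rank_sum_pos, rank_sum_neg, w_2, n_r = signed_rank_statistics(scores, 50)
```

Either rank sum can serve as $W_1$. The two always add up to $N_r(N_r + 1)/2$, which is why both definitions carry the same information.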
Computing the test statistic is a bit complicated and involves matrix algebra. Unless you are following a technical course, you probably won't need to calculate it by hand.
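For readers who do want to see the matrix algebra, one commonly cited form of the Stuart-Maxwell statistic is sketched here. The notation is not taken from this page and is an assumption: $n_{ij}$ is the number of pairs in category $i$ at the first measurement and category $j$ at the second, $n_{i\cdot}$ and $n_{\cdot i}$ are the row and column totals, and only the first $J - 1$ categories enter $\mathbf{d}$ and $\mathbf{S}$. Software may implement a different variant.

$$d_i = n_{i\cdot} - n_{\cdot i}, \qquad S_{ii} = n_{i\cdot} + n_{\cdot i} - 2n_{ii}, \qquad S_{ij} = -(n_{ij} + n_{ji}) \mbox{ for } i \neq j$$
$$X^2 = \mathbf{d}^{\top} \mathbf{S}^{-1} \mathbf{d}$$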
Sampling distribution of the test statistic if H0 were true
Sampling distribution of $X^2$: approximately the chi-squared distribution with $J - 1$ degrees of freedom
Sampling distribution of $t$: $t$ distribution with $N - 1$ degrees of freedom
Sampling distribution of $W_1$:
If $N_r$ is large, $W_1$ is approximately normally distributed with mean $\mu_{W_1}$ and standard deviation $\sigma_{W_1}$ if the null hypothesis were true. Here $$\mu_{W_1} = \frac{N_r(N_r + 1)}{4}$$ $$\sigma_{W_1} = \sqrt{\frac{N_r(N_r + 1)(2N_r + 1)}{24}}$$ Hence, if $N_r$ is large, the standardized test statistic $$z = \frac{W_1 - \mu_{W_1}}{\sigma_{W_1}}$$ follows approximately the standard normal distribution if the null hypothesis were true.

Sampling distribution of $W_2$:
If $N_r$ is large, $W_2$ is approximately normally distributed with mean $0$ and standard deviation $\sigma_{W_2}$ if the null hypothesis were true. Here $$\sigma_{W_2} = \sqrt{\frac{N_r(N_r + 1)(2N_r + 1)}{6}}$$ Hence, if $N_r$ is large, the standardized test statistic $$z = \frac{W_2}{\sigma_{W_2}}$$ follows approximately the standard normal distribution if the null hypothesis were true.

If $N_r$ is small, the exact distribution of $W_1$ or $W_2$ should be used.

Note: if ties are present in the data, the formula for the standard deviations $\sigma_{W_1}$ and $\sigma_{W_2}$ is more complicated.
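The large-sample formulas can be sketched as follows, assuming no ties. The function name `signed_rank_z` and the input values are hypothetical. `w_1` is taken to be the sum of the positive ranks, and `w_2` is set to the value that corresponds to it, $W_2 = 2W_1 - N_r(N_r + 1)/2$. `NormalDist` comes from Python's standard `statistics` module.

```python
import math
from statistics import NormalDist


def signed_rank_z(w_1, w_2, n_r):
    """Return the standardized W1 and W2 statistics (no tie correction)."""
    mu_w1 = n_r * (n_r + 1) / 4
    sigma_w1 = math.sqrt(n_r * (n_r + 1) * (2 * n_r + 1) / 24)
    sigma_w2 = math.sqrt(n_r * (n_r + 1) * (2 * n_r + 1) / 6)
    z_from_w1 = (w_1 - mu_w1) / sigma_w1
    z_from_w2 = w_2 / sigma_w2           # W2 has mean 0 under H0
    return z_from_w1, z_from_w2


# Hypothetical values: N_r = 20, sum of positive ranks = 150
n_r = 20
w_1 = 150
w_2 = 2 * w_1 - n_r * (n_r + 1) / 2     # 90.0, the matching W2
z_1, z_2 = signed_rank_z(w_1, w_2, n_r)

# Two sided p value from the standard normal distribution
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z_1)))
```

The two $z$ values coincide, which illustrates why $W_1$ (as the positive rank sum) and $W_2$ lead to the same test outcome.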
Sampling distribution of the marginal homogeneity test statistic: approximately the chi-squared distribution with $J - 1$ degrees of freedom
Significant?
  • Check if $X^2$ observed in sample is equal to or larger than critical value $X^{2*}$ or
  • Find $p$ value corresponding to observed $X^2$ and check if it is equal to or smaller than $\alpha$
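For the $t$ test, the check depends on whether the alternative hypothesis is two sided, right sided, or left sided.
For $W_1$ and $W_2$ in large samples, the table for standard normal probabilities can be used, again depending on whether the alternative hypothesis is two sided, right sided, or left sided.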
If we denote the test statistic as $X^2$:
  • Check if $X^2$ observed in sample is equal to or larger than critical value $X^{2*}$ or
  • Find $p$ value corresponding to observed $X^2$ and check if it is equal to or smaller than $\alpha$
$C\%$ confidence interval for $\mu$ (paired sample $t$ test only)
$\bar{y} \pm t^* \times \dfrac{s}{\sqrt{N}}$
where the critical value $t^*$ is the value under the $t_{N-1}$ distribution with the area $C / 100$ between $-t^*$ and $t^*$ (e.g. $t^*$ = 2.086 for a 95% confidence interval when df = 20).

The confidence interval for $\mu$ can also be used as a significance test.
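A minimal sketch of the interval, and of its use as a two sided test, is given below. The function name `mean_difference_ci` and the summary statistics are hypothetical. The critical value 2.086 is the one quoted above for a 95% interval with df = 20, so the example assumes $N = 21$ difference scores.

```python
import math


def mean_difference_ci(y_bar, s, n, t_star):
    """Return (lower, upper) bounds of the confidence interval for mu."""
    margin = t_star * s / math.sqrt(n)   # t* times the standard error
    return y_bar - margin, y_bar + margin


# Hypothetical summary statistics for N = 21 difference scores
lower, upper = mean_difference_ci(y_bar=1.5, s=2.1, n=21, t_star=2.086)

# Two sided test at alpha = 0.05: reject H0 if mu_0 lies outside the interval
mu_0 = 0
reject_h0 = not (lower <= mu_0 <= upper)
```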
Effect size (paired sample $t$ test only)
Cohen's $d$:
Standardized difference between the sample mean of the difference scores and $\mu_0$: $$d = \frac{\bar{y} - \mu_0}{s}$$ Cohen's $d$ indicates how many standard deviations $s$ the sample mean of the difference scores $\bar{y}$ is removed from $\mu_0.$
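Cohen's $d$ can be sketched directly from the difference scores. The function name `cohens_d` and the data are hypothetical. Comparing the formula for $d$ with the formula for $t$ also shows that $t = d \times \sqrt{N}$, which the last line computes.

```python
import math
import statistics


def cohens_d(diffs, mu_0=0.0):
    """Return Cohen's d for a list of difference scores."""
    y_bar = statistics.mean(diffs)
    s = statistics.stdev(diffs)          # sample SD of the difference scores
    return (y_bar - mu_0) / s


# Hypothetical difference scores
diffs = [2, 1, 0, 4, 1]
d = cohens_d(diffs)

# The t statistic for the same data follows from d and the sample size
t_from_d = d * math.sqrt(len(diffs))
```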
Visual representation (paired sample $t$ test only)
[Figure: paired sample $t$ test]
Equivalent to (paired sample $t$ test only)
  • One sample $t$ test on the difference scores.
  • Repeated measures ANOVA with one dichotomous within subjects factor.
Example context
  • Goodness of fit test: Are the proportions of people with a low, moderate, and high socioeconomic status in the population different from $\pi_{low} = 0.2,$ $\pi_{moderate} = 0.6,$ and $\pi_{high} = 0.2$?
  • Paired sample $t$ test: Is the average difference between the mental health scores before and after an intervention different from $\mu_0 = 0$?
  • One sample Wilcoxon signed-rank test: Is the median mental health score of office workers different from $m_0 = 50$?
  • Marginal Homogeneity test / Stuart-Maxwell test: Subjects are asked to taste three different types of mayonnaise and to indicate which of the three they like best. They then drink a glass of beer, and taste and rate the three types of mayonnaise again. Does drinking a beer change which type of mayonnaise people like best?
SPSS
Analyze > Nonparametric Tests > Legacy Dialogs > Chi-square...
  • Put your categorical variable in the box below Test Variable List
  • Fill in the population proportions / probabilities according to $H_0$ in the box below Expected Values. If $H_0$ states that they are all equal, just pick 'All categories equal' (default)
Analyze > Compare Means > Paired-Samples T Test...
  • Put the two paired variables in the boxes below Variable 1 and Variable 2
Specify the measurement level of your variable on the Variable View tab, in the column named Measure. Then go to:

Analyze > Nonparametric Tests > One Sample...
  • On the Objective tab, choose Customize Analysis
  • On the Fields tab, specify the variable for which you want to compute the Wilcoxon signed-rank test
  • On the Settings tab, choose Customize tests and check the box for 'Compare median to hypothesized (Wilcoxon signed-rank test)'. Fill in your $m_0$ in the box next to Hypothesized median
  • Click Run
  • Double click on the output table to see the full results
Analyze > Nonparametric Tests > Legacy Dialogs > 2 Related Samples...
  • Put the two paired variables in the boxes below Variable 1 and Variable 2
  • Under Test Type, select the Marginal Homogeneity test
Jamovi (not available for the marginal homogeneity test)
Frequencies > N Outcomes - $\chi^2$ Goodness of fit
  • Put your categorical variable in the box below Variable
  • Click on Expected Proportions and fill in the population proportions / probabilities according to $H_0$ in the boxes below Ratio. If $H_0$ states that they are all equal, you can leave the ratios equal to the default values (1)
T-Tests > Paired Samples T-Test
  • Put the two paired variables in the box below Paired Variables, one on the left side of the vertical line and one on the right side of the vertical line
  • Under Hypothesis, select your alternative hypothesis
T-Tests > One Sample T-Test
  • Put your variable in the box below Dependent Variables
  • Under Tests, select Wilcoxon rank
  • Under Hypothesis, fill in the value for $m_0$ in the box next to Test Value, and select your alternative hypothesis