# Goodness of fit test - overview

This page offers a structured overview of the three methods listed below. For each topic, the entries are given in that same order: goodness of fit test, paired sample $t$ test, one way ANOVA.

Goodness of fit test
Paired sample $t$ test
One way ANOVA
## Independent variable
• Goodness of fit test: none
• Paired sample $t$ test: 2 paired groups
• One way ANOVA (independent/grouping variable): one categorical with $I$ independent groups ($I \geqslant 2$)

## Dependent variable
• Goodness of fit test: one categorical with $J$ independent groups ($J \geqslant 2$)
• Paired sample $t$ test: one quantitative of interval or ratio level
• One way ANOVA: one quantitative of interval or ratio level
## Null hypothesis
Goodness of fit test:
• H0: the population proportions in each of the $J$ conditions are $\pi_1$, $\pi_2$, $\ldots$, $\pi_J$
or equivalently
• H0: the probability of drawing an observation from condition 1 is $\pi_1$, the probability of drawing an observation from condition 2 is $\pi_2$, $\ldots$, the probability of drawing an observation from condition $J$ is $\pi_J$
Paired sample $t$ test:
• H0: $\mu = \mu_0$

Here $\mu$ is the population mean of the difference scores, and $\mu_0$ is the population mean of the difference scores according to the null hypothesis, which is usually 0. A difference score is the difference between the first score of a pair and the second score of a pair.
ANOVA $F$ test:
• H0: $\mu_1 = \mu_2 = \ldots = \mu_I$
$\mu_1$ is the population mean for group 1; $\mu_2$ is the population mean for group 2; $\mu_I$ is the population mean for group $I$
$t$ Test for contrast:
• H0: $\Psi = 0$
$\Psi$ is the population contrast, defined as $\Psi = \sum a_i\mu_i$. Here $\mu_i$ is the population mean for group $i$ and $a_i$ is the coefficient for $\mu_i$. The coefficients $a_i$ sum to 0.
$t$ Test multiple comparisons:
• H0: $\mu_g = \mu_h$
$\mu_g$ is the population mean for group $g$; $\mu_h$ is the population mean for group $h$
## Alternative hypothesis
Goodness of fit test:
• H1: the population proportions are not all as specified under the null hypothesis
or equivalently
• H1: the probabilities of drawing an observation from each of the conditions are not all as specified under the null hypothesis
Paired sample $t$ test:
• H1 two sided: $\mu \neq \mu_0$
• H1 right sided: $\mu > \mu_0$
• H1 left sided: $\mu < \mu_0$
ANOVA $F$ test:
• H1: not all population means are equal
$t$ Test for contrast:
• H1 two sided: $\Psi \neq 0$
• H1 right sided: $\Psi > 0$
• H1 left sided: $\Psi < 0$
$t$ Test multiple comparisons:
• H1 - usually two sided: $\mu_g \neq \mu_h$
## Assumptions
Goodness of fit test:
• Sample size is large enough for $X^2$ to be approximately chi-squared distributed. Rule of thumb: all $J$ expected cell counts are 5 or more
• Sample is a simple random sample from the population. That is, observations are independent of one another
Paired sample $t$ test:
• Difference scores are normally distributed in the population
• Sample of difference scores is a simple random sample from the population of difference scores. That is, difference scores are independent of one another
One way ANOVA:
• Within each population, the scores on the dependent variable are normally distributed
• The standard deviation of the scores on the dependent variable is the same in each of the populations: $\sigma_1 = \sigma_2 = \ldots = \sigma_I$
• Group 1 sample is a simple random sample (SRS) from population 1, group 2 sample is an independent SRS from population 2, $\ldots$, group $I$ sample is an independent SRS from population $I$. That is, within and between groups, observations are independent of one another
## Test statistic
Goodness of fit test:
$X^2 = \sum{\frac{(\mbox{observed cell count} - \mbox{expected cell count})^2}{\mbox{expected cell count}}}$
Here the expected cell count for one cell = $N \times \pi_j$, the observed cell count is the observed sample count in that same cell, and the sum is over all $J$ cells.
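The $X^2$ computation above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `chi_squared_gof` and the observed counts are hypothetical, and the null proportions are taken from the example context on this page.

```python
def chi_squared_gof(observed, null_proportions):
    """Return (X^2, expected counts) for a goodness of fit test."""
    n = sum(observed)
    # Expected cell count for cell j is N * pi_j
    expected = [n * p for p in null_proportions]
    # Sum (observed - expected)^2 / expected over all J cells
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return x2, expected


# Hypothetical sample counts for low, moderate, high (made up for illustration)
observed = [30, 50, 20]
null_props = [0.2, 0.6, 0.2]

x2, expected = chi_squared_gof(observed, null_props)
df = len(observed) - 1

# Rule of thumb from the assumptions: all expected counts are 5 or more
rule_of_thumb_ok = all(e >= 5 for e in expected)
print(x2, df, rule_of_thumb_ok)
```

With $J = 3$ cells the statistic is compared against the chi-squared distribution with $J - 1 = 2$ degrees of freedom.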
Paired sample $t$ test:
$t = \dfrac{\bar{y} - \mu_0}{s / \sqrt{N}}$
Here $\bar{y}$ is the sample mean of the difference scores, $\mu_0$ is the population mean of the difference scores according to the null hypothesis, $s$ is the sample standard deviation of the difference scores, and $N$ is the sample size (number of difference scores).

The denominator $s / \sqrt{N}$ is the standard error of the sampling distribution of $\bar{y}$. The $t$ value indicates how many standard errors $\bar{y}$ is removed from $\mu_0$.
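As a rough sketch of the paired sample $t$ statistic, the snippet below computes the difference scores, their mean and standard deviation, and the resulting $t$ value. The function name `paired_t` and the before/after scores are hypothetical and serve only as an illustration.

```python
import math
import statistics


def paired_t(first, second, mu_0=0.0):
    """Return (t, degrees of freedom) for a paired sample t test."""
    # Difference score: first score of a pair minus the second score
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    s = statistics.stdev(diffs)          # sample standard deviation (n - 1)
    standard_error = s / math.sqrt(n)    # denominator of the t statistic
    return (mean_diff - mu_0) / standard_error, n - 1


# Hypothetical scores before and after an intervention
before = [12, 15, 11, 14, 13]
after = [10, 14, 12, 11, 10]

t, df = paired_t(before, after)
print(t, df)
```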
ANOVA $F$ test:
• $\begin{aligned}[t] F &= \dfrac{\sum\nolimits_{subjects} (\mbox{subject's group mean} - \mbox{overall mean})^2 / (I - 1)}{\sum\nolimits_{subjects} (\mbox{subject's score} - \mbox{its group mean})^2 / (N - I)}\\ &= \dfrac{\mbox{sum of squares between} / \mbox{degrees of freedom between}}{\mbox{sum of squares error} / \mbox{degrees of freedom error}}\\ &= \dfrac{\mbox{mean square between}}{\mbox{mean square error}} \end{aligned}$
where $N$ is the total sample size, and $I$ is the number of groups.
Note: mean square between is also known as mean square model, and mean square error is also known as mean square residual or mean square within.
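The $F$ statistic can be computed directly from the definition above. The sketch below is illustrative only: `one_way_anova_f` is a hypothetical helper name and the three groups of scores are made up.

```python
def one_way_anova_f(groups):
    """Return (F, df between, df error) for a one way ANOVA."""
    all_scores = [y for g in groups for y in g]
    n_total = len(all_scores)
    n_groups = len(groups)
    overall_mean = sum(all_scores) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Sum over subjects of (subject's group mean - overall mean)^2
    ss_between = sum(
        len(g) * (m - overall_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Sum over subjects of (subject's score - its group mean)^2
    ss_error = sum(
        (y - m) ** 2 for g, m in zip(groups, group_means) for y in g
    )

    df_between = n_groups - 1
    df_error = n_total - n_groups
    mean_square_between = ss_between / df_between
    mean_square_error = ss_error / df_error
    return mean_square_between / mean_square_error, df_between, df_error


# Hypothetical scores in I = 3 groups
groups = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]
f_value, df_between, df_error = one_way_anova_f(groups)
print(f_value, df_between, df_error)
```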
$t$ Test for contrast:
• $t = \dfrac{c}{s_p\sqrt{\sum \dfrac{a^2_i}{n_i}}}$
Here $c$ is the sample estimate of the population contrast $\Psi$: $c = \sum a_i\bar{y}_i$, with $\bar{y}_i$ the sample mean in group $i$. $s_p$ is the pooled standard deviation based on all the $I$ groups in the ANOVA, $a_i$ is the contrast coefficient for group $i$, and $n_i$ is the sample size of group $i$.
Note that if the contrast compares only two group means with each other, this $t$ statistic is very similar to the two sample $t$ statistic (assuming equal population standard deviations). In that case the only difference is that we now base the pooled standard deviation on all the $I$ groups, which affects the $t$ value if $I \geqslant 3$. It also affects the corresponding degrees of freedom.
$t$ Test multiple comparisons:
• $t = \dfrac{\bar{y}_g - \bar{y}_h}{s_p\sqrt{\dfrac{1}{n_g} + \dfrac{1}{n_h}}}$
$\bar{y}_g$ is the sample mean in group $g$, $\bar{y}_h$ is the sample mean in group $h$, $s_p$ is the pooled standard deviation based on all the $I$ groups in the ANOVA, $n_g$ is the sample size of group $g$, and $n_h$ is the sample size of group $h$.
Note that this $t$ statistic is very similar to the two sample $t$ statistic (assuming equal population standard deviations). The only difference is that we now base the pooled standard deviation on all the $I$ groups, which affects the $t$ value if $I \geqslant 3$. It also affects the corresponding degrees of freedom.
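The two $t$ statistics above share the same pooled standard deviation, and a contrast with coefficients $(1, -1, 0)$ reproduces the pairwise comparison of groups 1 and 2. The following sketch checks this numerically. The helper names and the scores are hypothetical; `pooled_sd` uses the formula for $s_p$ based on all $I$ groups.

```python
import math
import statistics


def pooled_sd(groups):
    """Pooled standard deviation based on all I groups."""
    n_total = sum(len(g) for g in groups)
    ss_error = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    return math.sqrt(ss_error / (n_total - len(groups)))


def contrast_t(groups, coefficients):
    """t statistic for the contrast c = sum of a_i * group mean i."""
    s_p = pooled_sd(groups)
    c = sum(a * statistics.mean(g) for a, g in zip(coefficients, groups))
    se = s_p * math.sqrt(
        sum(a ** 2 / len(g) for a, g in zip(coefficients, groups))
    )
    return c / se


def pairwise_t(groups, g, h):
    """t statistic comparing the means of groups g and h (0-based indices)."""
    s_p = pooled_sd(groups)
    diff = statistics.mean(groups[g]) - statistics.mean(groups[h])
    return diff / (s_p * math.sqrt(1 / len(groups[g]) + 1 / len(groups[h])))


# Hypothetical scores in I = 3 groups
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [9.0, 10.0, 14.0]]

t_contrast = contrast_t(groups, [1, -1, 0])   # coefficients sum to 0
t_pair = pairwise_t(groups, 0, 1)
df_error = sum(len(g) for g in groups) - len(groups)
print(t_contrast, t_pair, df_error)
```

Because the third group contributes to $s_p$, both values differ from the ordinary two sample $t$ statistic computed on groups 1 and 2 alone.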
## Pooled standard deviation
• Goodness of fit test: n.a.
• Paired sample $t$ test: n.a.
• One way ANOVA:
$$\begin{aligned} s_p &= \sqrt{\dfrac{(n_1 - 1) \times s^2_1 + (n_2 - 1) \times s^2_2 + \ldots + (n_I - 1) \times s^2_I}{N - I}}\\ &= \sqrt{\dfrac{\sum\nolimits_{subjects} (\mbox{subject's score} - \mbox{its group mean})^2}{N - I}}\\ &= \sqrt{\dfrac{\mbox{sum of squares error}}{\mbox{degrees of freedom error}}}\\ &= \sqrt{\mbox{mean square error}} \end{aligned}$$

Here $s^2_i$ is the variance in group $i$.
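The first two expressions for $s_p$ give the same number, which can be verified with a small sketch. The function names and the scores below are hypothetical and only illustrate the equivalence.

```python
import math
import statistics


def pooled_sd_from_variances(groups):
    """s_p from the weighted group variances (first expression)."""
    n_total = sum(len(g) for g in groups)
    numerator = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    return math.sqrt(numerator / (n_total - len(groups)))


def pooled_sd_from_deviations(groups):
    """s_p from the squared deviations around group means (second expression)."""
    n_total = sum(len(g) for g in groups)
    ss_error = sum(
        (y - statistics.mean(g)) ** 2 for g in groups for y in g
    )
    return math.sqrt(ss_error / (n_total - len(groups)))


# Hypothetical scores in I = 3 groups
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [9.0, 10.0, 14.0]]

s_p_a = pooled_sd_from_variances(groups)
s_p_b = pooled_sd_from_deviations(groups)
print(s_p_a, s_p_b)
```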
## Sampling distribution of the test statistic if H0 were true
• Goodness of fit test, sampling distribution of $X^2$: approximately the chi-squared distribution with $J - 1$ degrees of freedom
• Paired sample $t$ test, sampling distribution of $t$: $t$ distribution with $N - 1$ degrees of freedom
One way ANOVA, sampling distribution of $F$:
• $F$ distribution with $I - 1$ (df between, numerator) and $N - I$ (df error, denominator) degrees of freedom
Sampling distribution of $t$:
• $t$ distribution with $N - I$ degrees of freedom
## Significant?
Goodness of fit test:
• Check if $X^2$ observed in sample is equal to or larger than critical value $X^{2*}$ or
• Find $p$ value corresponding to observed $X^2$ and check if it is equal to or smaller than $\alpha$
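Paired sample $t$ test, two sided:
• Check if $t$ observed in sample is at least as extreme as critical value $t^*$ or
• Find two sided $p$ value corresponding to observed $t$ and check if it is equal to or smaller than $\alpha$
Paired sample $t$ test, right sided:
• Check if $t$ observed in sample is equal to or larger than critical value $t^*$ or
• Find right sided $p$ value corresponding to observed $t$ and check if it is equal to or smaller than $\alpha$
Paired sample $t$ test, left sided:
• Check if $t$ observed in sample is equal to or smaller than critical value $t^*$ or
• Find left sided $p$ value corresponding to observed $t$ and check if it is equal to or smaller than $\alpha$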
$F$ test:
• Check if $F$ observed in sample is equal to or larger than critical value $F^*$ or
• Find $p$ value corresponding to observed $F$ and check if it is equal to or smaller than $\alpha$ (e.g. .01 < $p$ < .025 when $F$ = 3.91, df between = 4, and df error = 20)

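$t$ Test for contrast two sided:
• Check if $t$ observed in sample is at least as extreme as critical value $t^*$ or
• Find two sided $p$ value corresponding to observed $t$ and check if it is equal to or smaller than $\alpha$
$t$ Test for contrast right sided:
• Check if $t$ observed in sample is equal to or larger than critical value $t^*$ or
• Find right sided $p$ value corresponding to observed $t$ and check if it is equal to or smaller than $\alpha$
$t$ Test for contrast left sided:
• Check if $t$ observed in sample is equal to or smaller than critical value $t^*$ or
• Find left sided $p$ value corresponding to observed $t$ and check if it is equal to or smaller than $\alpha$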

$t$ Test multiple comparisons two sided:
• Check if $t$ observed in sample is at least as extreme as critical value $t^{**}$. Adapt $t^{**}$ according to a multiple comparison procedure (e.g., Bonferroni) or
• Find two sided $p$ value corresponding to observed $t$ and check if it is equal to or smaller than $\alpha$. Adapt the $p$ value or $\alpha$ according to a multiple comparison procedure
$t$ Test multiple comparisons right sided:
• Check if $t$ observed in sample is equal to or larger than critical value $t^{**}$. Adapt $t^{**}$ according to a multiple comparison procedure (e.g., Bonferroni) or
• Find right sided $p$ value corresponding to observed $t$ and check if it is equal to or smaller than $\alpha$. Adapt the $p$ value or $\alpha$ according to a multiple comparison procedure
$t$ Test multiple comparisons left sided:
• Check if $t$ observed in sample is equal to or smaller than critical value $t^{**}$. Adapt $t^{**}$ according to a multiple comparison procedure (e.g., Bonferroni) or
• Find left sided $p$ value corresponding to observed $t$ and check if it is equal to or smaller than $\alpha$. Adapt the $p$ value or $\alpha$ according to a multiple comparison procedure
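To illustrate what adapting the $p$ value or $\alpha$ can look like, here is a minimal sketch of the Bonferroni procedure mentioned above. The function name `bonferroni`, the group labels, and the raw $p$ values are hypothetical. The two routes (adjusting $\alpha$ or adjusting the $p$ values) lead to the same decisions.

```python
from itertools import combinations


def bonferroni(p_values, alpha=0.05):
    """Return (adjusted alpha, adjusted p values) for m comparisons."""
    m = len(p_values)
    adjusted_alpha = alpha / m                          # compare raw p to alpha / m
    adjusted_p = [min(1.0, p * m) for p in p_values]    # or compare m * p to alpha
    return adjusted_alpha, adjusted_p


# All pairwise comparisons among I = 3 groups
group_labels = ["low", "moderate", "high"]
pairs = list(combinations(group_labels, 2))

# Hypothetical two sided p values, one per pair (made up for illustration)
raw_p = [0.012, 0.030, 0.400]

alpha = 0.05
adjusted_alpha, adjusted_p = bonferroni(raw_p, alpha)
significant_via_p = [p <= alpha for p in adjusted_p]
significant_via_alpha = [p <= adjusted_alpha for p in raw_p]

for pair, sig in zip(pairs, significant_via_p):
    print(pair, sig)
```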
## $C\%$ confidence interval
Goodness of fit test: n.a.

Paired sample $t$ test, $C\%$ confidence interval for $\mu$:
• $\bar{y} \pm t^* \times \dfrac{s}{\sqrt{N}}$
where the critical value $t^*$ is the value under the $t_{N-1}$ distribution with the area $C / 100$ between $-t^*$ and $t^*$ (e.g. $t^*$ = 2.086 for a 95% confidence interval when df = 20).
The confidence interval for $\mu$ can also be used as significance test.

One way ANOVA, $C\%$ confidence interval for $\Psi$, for $\mu_g - \mu_h$, and for $\mu_i$:
Confidence interval for $\Psi$ (contrast):
• $c \pm t^* \times s_p\sqrt{\sum \dfrac{a^2_i}{n_i}}$
where the critical value $t^*$ is the value under the $t_{N - I}$ distribution with the area $C / 100$ between $-t^*$ and $t^*$ (e.g. $t^*$ = 2.086 for a 95% confidence interval when df = 20). Note that $n_i$ is the sample size of group $i$, and $N$ is the total sample size, based on all the $I$ groups.
Confidence interval for $\mu_g - \mu_h$ (multiple comparisons):
• $(\bar{y}_g - \bar{y}_h) \pm t^{**} \times s_p\sqrt{\dfrac{1}{n_g} + \dfrac{1}{n_h}}$
where $t^{**}$ depends upon $C$, degrees of freedom ($N - I$), and the multiple comparison procedure. If you do not want to apply a multiple comparison procedure, $t^{**} = t^* =$ the value under the $t_{N - I}$ distribution with the area $C / 100$ between $-t^*$ and $t^*$. Note that $n_g$ is the sample size of group $g$, $n_h$ is the sample size of group $h$, and $N$ is the total sample size, based on all the $I$ groups.
Confidence interval for single population mean $\mu_i$:
• $\bar{y}_i \pm t^* \times \dfrac{s_p}{\sqrt{n_i}}$
where $\bar{y}_i$ is the sample mean in group $i$, $n_i$ is the sample size of group $i$, and the critical value $t^*$ is the value under the $t_{N - I}$ distribution with the area $C / 100$ between $-t^*$ and $t^*$ (e.g. $t^*$ = 2.086 for a 95% confidence interval when df = 20). Note that $n_i$ is the sample size of group $i$, and $N$ is the total sample size, based on all the $I$ groups.
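A small numeric sketch of the confidence interval for the mean of the difference scores follows. It reuses the critical value $t^* = 2.086$ quoted above for df = 20, so the sample size is set to $N = 21$. The function name `paired_ci`, the sample mean, and the standard deviation are hypothetical summary statistics.

```python
import math


def paired_ci(mean_diff, s, n, t_star):
    """Return (lower, upper) bounds: mean_diff +/- t_star * s / sqrt(n)."""
    margin = t_star * s / math.sqrt(n)
    return mean_diff - margin, mean_diff + margin


# Hypothetical summary statistics for N = 21 difference scores (df = 20)
lower, upper = paired_ci(mean_diff=1.5, s=2.1, n=21, t_star=2.086)

# Using the interval as a two sided significance test of mu_0 = 0:
# if 0 lies outside the 95% interval, H0 is rejected at alpha = .05
contains_null = lower <= 0.0 <= upper
print(lower, upper, contains_null)
```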
## Effect size
Goodness of fit test: n.a.

Paired sample $t$ test:
• Cohen's $d$:
Standardized difference between the sample mean of the difference scores and $\mu_0$: $$d = \frac{\bar{y} - \mu_0}{s}$$ Cohen's $d$ indicates how many standard deviations $s$ the sample mean of the difference scores $\bar{y}$ is removed from $\mu_0$.

One way ANOVA:
• Proportion variance explained $\eta^2$ and $R^2$:
Proportion variance of the dependent variable $y$ explained by the independent variable: $$\eta^2 = R^2 = \dfrac{\mbox{sum of squares between}}{\mbox{sum of squares total}}$$ Only in one way ANOVA $\eta^2 = R^2$. $\eta^2$ (and $R^2$) is the proportion variance explained in the sample. It is a positively biased estimate of the proportion variance explained in the population.
• Proportion variance explained $\omega^2$:
Corrects for the positive bias in $\eta^2$ and is equal to: $$\omega^2 = \frac{\mbox{sum of squares between} - \mbox{df between} \times \mbox{mean square error}}{\mbox{sum of squares total} + \mbox{mean square error}}$$ $\omega^2$ is a better estimate of the explained variance in the population than $\eta^2$.
• Cohen's $d$:
Standardized difference between the mean in group $g$ and in group $h$: $$d_{g,h} = \frac{\bar{y}_g - \bar{y}_h}{s_p}$$ Cohen's $d$ indicates how many standard deviations $s_p$ two sample means are removed from each other.

## Visual representation
• Goodness of fit test: n.a.
• Paired sample $t$ test: [figure]
• One way ANOVA: n.a.

## ANOVA table
• Goodness of fit test: n.a.
• Paired sample $t$ test: n.a.
• One way ANOVA: [table]

## Equivalent to
Goodness of fit test: n.a.

Paired sample $t$ test:
• One sample $t$ test on the difference scores.
• Repeated measures ANOVA with one dichotomous within subjects factor.
One way ANOVA:
OLS regression with one categorical independent variable transformed into $I - 1$ code variables:
• $F$ test ANOVA is equivalent to $F$ test regression model
• $t$ test for contrast $i$ is equivalent to $t$ test for regression coefficient $\beta_i$ (specific contrast tested depends on how the code variables are defined)

## Example context
• Goodness of fit test: Is the proportion of people with a low, moderate, and high social economic status in the population different from $\pi_{low} = 0.2$, $\pi_{moderate} = 0.6$, and $\pi_{high} = 0.2$?
• Paired sample $t$ test: Is the average difference between the mental health scores before and after an intervention different from $\mu_0 = 0$?
• One way ANOVA: Is the average mental health score different between people from a low, moderate, and high economic class?

## SPSS
Goodness of fit test:
Analyze > Nonparametric Tests > Legacy Dialogs > Chi-square...
• Put your categorical variable in the box below Test Variable List
• Fill in the population proportions / probabilities according to $H_0$ in the box below Expected Values. If $H_0$ states that they are all equal, just pick 'All categories equal' (default)
Paired sample $t$ test:
Analyze > Compare Means > Paired-Samples T Test...
• Put the two paired variables in the boxes below Variable 1 and Variable 2
One way ANOVA:
Analyze > Compare Means > One-Way ANOVA...
• Put your dependent (quantitative) variable in the box below Dependent List and your independent (grouping) variable in the box below Factor
or
Analyze > General Linear Model > Univariate...
• Put your dependent (quantitative) variable in the box below Dependent Variable and your independent (grouping) variable in the box below Fixed Factor(s)

## Jamovi
Goodness of fit test:
Frequencies > N Outcomes - $\chi^2$ Goodness of fit
• Put your categorical variable in the box below Variable
• Click on Expected Proportions and fill in the population proportions / probabilities according to $H_0$ in the boxes below Ratio. If $H_0$ states that they are all equal, you can leave the ratios equal to the default values (1)
Paired sample $t$ test:
T-Tests > Paired Samples T-Test
• Put the two paired variables in the box below Paired Variables, one on the left side of the vertical line and one on the right side of the vertical line
• Under Hypothesis, select your alternative hypothesis
One way ANOVA:
ANOVA > ANOVA
• Put your dependent (quantitative) variable in the box below Dependent Variable and your independent (grouping) variable in the box below Fixed Factors
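Whichever software is used, the one way ANOVA effect sizes $\eta^2$ and $\omega^2$ can be recomputed by hand from the sums of squares in the output. The sketch below does this for hypothetical values; the function names and all numbers are assumptions chosen for illustration.

```python
def eta_squared(ss_between, ss_total):
    """Proportion variance explained in the sample."""
    return ss_between / ss_total


def omega_squared(ss_between, ss_total, df_between, ms_error):
    """Bias corrected estimate of the proportion variance explained."""
    return (ss_between - df_between * ms_error) / (ss_total + ms_error)


# Hypothetical ANOVA output: I = 3 groups, N = 9 observations
ss_between, ss_error = 38.0, 6.0
df_between, df_error = 2, 6
ss_total = ss_between + ss_error
ms_error = ss_error / df_error

eta2 = eta_squared(ss_between, ss_total)
omega2 = omega_squared(ss_between, ss_total, df_between, ms_error)
print(eta2, omega2)
```

In this example $\omega^2$ comes out smaller than $\eta^2$, in line with its role as a correction for positive bias.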