# One way ANOVA - overview

This page offers structured overviews of one or more selected methods.

One way ANOVA
Independent variable
One categorical with $I$ independent groups ($I \geqslant 2$)
Dependent variable
One quantitative of interval or ratio level
Null hypothesis
ANOVA $F$ test:
• $\mu_1 = \mu_2 = \ldots = \mu_I$
$\mu_1$ is the unknown mean in population 1; $\mu_2$ is the unknown mean in population 2; $\mu_I$ is the unknown mean in population $I$
$t$ Test for contrast:
• $\Psi = 0$
$\Psi$ is a contrast in the population, defined as $\Psi = \sum a_i\mu_i$. Here $\mu_i$ is the unknown mean in population $i$ and $a_i$ is the coefficient for $\mu_i$. The coefficients $a_i$ sum to 0.
$t$ Test multiple comparisons:
• $\mu_g = \mu_h$
$\mu_g$ is the unknown mean in population $g$; $\mu_h$ is the unknown mean in population $h$
Alternative hypothesis
ANOVA $F$ test:
• Not all population means are equal
$t$ Test for contrast:
• Two sided: $\Psi \neq 0$
• Right sided: $\Psi > 0$
• Left sided: $\Psi < 0$
$t$ Test multiple comparisons:
• Usually two sided: $\mu_g \neq \mu_h$
Assumptions
• Within each population, the scores on the dependent variable are normally distributed
• The standard deviation of the scores on the dependent variable is the same in each of the populations: $\sigma_1 = \sigma_2 = \ldots = \sigma_I$
• Group 1 sample is a simple random sample (SRS) from population 1, group 2 sample is an independent SRS from population 2, $\ldots$, group $I$ sample is an independent SRS from population $I$. That is, within and between groups, observations are independent of one another
Test statistic
ANOVA $F$ test:
• $$\begin{aligned} F &= \dfrac{\sum\nolimits_{subjects} (\mbox{subject's group mean} - \mbox{overall mean})^2 / (I - 1)}{\sum\nolimits_{subjects} (\mbox{subject's score} - \mbox{its group mean})^2 / (N - I)}\\ &= \dfrac{\mbox{sum of squares between} / \mbox{degrees of freedom between}}{\mbox{sum of squares error} / \mbox{degrees of freedom error}}\\ &= \dfrac{\mbox{mean square between}}{\mbox{mean square error}} \end{aligned}$$
where $N$ is the total sample size, and $I$ is the number of groups.
Note: mean square between is also known as mean square model; mean square error is also known as mean square residual or mean square within
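The $F$ statistic above can be sketched in a few lines of plain Python. The data set and the function name `one_way_anova_f` are hypothetical and only serve to illustrate the formula; this is a minimal sketch, not a replacement for a statistics package.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_error) for a list of groups of scores."""
    all_scores = [y for g in groups for y in g]
    n_total = len(all_scores)      # N, the total sample size
    n_groups = len(groups)         # I, the number of groups
    overall_mean = mean(all_scores)

    # Each subject contributes (its group mean - overall mean)^2,
    # so a group of size n_i contributes n_i times that squared difference.
    ss_between = sum(len(g) * (mean(g) - overall_mean) ** 2 for g in groups)
    # Each subject contributes (its score - its group mean)^2.
    ss_error = sum((y - mean(g)) ** 2 for g in groups for y in g)

    df_between = n_groups - 1
    df_error = n_total - n_groups
    f_value = (ss_between / df_between) / (ss_error / df_error)
    return f_value, df_between, df_error

# Hypothetical scores for I = 3 groups of 3 subjects each
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
f_value, df_between, df_error = one_way_anova_f(groups)
print(f_value, df_between, df_error)
```

For these made-up scores the sum of squares between is 24 and the sum of squares error is 6, so the mean squares are 12 and 1.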
$t$ Test for contrast:
• $t = \dfrac{c}{s_p\sqrt{\sum \dfrac{a^2_i}{n_i}}}$
Here $c$ is the sample estimate of the population contrast $\Psi$: $c = \sum a_i\bar{y}_i$, with $\bar{y}_i$ the sample mean in group $i$. $s_p$ is the pooled standard deviation based on all the $I$ groups in the ANOVA, $a_i$ is the contrast coefficient for group $i$, and $n_i$ is the sample size of group $i$.
Note that if the contrast compares only two group means with each other, this $t$ statistic is very similar to the two sample $t$ statistic (assuming equal population standard deviations). In that case the only difference is that we now base the pooled standard deviation on all the $I$ groups, which affects the $t$ value if $I \geqslant 3$. It also affects the corresponding degrees of freedom.
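As an illustration of the contrast $t$ statistic, the sketch below compares the first group with the average of the other two. The data, the coefficients, and the helper names `pooled_sd` and `contrast_t` are assumptions made for this example only.

```python
from math import sqrt
from statistics import mean, variance

def pooled_sd(groups):
    """Pooled standard deviation based on all I groups."""
    n_total = sum(len(g) for g in groups)
    weighted = sum((len(g) - 1) * variance(g) for g in groups)
    return sqrt(weighted / (n_total - len(groups)))

def contrast_t(groups, coefficients):
    """Return (t, df) for the contrast defined by the coefficients a_i."""
    if abs(sum(coefficients)) > 1e-12:
        raise ValueError("contrast coefficients must sum to 0")
    # c = sum of a_i * sample mean of group i
    c = sum(a * mean(g) for a, g in zip(coefficients, groups))
    # standard error = s_p * sqrt(sum of a_i^2 / n_i)
    se = pooled_sd(groups) * sqrt(
        sum(a ** 2 / len(g) for a, g in zip(coefficients, groups))
    )
    df = sum(len(g) for g in groups) - len(groups)  # N - I
    return c / se, df

# Hypothetical scores; contrast: group 1 versus the average of groups 2 and 3
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
t_value, df = contrast_t(groups, [1, -0.5, -0.5])
print(t_value, df)
```

The check on the coefficients mirrors the requirement that the $a_i$ sum to 0.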
$t$ Test multiple comparisons:
• $t = \dfrac{\bar{y}_g - \bar{y}_h}{s_p\sqrt{\dfrac{1}{n_g} + \dfrac{1}{n_h}}}$
$\bar{y}_g$ is the sample mean in group $g$, $\bar{y}_h$ is the sample mean in group $h$, $s_p$ is the pooled standard deviation based on all the $I$ groups in the ANOVA, $n_g$ is the sample size of group $g$, and $n_h$ is the sample size of group $h$.
Note that this $t$ statistic is very similar to the two sample $t$ statistic (assuming equal population standard deviations). The only difference is that we now base the pooled standard deviation on all the $I$ groups, which affects the $t$ value if $I \geqslant 3$. It also affects the corresponding degrees of freedom.
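A minimal sketch of the pairwise $t$ statistics, assuming hypothetical data and a hypothetical helper `pairwise_t`. Every pair uses the same pooled standard deviation, based on all $I$ groups, and the same $N - I$ degrees of freedom.

```python
from itertools import combinations
from math import sqrt
from statistics import mean

def pairwise_t(groups):
    """Return ({(g, h): t}, df_error) for every pair of groups."""
    n_total = sum(len(g) for g in groups)
    df_error = n_total - len(groups)  # N - I
    ss_error = sum((y - mean(g)) ** 2 for g in groups for y in g)
    s_p = sqrt(ss_error / df_error)   # pooled over all groups

    t_values = {}
    for g, h in combinations(range(len(groups)), 2):
        difference = mean(groups[g]) - mean(groups[h])
        se = s_p * sqrt(1 / len(groups[g]) + 1 / len(groups[h]))
        t_values[(g, h)] = difference / se
    return t_values, df_error

# Hypothetical scores for three groups (indexed 0, 1, 2 in the result)
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
t_values, df_error = pairwise_t(groups)
for pair, t in t_values.items():
    print(pair, round(t, 3))
```

These raw $t$ values still need a multiple comparison procedure before they are judged against a critical value or $\alpha$.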
Pooled standard deviation
$$\begin{aligned} s_p &= \sqrt{\dfrac{(n_1 - 1) \times s^2_1 + (n_2 - 1) \times s^2_2 + \ldots + (n_I - 1) \times s^2_I}{N - I}}\\ &= \sqrt{\dfrac{\sum\nolimits_{subjects} (\mbox{subject's score} - \mbox{its group mean})^2}{N - I}}\\ &= \sqrt{\dfrac{\mbox{sum of squares error}}{\mbox{degrees of freedom error}}}\\ &= \sqrt{\mbox{mean square error}} \end{aligned}$$
where $s^2_i$ is the variance in group $i$
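The equality between the first two forms of $s_p$ can be checked numerically. The sketch below uses hypothetical groups of unequal size; the function names are illustrative only.

```python
from math import sqrt
from statistics import mean, variance

def pooled_sd_from_variances(groups):
    """s_p from the group variances, each weighted by n_i - 1."""
    n_total = sum(len(g) for g in groups)
    weighted = sum((len(g) - 1) * variance(g) for g in groups)
    return sqrt(weighted / (n_total - len(groups)))

def pooled_sd_from_residuals(groups):
    """s_p as the square root of the mean square error."""
    n_total = sum(len(g) for g in groups)
    ss_error = sum((y - mean(g)) ** 2 for g in groups for y in g)
    return sqrt(ss_error / (n_total - len(groups)))

# Hypothetical scores, group sizes 4, 2, and 3
groups = [[2, 4, 6, 8], [1, 3], [5, 5, 8]]
sp_a = pooled_sd_from_variances(groups)
sp_b = pooled_sd_from_residuals(groups)
print(sp_a, sp_b)
```

Both routes give the same value up to floating point rounding, because $(n_i - 1) \times s^2_i$ is exactly the sum of squared deviations within group $i$.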
Sampling distribution of $F$ and of $t$ if $H_0$ were true
Sampling distribution of $F$:
• $F$ distribution with $I - 1$ (df between, numerator) and $N - I$ (df error, denominator) degrees of freedom
Sampling distribution of $t$:
• $t$ distribution with $N - I$ degrees of freedom
Significant?
$F$ test:
• Check if $F$ observed in sample is equal to or larger than critical value $F^*$ or
• Find $p$ value corresponding to observed $F$ and check if it is equal to or smaller than $\alpha$ (e.g. .01 < $p$ < .025 when $F$ = 3.91, df between = 4, and df error = 20)
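The $p$ value in the example above can be approximated without a statistics package by numerically integrating the $F$ density. This is a rough sketch under stated assumptions: the function names are hypothetical, Simpson's rule is just one convenient integration method, and the density helper is only valid for more than 2 numerator degrees of freedom. In practice a statistics library or a table would normally be used.

```python
import math

def f_pdf(x, df1, df2):
    """Density of the F distribution at x (this sketch assumes df1 > 2)."""
    if x <= 0:
        return 0.0
    log_beta = (math.lgamma(df1 / 2) + math.lgamma(df2 / 2)
                - math.lgamma((df1 + df2) / 2))
    log_pdf = ((df1 / 2) * math.log(df1 / df2)
               + (df1 / 2 - 1) * math.log(x)
               - ((df1 + df2) / 2) * math.log(1 + df1 * x / df2)
               - log_beta)
    return math.exp(log_pdf)

def f_p_value(f_observed, df1, df2, steps=2000):
    """Right-tail area beyond f_observed, using Simpson's rule on [0, f_observed]."""
    h = f_observed / steps  # steps must be even for Simpson's rule
    total = f_pdf(0.0, df1, df2) + f_pdf(f_observed, df1, df2)
    for k in range(1, steps):
        weight = 4 if k % 2 else 2
        total += weight * f_pdf(k * h, df1, df2)
    cdf = total * h / 3
    return 1 - cdf

# Values from the example: F = 3.91, df between = 4, df error = 20
p_value = f_p_value(3.91, 4, 20)
print(round(p_value, 4))
```

The result falls between .01 and .025, in line with the example, so the test would be significant at $\alpha = .05$ but not at $\alpha = .01$.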

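$t$ Test for contrast two sided:
• Check if $t$ observed in sample is at least as extreme as critical value $t^*$ or
• Find two sided $p$ value corresponding to observed $t$ and check if it is equal to or smaller than $\alpha$
$t$ Test for contrast right sided:
• Check if $t$ observed in sample is equal to or larger than critical value $t^*$ or
• Find right sided $p$ value corresponding to observed $t$ and check if it is equal to or smaller than $\alpha$
$t$ Test for contrast left sided:
• Check if $t$ observed in sample is equal to or smaller than critical value $t^*$ or
• Find left sided $p$ value corresponding to observed $t$ and check if it is equal to or smaller than $\alpha$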

$t$ Test multiple comparisons two sided:
• Check if $t$ observed in sample is at least as extreme as critical value $t^{**}$. Adapt $t^{**}$ according to a multiple comparison procedure (e.g., Bonferroni) or
• Find two sided $p$ value corresponding to observed $t$ and check if it is equal to or smaller than $\alpha$. Adapt the $p$ value or $\alpha$ according to a multiple comparison procedure
$t$ Test multiple comparisons right sided:
• Check if $t$ observed in sample is equal to or larger than critical value $t^{**}$. Adapt $t^{**}$ according to a multiple comparison procedure (e.g., Bonferroni) or
• Find right sided $p$ value corresponding to observed $t$ and check if it is equal to or smaller than $\alpha$. Adapt the $p$ value or $\alpha$ according to a multiple comparison procedure
$t$ Test multiple comparisons left sided:
• Check if $t$ observed in sample is equal to or smaller than critical value $t^{**}$. Adapt $t^{**}$ according to a multiple comparison procedure (e.g., Bonferroni) or
• Find left sided $p$ value corresponding to observed $t$ and check if it is equal to or smaller than $\alpha$. Adapt the $p$ value or $\alpha$ according to a multiple comparison procedure
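As one example of adapting the $p$ value according to a multiple comparison procedure, the sketch below applies the Bonferroni correction. The $p$ values and the function name `bonferroni` are hypothetical. Multiplying each $p$ value by the number of comparisons is equivalent to dividing $\alpha$ by that number.

```python
def bonferroni(p_values, alpha=0.05):
    """Return Bonferroni-adjusted p values and the resulting decisions."""
    m = len(p_values)  # number of comparisons
    # Multiply each p value by m, capped at 1
    adjusted = [min(1.0, p * m) for p in p_values]
    significant = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, significant

# Hypothetical unadjusted two sided p values for the three pairwise
# comparisons among I = 3 groups
raw_p = [0.004, 0.020, 0.300]
adjusted_p, significant = bonferroni(raw_p, alpha=0.05)
print(adjusted_p, significant)
```

With these made-up values only the first comparison remains significant after the adjustment, although the second would have been significant on its own.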
$C\%$ confidence interval for $\Psi$, for $\mu_g - \mu_h$, and for $\mu_i$
Confidence interval for $\Psi$ (contrast):
• $c \pm t^* \times s_p\sqrt{\sum \dfrac{a^2_i}{n_i}}$
where the critical value $t^*$ is the value under the $t_{N - I}$ distribution with the area $C / 100$ between $-t^*$ and $t^*$ (e.g. $t^* = 2.086$ for a 95% confidence interval when df = 20). Note that $n_i$ is the sample size of group $i$, and $N$ is the total sample size, based on all the $I$ groups.
Confidence interval for $\mu_g - \mu_h$ (multiple comparisons):
• $(\bar{y}_g - \bar{y}_h) \pm t^{**} \times s_p\sqrt{\dfrac{1}{n_g} + \dfrac{1}{n_h}}$
where $t^{**}$ depends upon $C$, degrees of freedom ($N - I$), and the multiple comparison procedure. If you do not want to apply a multiple comparison procedure, $t^{**} = t^* =$ the value under the $t_{N - I}$ distribution with the area $C / 100$ between $-t^*$ and $t^*$. Note that $n_g$ is the sample size of group $g$, $n_h$ is the sample size of group $h$, and $N$ is the total sample size, based on all the $I$ groups.
Confidence interval for single population mean $\mu_i$:
• $\bar{y}_i \pm t^* \times \dfrac{s_p}{\sqrt{n_i}}$
where $\bar{y}_i$ is the sample mean for group $i$, $n_i$ is the sample size for group $i$, and the critical value $t^*$ is the value under the $t_{N - I}$ distribution with the area $C / 100$ between $-t^*$ and $t^*$ (e.g. $t^* = 2.086$ for a 95% confidence interval when df = 20). Note that $n_i$ is the sample size of group $i$, and $N$ is the total sample size, based on all the $I$ groups.
Effect size
• Proportion variance explained $\eta^2$ and $R^2$:
Proportion variance of the dependent variable $y$ explained by the independent variable:
$$\eta^2 = R^2 = \dfrac{\mbox{sum of squares between}}{\mbox{sum of squares total}}$$
Only in one way ANOVA $\eta^2 = R^2$. $\eta^2$ (and $R^2$) is the proportion variance explained in the sample. It is a positively biased estimate of the proportion variance explained in the population.
• Proportion variance explained $\omega^2$:
Corrects for the positive bias in $\eta^2$ and is equal to:
$$\omega^2 = \frac{\mbox{sum of squares between} - \mbox{df between} \times \mbox{mean square error}}{\mbox{sum of squares total} + \mbox{mean square error}}$$
$\omega^2$ is a better estimate of the explained variance in the population than $\eta^2$.
• Cohen's $d$:
Standardized difference between the mean in group $g$ and in group $h$:
$$d_{g,h} = \frac{\bar{y}_g - \bar{y}_h}{s_p}$$
Indicates how many standard deviations $s_p$ two sample means are removed from each other
ANOVA table
[Table: ANOVA table]
Equivalent to
OLS regression with one, categorical independent variable transformed into $I - 1$ code variables:
• $F$ test ANOVA equivalent to $F$ test regression model
• $t$ test for contrast $i$ equivalent to $t$ test for regression coefficient $\beta_i$ (specific contrast tested depends on how the code variables are defined)
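The three effect size measures listed on this page ($\eta^2$, $\omega^2$, and Cohen's $d$) can be computed from the same sums of squares that feed the $F$ test. The sketch below assumes hypothetical data and a hypothetical helper `effect_sizes`.

```python
from math import sqrt
from statistics import mean

def effect_sizes(groups, g=0, h=1):
    """Return (eta squared, omega squared, Cohen's d for groups g and h)."""
    all_scores = [y for grp in groups for y in grp]
    overall_mean = mean(all_scores)
    df_between = len(groups) - 1
    df_error = len(all_scores) - len(groups)

    ss_between = sum(len(grp) * (mean(grp) - overall_mean) ** 2 for grp in groups)
    ss_error = sum((y - mean(grp)) ** 2 for grp in groups for y in grp)
    ss_total = ss_between + ss_error
    ms_error = ss_error / df_error

    eta_sq = ss_between / ss_total
    omega_sq = (ss_between - df_between * ms_error) / (ss_total + ms_error)
    # Cohen's d uses the pooled standard deviation, sqrt(mean square error)
    d = (mean(groups[g]) - mean(groups[h])) / sqrt(ms_error)
    return eta_sq, omega_sq, d

# Hypothetical scores for three groups
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
eta_sq, omega_sq, d = effect_sizes(groups, g=0, h=2)
print(round(eta_sq, 3), round(omega_sq, 3), d)
```

For these made-up scores $\omega^2$ comes out smaller than $\eta^2$, which reflects its correction for the positive bias in $\eta^2$.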
Example context
Is the average mental health score different between people from a low, moderate, and high economic class?
SPSS
Analyze > Compare Means > One-Way ANOVA...
• Put your dependent (quantitative) variable in the box below Dependent List and your independent (grouping) variable in the box below Factor
or
Analyze > General Linear Model > Univariate...
• Put your dependent (quantitative) variable in the box below Dependent Variable and your independent (grouping) variable in the box below Fixed Factor(s)
Jamovi
ANOVA > ANOVA
• Put your dependent (quantitative) variable in the box below Dependent Variable and your independent (grouping) variable in the box below Fixed Factors