Chi-squared test for the relationship between two categorical variables - overview

Chi-squared test for the relationship between two categorical variables
Independent/column variable
One categorical with $I$ independent groups ($I \geqslant 2$)
Dependent/row variable
One categorical with $J$ categories ($J \geqslant 2$)
Null hypothesis
  • There is no association between the row and column variables
    More precise statement:
    • If there are $I$ independent random samples of size $n_i$ from each of $I$ populations, defined by the independent variable:
      The distribution of the dependent variable is the same in each of the $I$ populations
    • If there is one random sample of size $N$ from the total population:
      The row and column variables are independent
Alternative hypothesis
  • There is an association between the row and column variables
    More precise statement:
    • If there are $I$ independent random samples of size $n_i$ from each of $I$ populations, defined by the independent variable:
      The distribution of the dependent variable is not the same in all of the $I$ populations
    • If there is one random sample of size $N$ from the total population:
      The row and column variables are dependent
Assumptions
  • Sample size is large enough for $X^2$ to be approximately chi-squared distributed under the null hypothesis. Rule of thumb:
    • 2 $\times$ 2 table: all four expected cell counts are 5 or more
    • Tables larger than 2 $\times$ 2: the average of the expected cell counts is 5 or more, and the smallest expected cell count is 1 or more
  • There are $I$ independent simple random samples from each of $I$ populations defined by the independent variable, or there is one simple random sample from the total population
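The rule of thumb above can be checked mechanically once the expected cell counts are known. The following is a minimal sketch in Python, not part of the original overview: the function name `meets_rule_of_thumb` and both example tables are hypothetical, and the expected counts are assumed to have been computed already.

```python
def meets_rule_of_thumb(expected):
    """Return True if the expected cell counts satisfy the rule of thumb.

    `expected` is a list of rows, each row a list of expected cell counts.
    """
    cells = [e for row in expected for e in row]
    n_rows, n_cols = len(expected), len(expected[0])
    if n_rows == 2 and n_cols == 2:
        # 2 x 2 table: all four expected counts must be 5 or more
        return all(e >= 5 for e in cells)
    # Larger tables: average expected count >= 5 and smallest >= 1
    return sum(cells) / len(cells) >= 5 and min(cells) >= 1


# Hypothetical expected counts for a 3 x 2 table (illustrative numbers only)
expected_large = [[25.0, 25.0], [30.0, 30.0], [45.0, 45.0]]
# Hypothetical 2 x 2 table in which one expected count is below 5
expected_small = [[4.0, 6.0], [16.0, 24.0]]

ok_large = meets_rule_of_thumb(expected_large)
ok_small = meets_rule_of_thumb(expected_small)
```

In this sketch the 3 x 2 table passes the check, while the 2 x 2 table fails because one expected count is 4.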
Test statistic
$X^2 = \sum{\frac{(\mbox{observed cell count} - \mbox{expected cell count})^2}{\mbox{expected cell count}}}$
where for each cell, the expected cell count = $\dfrac{\mbox{row total} \times \mbox{column total}}{\mbox{total sample size}}$, the observed cell count is the observed sample count in that same cell, and the sum is over all $I \times J$ cells
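As an illustration of the formula, the sketch below computes the expected cell counts and $X^2$ from a table of observed counts using plain Python. The function names and the observed counts (economic class in the rows, gender in the columns) are made up for this example and do not come from the original page.

```python
def expected_counts(observed):
    """Expected count per cell: row total * column total / total sample size."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_squared_statistic(observed):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical observed counts: rows = low / middle / high economic class,
# columns = men / women (illustrative numbers only)
observed = [[20, 30], [30, 30], [50, 40]]

expected = expected_counts(observed)
x2 = chi_squared_statistic(observed)
```

For these made-up counts the expected counts are 25, 30 and 45 in each column, and $X^2 = 28/9 \approx 3.11$.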
Sampling distribution of $X^2$ if $H_0$ were true
Approximately a chi-squared distribution with $(I - 1) \times (J - 1)$ degrees of freedom
Significant?
  • Check whether the $X^2$ observed in the sample is equal to or larger than the critical value $X^{2*}$, or
  • Find the $p$ value corresponding to the observed $X^2$ and check whether it is equal to or smaller than $\alpha$
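The $p$ value route can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the method used by SPSS or Jamovi: the helper `chi2_sf` is a hypothetical name, it computes the upper tail probability of a chi-squared distribution with integer degrees of freedom via a standard recurrence for the incomplete gamma function, and the values of $X^2$, $I$, $J$ and $\alpha = 0.05$ are chosen only for the example.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability P(chi-squared_df >= x) for integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)              # exact tail for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))   # exact tail for df = 1
    # Recurrence Q(a + 1, z) = Q(a, z) + z^a * e^(-z) / Gamma(a + 1)
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


# Hypothetical inputs: a table with I = 2 columns and J = 3 rows,
# an observed X^2 of 28/9, and an assumed significance level of 0.05
n_columns, n_rows = 2, 3
x2 = 28 / 9
alpha = 0.05

df = (n_columns - 1) * (n_rows - 1)
p_value = chi2_sf(x2, df)
significant = p_value <= alpha
```

With these made-up inputs there are 2 degrees of freedom and the $p$ value is about 0.211, which is larger than 0.05, so the result would not be significant.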
Example context
Is there an association between economic class and gender? Is the distribution of economic class different between men and women?
SPSS
Analyze > Descriptive Statistics > Crosstabs...
  • Put one of your two categorical variables in the box below Row(s), and the other categorical variable in the box below Column(s)
  • Click the Statistics... button, and click on the square in front of Chi-square
  • Continue and click OK
Jamovi
Frequencies > Independent Samples - $\chi^2$ test of association
  • Put one of your two categorical variables in the box below Rows, and the other categorical variable in the box below Columns