In this section, we learn how to run a test on two categorical variables to
see if they are independent of each other.
The idea behind a test of independence is to compare the observed number of
observations in each combination of categories with the number we would expect if the two variables were independent.
If we find a significant difference between the observed counts
and the expected counts, we have sufficient evidence to reject the
claim of independence. Once again, we use the chi-square formula for comparing observed counts with
expected counts.
Suppose one variable is smoking status with three categories:
(a) never smoked
(b) currently smoke
(c) former smoker
Let the other categorical variable be years of
education with three categories:
(a) did not finish high school
(b) high school graduate
(c) BS degree or higher
Let's use a chi-square test to see if the two variables are independent of each other.
Null hypothesis: Smoking status and years of education are independent.
Alternative hypothesis: Smoking status and years of education are not independent.
To begin, we collect data on the two variables and place the data into a
table called a contingency table.
Now, how do we calculate the expected value for each cell in the contingency table?
To find the expected frequency for a cell, multiply the total of the row containing the cell by the total of the column containing the cell, and divide this product by the grand total.
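The rule above can be sketched in a few lines of Python. The counts in `observed` are hypothetical numbers made up for illustration (they are not the data from this lesson), and the function name `expected_counts` is also just a label chosen for this sketch. Rows are assumed to be smoking status and columns education level.

```python
# Hypothetical observed counts, invented for illustration only.
# Rows: smoking status; columns: education level.
observed = [
    [30, 50, 40],   # never smoked
    [30, 30, 5],    # currently smoke
    [15, 30, 20],   # former smoker
]

def expected_counts(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

expected = expected_counts(observed)
for row in expected:
    print([round(value, 2) for value in row])
```

Note that the expected counts in each row add up to the same row total as the observed counts, which is a quick way to check the arithmetic.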
The table below shows both the observed values and the expected values for
each of the nine cells.
Next, we calculate the chi-square value using the formula from above.
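As a rough sketch of this step, the code below assumes the formula referred to is the usual chi-square statistic, the sum over all cells of (observed - expected)^2 / expected. The counts are hypothetical values invented for illustration, and `chi_square_statistic` is a name chosen for this sketch.

```python
# Hypothetical observed counts, invented for illustration only.
observed = [
    [30, 50, 40],
    [30, 30, 5],
    [15, 30, 20],
]

def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over every cell of the table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence for cell (i, j).
            exp = row_totals[i] * col_totals[j] / grand_total
            total += (obs - exp) ** 2 / exp
    return total

chi_square = chi_square_statistic(observed)
print(round(chi_square, 2))
```

A table whose observed counts exactly match the expected counts gives a statistic of zero; the statistic grows as the cells drift away from what independence predicts.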
The bigger the calculated chi-square value, the greater the difference between the observed and expected values. We compare this calculated chi-square value with a critical chi-square value obtained from a table. If the calculated chi-square value is greater than the critical chi-square value, we reject the notion that the variables are independent.
Now, to obtain the critical chi-square value from the table, we must know both the number of degrees of freedom (d.f.) and the alpha level. The formula for determining the d.f. is given below.
d.f. = (number of rows - 1)(number of columns - 1)
In our case, the number of degrees of freedom is (3-1)(3-1) = 4.
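The degrees-of-freedom formula is simple enough to express as a one-line helper. The function name `degrees_of_freedom` is an illustrative choice for this sketch.

```python
def degrees_of_freedom(n_rows, n_cols):
    """d.f. for a contingency table: (rows - 1) * (columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)

# Three smoking-status categories by three education categories.
df = degrees_of_freedom(3, 3)
print(df)
```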
Testing at the alpha level of 0.05, we find from the table that the critical
chi-square value is 9.488. Since the calculated, or sample, chi-square is greater
than the critical chi-square, we reject the null hypothesis of independence of
the variables.
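The decision rule can be sketched as follows. The critical value 9.488 is the one quoted above for 4 degrees of freedom and alpha = 0.05. As a sanity check on that table lookup, the sketch uses a closed-form upper-tail probability that holds only when the degrees of freedom are even. The sample statistic of 19.41 is a hypothetical value, not a result from this lesson, and the function names are illustrative.

```python
import math

def chi2_tail_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    Closed form: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!.
    This shortcut is valid only for positive even df.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form applies only to positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

def reject_independence(chi_square, critical_value):
    """Reject the null hypothesis when the statistic exceeds the critical value."""
    return chi_square > critical_value

critical_value = 9.488  # from the text: d.f. = 4, alpha = 0.05
tail_at_critical = chi2_tail_even_df(critical_value, 4)
print(round(tail_at_critical, 3))  # should be close to the alpha level

sample_chi_square = 19.41  # hypothetical calculated statistic
print(reject_independence(sample_chi_square, critical_value))
```

The tail probability at the critical value should come out very close to 0.05, which confirms that 9.488 is the cutoff leaving 5% of the distribution to its right.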