Chi-Square Test of Independence

 

In this section, we learn how to run a test on two categorical variables to see if they are independent of each other.

The idea behind a test of independence is to compare the observed count for each combination of categories with the count we would expect if the two variables were independent. If we find a significant difference between the observed counts and the theoretical, or expected, counts, we have sufficient evidence to reject the claim of independence. Once again, we use the chi-square formula for comparing observed counts with expected counts.

Suppose one variable is smoking status with three categories:
(a) never smoked
(b) currently smoke
(c) former smoker

Let the other categorical variable be years of education with three categories:
(a) did not finish high school
(b) high school graduate
(c) BS degree or higher

Let's use a chi-square test to see if the two variables are independent of each other.

Null hypothesis: Smoking status and years of education are independent
Alternative hypothesis: Smoking status and years of education are not independent

To begin, we collect data on the two variables and place the data in a table called a contingency table.
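As a rough illustration of how raw observations become a contingency table, here is a minimal Python sketch. The six observations, the category labels, and the variable names are all hypothetical and are not the data used in this section.

```python
from collections import Counter

# Hypothetical raw data: one (smoking status, education) pair per person.
# These values are made up for illustration only.
observations = [
    ("never", "bs_or_higher"),
    ("current", "no_hs"),
    ("former", "hs_grad"),
    ("never", "hs_grad"),
    ("never", "bs_or_higher"),
    ("current", "no_hs"),
]

# Fixed category orders so rows and columns always line up the same way.
smoking_levels = ["never", "current", "former"]
education_levels = ["no_hs", "hs_grad", "bs_or_higher"]

# Count how many people fall into each (row, column) combination.
counts = Counter(observations)

# Rows are smoking status, columns are education level.
table = [[counts[(s, e)] for e in education_levels] for s in smoking_levels]

for status, row in zip(smoking_levels, table):
    print(status, row)
```

Each row of `table` holds the counts for one smoking status, and each column holds the counts for one education level.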

Now, how do we calculate the expected values for each cell in the contingency table?

To find the expected frequency for a cell of a contingency table, multiply the row total of the row containing the cell by the column total of the column containing the cell, and divide this product by the grand total.
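The rule above can be sketched in a few lines of Python. The observed counts below are hypothetical (they are not the data from this section), with rows standing for smoking status and columns for education level.

```python
# Hypothetical observed counts, made up for illustration.
# Rows: never smoked, currently smoke, former smoker.
# Columns: did not finish high school, high school graduate, BS degree or higher.
observed = [
    [20, 40, 60],
    [30, 30, 10],
    [10, 30, 20],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for a cell = (row total * column total) / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for row in expected:
    print([round(value, 1) for value in row])
```

With these made-up counts, the first cell has an expected count of 120 × 60 / 250 = 28.8. Note that the expected counts in each row still add up to that row's total.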

The table below shows both the observed values and the expected values for each of the nine cells.

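Next, we calculate the chi-square value using the formula from above: for each cell, square the difference between the observed and expected counts, divide by the expected count, and sum the results over all cells.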
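The calculation can be sketched as follows, using the standard chi-square statistic (the sum over all cells of (observed - expected)² / expected). The function name and the observed counts are hypothetical and are not the data from this section.

```python
def chi_square_statistic(observed):
    """Return the chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under independence for cell (i, j).
            exp = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - exp) ** 2 / exp
    return statistic


# Hypothetical observed counts, made up for illustration.
observed = [
    [20, 40, 60],
    [30, 30, 10],
    [10, 30, 20],
]

chi_square = chi_square_statistic(observed)
print(round(chi_square, 2))
```

For these made-up counts the statistic comes out to about 33.2. A table whose observed counts exactly match the expected counts would give a statistic of 0.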

The bigger the calculated chi-square value, the greater the difference between the observed and expected values. We compare this calculated chi-square value with a critical chi-square value obtained from a table. If the calculated chi-square value is greater than the critical chi-square value, we reject the notion that the variables are independent.

Now, to obtain the critical chi-square from the table, we must know both the number of degrees of freedom (d.f.) and the alpha level. The formula for determining the d.f. is given below.

d.f. = (number of rows - 1)(number of columns - 1)

In our case, the number of degrees of freedom is (3 - 1)(3 - 1) = 4.
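The degrees-of-freedom formula is simple enough to express as a small helper. This is only a sketch; the function name is hypothetical, and the table passed in is a placeholder whose shape (3 rows by 3 columns) is all that matters here.

```python
def degrees_of_freedom(table):
    """Return (rows - 1) * (columns - 1) for a rectangular table."""
    n_rows = len(table)
    n_cols = len(table[0])
    return (n_rows - 1) * (n_cols - 1)


# Placeholder 3 x 3 table: three smoking categories by three education categories.
# The cell values are irrelevant to the degrees of freedom.
placeholder_table = [
    [0, 0, 0],
    [0, 0, 0],
    [0, 0, 0],
]

df = degrees_of_freedom(placeholder_table)
print(df)
```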

Testing at an alpha level of 0.05 with 4 degrees of freedom, we find from the tables that the critical chi-square is 9.488. Since the calculated, or sample, chi-square is greater than the critical chi-square, we reject the null hypothesis that the variables are independent.
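The decision rule can be sketched in Python as below. The critical value 9.488 is taken from the text. The calculated statistic of 33.2 is a hypothetical placeholder, not the value from this section. As an optional cross-check, the sketch also computes a p-value using the closed-form upper-tail probability of the chi-square distribution, which for exactly 4 degrees of freedom is exp(-x/2)(1 + x/2). That shortcut does not apply to other degrees of freedom.

```python
import math

ALPHA = 0.05
CRITICAL_VALUE = 9.488  # chi-square table value for d.f. = 4, alpha = 0.05


def p_value_df4(statistic):
    """Upper-tail probability of a chi-square variable with exactly 4 d.f."""
    half = statistic / 2
    return math.exp(-half) * (1 + half)


# Hypothetical calculated chi-square, made up for illustration.
calculated = 33.2

# Table-based rule: reject when the statistic exceeds the critical value.
reject_by_table = calculated > CRITICAL_VALUE

# Equivalent p-value rule: reject when the p-value is below alpha.
reject_by_p_value = p_value_df4(calculated) < ALPHA

print("Reject independence:", reject_by_table)
```

The two rules agree because the critical value is, by construction, the point whose upper-tail probability equals alpha: `p_value_df4(9.488)` is very close to 0.05.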