Chi-Square Goodness of Fit

 

In this section, we learn how to test how well an observed frequency distribution fits its claimed distribution. For example, a bag of M&M's may claim to contain 30% red, 20% brown, 10% yellow, 20% green, and 20% orange candies. To run a goodness-of-fit test we use a theoretical distribution known as the chi-square distribution. Below is a graph of several chi-square distributions. Note that each curve depends on only one parameter, the degrees of freedom (d.f.).



The degrees of freedom are calculated as one less than the number of categories. In our M&M's example there are 5 color categories, so we use a chi-square distribution with 5 - 1 = 4 degrees of freedom.

The idea behind a test of goodness of fit is to compare the actual number of observations in each category with the number expected in that category if the claim were true. If we find a significant difference between the observed counts and the theoretical (expected) counts, we have sufficient evidence to reject the claim of a good fit. In fact, the formula for comparing observed counts with expected counts is given below.
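Written out in standard notation, the statistic used for this comparison is the Pearson chi-square statistic. The symbols are labels chosen here for illustration: O_i is the observed count in category i, E_i is the expected count in category i, and k is the number of categories.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
```

Each term is the squared gap between observed and expected counts, scaled by the expected count, so a category contributes more when its gap is large relative to what the claim predicts.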

The bigger the calculated chi-square value, the greater the difference between the observed and expected values. We compare this calculated chi-square value with a theoretical chi-square value (the critical value) obtained from a table. If the calculated chi-square value is greater than the theoretical chi-square value, we reject the notion that the fit is good. Let's apply this statistical technique to a bag of M&M's.

Suppose we purchase a bag containing 200 M&M candies and the manufacturer claims that the bag contains 30% red, 20% brown, 10% yellow, 20% green, and 20% orange candies. After opening the bag we make the observations shown in the table below.
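The expected counts under the manufacturer's claim follow directly from the bag size and the claimed percentages. A minimal sketch of that arithmetic is shown here; the variable names are chosen for illustration only.

```python
# Claimed color shares from the manufacturer, in percent.
claimed_percent = {"red": 30, "brown": 20, "yellow": 10, "green": 20, "orange": 20}
bag_size = 200  # number of candies in the bag

# The claimed shares should cover the whole bag.
assert sum(claimed_percent.values()) == 100

# Expected count per color if the claim were true.
expected = {color: bag_size * pct / 100 for color, pct in claimed_percent.items()}
print(expected)
# {'red': 60.0, 'brown': 40.0, 'yellow': 20.0, 'green': 40.0, 'orange': 40.0}
```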

 
Now let's use the chi-square formula to calculate the chi-square value.
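As a sketch of how this calculation might look in code, the function below sums (observed - expected)^2 / expected over the categories. The expected counts come from the claimed percentages applied to 200 candies. The observed counts are hypothetical numbers invented for this illustration and are not the counts from the table; the function name is also an assumption.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Order: red, brown, yellow, green, orange.
expected = [60, 40, 20, 40, 40]  # 200 candies times the claimed shares
observed = [48, 46, 32, 34, 40]  # HYPOTHETICAL counts, for illustration only

statistic = chi_square_statistic(observed, expected)
print(round(statistic, 2))  # 11.4
```

With these made-up counts the yellow category contributes the most (144 / 20 = 7.2), because its gap is large relative to its small expected count.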



Now we compare this calculated (sample) chi-square with the table (theoretical) chi-square. Since there are five color categories, we look up a chi-square curve with 4 degrees of freedom. Suppose we are testing at the 0.05 significance level. Using the SurfStat chi-square table and entering d.f. = 4 and alpha (probability) = 0.05, we see that the critical chi-square value is 9.488. Since the calculated chi-square value is greater than the critical chi-square value, we reject the manufacturer's claim regarding the distribution of colored candy in the bag.
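For readers without a table at hand, the critical value can be approximated numerically. The sketch below is one possible approach, not the method used by any particular table: it relies on the closed-form upper-tail probability that exists when the degrees of freedom are even (as with d.f. = 4 here) and finds the critical value by bisection. All function names are hypothetical, and the statistic of 11.4 in the usage lines is an illustrative value, not the one computed from the bag.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-square variable with even df.

    For df = 2k the tail equals exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


def critical_value(alpha, df, lo=0.0, hi=200.0, tol=1e-10):
    """Find x with P(X > x) = alpha by bisection (the tail decreases in x)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_upper_tail_even_df(mid, df) > alpha:
            lo = mid  # tail still too large, move right
        else:
            hi = mid
    return (lo + hi) / 2.0


def reject_fit(statistic, alpha, df):
    """True when the calculated chi-square exceeds the critical value."""
    return statistic > critical_value(alpha, df)


crit = critical_value(0.05, 4)
print(round(crit, 3))  # 9.488

# Illustrative statistic only; substitute the value calculated from the data.
print(reject_fit(11.4, 0.05, 4))  # True
```

Bisection is used because the upper-tail probability decreases steadily as x grows, so the point where it crosses alpha is unique.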

The null hypothesis is: The observed frequency distribution fits the expected frequency distribution. 

The alternative hypothesis is: The observed frequency distribution does not fit the expected frequency distribution.