In this section, we learn how to run a test to see how well a frequency
distribution fits its claimed distribution. For example, a bag of
M&M's may claim to have 30% red, 20% brown, 10% yellow, 20% green and
20% orange candies. To run a goodness-of-fit test, we use a
theoretical distribution known as the chi-square distribution. Below is a graph of
several chi-square distributions. Note that each curve depends on only one parameter,
namely the number of degrees of freedom (d.f.).
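To show how the single d.f. parameter determines the whole curve, here is a small sketch that evaluates the chi-square density f(x) = x^(k/2 - 1) e^(-x/2) / (2^(k/2) Γ(k/2)) for k degrees of freedom using only Python's standard library. The function name `chi2_pdf` is a hypothetical choice for illustration, and the sketch only handles x > 0.

```python
import math

def chi2_pdf(x, df):
    """Density of the chi-square distribution with df degrees of freedom, for x > 0."""
    if x <= 0:
        raise ValueError("this sketch only evaluates the density for x > 0")
    k = df / 2.0
    # Work in log space so large df or x do not overflow intermediate values.
    log_density = (k - 1.0) * math.log(x) - x / 2.0 - k * math.log(2.0) - math.lgamma(k)
    return math.exp(log_density)

# Same x, different d.f.: each d.f. value gives a different curve height.
for df in (1, 2, 4, 8):
    print(df, round(chi2_pdf(3.0, df), 4))
```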
The number of degrees of freedom is one less than the number of categories.
In our M&M's example there are 5 color categories, so in this case
we would use a chi-square distribution with 5 - 1 = 4 degrees of freedom.
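A minimal sketch of the d.f. rule for a goodness-of-fit test, assuming the claimed proportions are stored in a dictionary keyed by color (a hypothetical representation chosen for illustration):

```python
# Claimed color proportions from the M&M's example.
claimed = {"red": 0.30, "brown": 0.20, "yellow": 0.10, "green": 0.20, "orange": 0.20}

def degrees_of_freedom(categories):
    """Goodness-of-fit test: d.f. is one less than the number of categories."""
    return len(categories) - 1

print(degrees_of_freedom(claimed))  # 5 colors -> 4 d.f.
```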
The idea behind a goodness-of-fit test is to compare the actual (observed) number of
observations in each category with the number expected in each category if the
claim were true. If we find a significant difference between the observed counts
and the theoretical or expected counts, we have sufficient evidence to reject the
claim of a good fit. The formula for comparing observed counts with
expected counts is:
chi-square = Σ (O - E)² / E
where the sum runs over all categories, O is the observed count and E is the expected count for that category.
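The chi-square statistic is the sum over all categories of (O - E)² / E, where O is the observed count and E the expected count. A short sketch, with a hypothetical function name and the two lists assumed to be in the same category order:

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must list the same categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical two-category counts, used only to show the call.
print(chi_square_statistic([12, 8], [10, 10]))
```

Each term is squared, so deviations in either direction increase the statistic, and dividing by E scales each deviation relative to the size of its category.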
The larger the calculated chi-square value, the greater the difference
between the observed and expected values. We compare this calculated chi-square
value with a theoretical (critical) chi-square value obtained from a table. If the
calculated chi-square value is greater than the critical chi-square value, we
reject the claim that the fit is good. Let's apply this technique
to a bag of M&M's.
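In place of a printed table, the critical value can be approximated numerically. The sketch below is one way to do it, not the lesson's method: it computes the chi-square CDF from the standard power series for the regularized lower incomplete gamma function, then bisects for the value whose upper-tail area equals alpha. The function names are hypothetical, and the series is only meant for moderate x such as the values in this example; in practice a statistics library (for example SciPy's `scipy.stats.chi2.ppf`) would normally be used.

```python
import math

def chi2_cdf(x, df):
    """P(X <= x) for chi-square with df d.f., i.e. the regularized gamma P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a  # n = 0 term of the series sum z^n / (a (a+1) ... (a+n))
    total = term
    for n in range(1, 10_000):
        term *= z / (a + n)
        total += term
        if term < 1e-15 * total:
            break
    # Multiply by z^a e^(-z) / Gamma(a), computed in log space.
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def chi2_critical_value(df, alpha):
    """Value c with P(X > c) = alpha, found by bisection on the CDF."""
    target = 1.0 - alpha
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < target:  # widen the bracket until it contains c
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def reject_good_fit(calculated, df, alpha=0.05):
    """Reject the claimed fit when the calculated statistic exceeds the critical value."""
    return calculated > chi2_critical_value(df, alpha)

print(chi2_critical_value(4, 0.05))  # approximately 9.4877
```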
Suppose we purchase a bag containing 200 M&M candies, and the manufacturer
claims that the bag contains 30% red, 20% brown, 10% yellow, 20% green and 20% orange candies.
After opening the bag, we make the following observations (see table below).
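Under the manufacturer's claim, the expected count for each color is the bag size times the claimed proportion. A minimal sketch of that step, with a hypothetical helper name:

```python
import math

claimed = {"red": 0.30, "brown": 0.20, "yellow": 0.10, "green": 0.20, "orange": 0.20}

def expected_counts(total, proportions):
    """Expected count per category = total sample size x claimed proportion."""
    if not math.isclose(sum(proportions.values()), 1.0):
        raise ValueError("claimed proportions must sum to 1")
    return {color: total * p for color, p in proportions.items()}

# For 200 candies: red 60, brown 40, yellow 20, green 40, orange 40 (as floats).
print(expected_counts(200, claimed))
```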
Now let's use the chi-square formula to calculate the chi-square value.
Next we compare this calculated (sample) chi-square value with the table (theoretical) chi-square value.
Since there are five color categories, we look up a chi-square curve with 4
degrees of freedom. Suppose we are testing at the 0.05 significance level. From the
SurfStat chi-square table, entering d.f. = 4 and alpha/probability = 0.05,
we see that the critical chi-square value is 9.487. Since the
calculated chi-square value is greater than the critical chi-square value, we
reject the manufacturer's claim regarding the distribution of colored candies in the
bag.
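For an even number of degrees of freedom the upper-tail area has a closed form; with 4 d.f. it is P(X > x) = e^(-x/2)(1 + x/2). The sketch below uses that identity to check the quoted table value and to express the decision rule. The names are hypothetical, and the calculated statistic is left as an input because it depends on the observed counts.

```python
import math

def upper_tail_df4(x):
    """P(X > x) for a chi-square variable with 4 d.f.: exp(-x/2) * (1 + x/2)."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)

CRITICAL_DF4_ALPHA05 = 9.487  # table value quoted in the text (d.f. = 4, alpha = 0.05)

def reject_claim(calculated_chi_square):
    """Reject the manufacturer's claim when the statistic exceeds the critical value."""
    return calculated_chi_square > CRITICAL_DF4_ALPHA05

# The tail area beyond the table value should be very close to alpha = 0.05.
print(upper_tail_df4(CRITICAL_DF4_ALPHA05))
```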
The null hypothesis is: The observed frequency distribution fits the expected frequency distribution.
The alternative hypothesis is: The observed frequency distribution does not fit the expected frequency distribution.