Exercise Set Unit VI

Chi-Square Test Of Independence

Each person in a group of 300 students was identified as male or female and then asked whether their preferred religion was Protestant, Catholic, or Jewish.
Below is a contingency table that shows the frequencies for these categories.

Gender    Protestant    Catholic    Jewish    Total
Male              37          41        44      122
Female            35          72        71      178
Total             72         113       115      300

Complete a hypothesis test on the two variables at the 0.05 level of significance.

(a) Name the two variables. Answer: gender and religious preference
(b) What are the categories of religious preference? Answer: Protestant, Catholic, Jewish
(c) What are the categories of gender? Answer: Male and Female
(d) What is the null hypothesis? Answer: Gender and religious preference are independent
(e) What is the alternative hypothesis? Answer: Gender and religious preference are not independent
(f) What is the df to be used in testing? Answer: (2-1)(3-1) = 2
(g) What is the value of the calculated chi-square? Answer: 4.606
(h) What is the table or theoretical chi-square value? Answer: 5.99
(i) What is the "p" value? Answer: 0.0999 (9.9944E-02)
(j) Is the p value larger or smaller than alpha? Answer: Larger
(k) Is the calculated chi-square larger or smaller than the theoretical chi-square? Answer: Smaller
(l) What is your conclusion? Answer: fail to reject the null hypothesis
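
The answers to parts (f) through (l) can be checked with a short script. What follows is a minimal sketch in Python using only the standard library. The variable names are illustrative and not part of the exercise, and the shortcut exp(-x/2) for the p-value is valid only because df = 2 here.

```python
import math

# Observed counts copied from the contingency table (rows: Male, Female).
observed = [
    [37, 41, 44],
    [35, 72, 71],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell under independence: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells.
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 2 only, the upper-tail chi-square probability is exactly exp(-x / 2).
p_value = math.exp(-chi_sq / 2)

alpha = 0.05
reject_null = p_value < alpha

print(f"df = {df}, chi-square = {chi_sq:.3f}, p = {p_value:.4f}, reject H0: {reject_null}")
```

Under these assumptions the statistic comes out near 4.6 and the p-value near 0.0999, which is larger than alpha = 0.05, so the script agrees with the conclusion in part (l).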

Chi-Square Test Of Goodness Of Fit

An urban economist wishes to test the claim that the distribution of residents in the United States is different today than it was in 1999. In 1999, 19.6% of the population resided in the Northeast, 23.0% resided in the Midwest, 35.4% resided in the South, and 22.0% resided in the West (based on data obtained from the Census Bureau). The economist randomly selects 2000 households in the United States and obtains the frequency distribution shown below.



Region       Observed Frequency    Expected Frequency
Northeast           365            2000(0.196) = 392
Midwest             404            2000(0.230) = 460
South               752            2000(0.354) = 708
West                404            2000(0.220) = 440

Use a chi-square test of goodness of fit to determine whether the distribution of residents in the USA is different today from the distribution in 1999 at the alpha level of 0.05.


(a) What is the null hypothesis?
(b) What is the df? Answer:
(c) What is the calculated chi-square? Answer:
(d) What is the "p" value? Answer:
(e) What is the theoretical chi-square? Answer:
(f) Is the p value smaller than alpha?
(g) Is the calculated chi-square larger than the theoretical chi-square? Answer:
(h) What is the conclusion? Answer:
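
One way to set up parts (b) and (c) is sketched below in Python, standard library only. The expected counts come from the 1999 percentages in the problem statement applied to the sample of 2000. The dictionary names are illustrative. Note that the observed counts as printed add up to 1925 rather than 2000, so the statistic this produces should be treated as a check on the arithmetic and not as a final answer.

```python
# 1999 proportions as stated in the problem (they sum to 1).
proportions_1999 = {
    "Northeast": 0.196,
    "Midwest": 0.230,
    "South": 0.354,
    "West": 0.220,
}

# Observed counts copied from the table as printed.
observed = {"Northeast": 365, "Midwest": 404, "South": 752, "West": 404}

sample_size = 2000

# Expected count per region: sample size times the 1999 proportion.
expected = {region: sample_size * p for region, p in proportions_1999.items()}

# Goodness-of-fit statistic: sum of (observed - expected)^2 / expected.
chi_sq = sum(
    (observed[region] - expected[region]) ** 2 / expected[region]
    for region in observed
)

# Degrees of freedom: number of categories minus one.
df = len(observed) - 1

for region in observed:
    print(f"{region:<10} observed = {observed[region]:>4}  expected = {expected[region]:.0f}")
print(f"df = {df}, chi-square = {chi_sq:.3f}")
```

The resulting statistic is then compared with the table value for the same df at alpha = 0.05, exactly as in the test of independence above.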

Correlation  

1. Find the correlation coefficient between age (in years) and systolic blood pressure of the following individuals. Answer: 0.7906

Age               16    25    39    45    49
Blood Pressure   109   122   143   132   199

2. Which value of r indicates the stronger correlation: r = 0.731 or r = -0.845? Answer: -0.845

3. Which of the following could not represent a correlation coefficient?
(a) 0.926 (b) 1.054 (c) -0.7003 (d) -0.0000005       Answer: b
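
The coefficient in question 1 can be reproduced from the definition r = Sxy / sqrt(Sxx * Syy). Below is a minimal sketch in Python with no third-party libraries; the function name pearson_r is an illustrative choice, not something the exercise prescribes.

```python
import math

age = [16, 25, 39, 45, 49]
blood_pressure = [109, 122, 143, 132, 199]


def pearson_r(x, y):
    """Pearson correlation coefficient computed from sums of squared deviations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(age, blood_pressure)
print(f"r = {r:.4f}")

# Strength depends on the absolute value, so -0.845 is stronger than 0.731.
stronger = max([0.731, -0.845], key=abs)
print(f"stronger correlation: {stronger}")
```

Because r is a ratio bounded by the Cauchy-Schwarz inequality, it always lies between -1 and 1, which is why 1.054 in question 3 cannot be a correlation coefficient.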

Simple Linear Regression

1. The regression equation for the selling price of a house in dollars (y) and the number of square feet of heated space (x) is y (hat) = 50.729x + 1004.50

(a) Find the selling price for a house with 2000 square feet of heated space. Answer: $102,462.50
(b) Find the selling price of a house that did not have any heated space. Answer: $1,004.50

2. The ages (years) of seven men and their systolic blood pressure are given in the table below.

Age               17    26    37    48    50    68    72
Blood Pressure   110   124   146   140   200   192   200

(a) Find the equation of the regression line with x as the explanatory variable and y as the response variable. Answer: y (hat) = 1.662x + 83.338
(b) What is the correlation coefficient between the two variables? Answer: 0.8959
(c) What is the coefficient of determination? Answer: 80.26%
(d) How much of the variation in systolic blood pressure is unexplained by this model? Answer: 19.74%
(e) Predict the blood pressure for a male age 60. Answer: 183.08
(f) Could there be a lurking variable at work here? If so, what might it be? Answer: Yes, weight
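
Parts (a) through (e) of question 2 all follow from three sums of deviations. The sketch below, in plain Python, is one way to check them; the function name least_squares and the variable names are illustrative assumptions. The last lines also check the house price from question 1(a).

```python
import math

age = [17, 26, 37, 48, 50, 68, 72]
blood_pressure = [110, 124, 146, 140, 200, 192, 200]


def least_squares(x, y):
    """Return (slope, intercept, r) for the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


slope, intercept, r = least_squares(age, blood_pressure)
r_squared = r ** 2              # share of variation explained by the line
unexplained = 1 - r_squared     # share left unexplained
predicted_at_60 = intercept + slope * 60

print(f"y (hat) = {slope:.3f}x + {intercept:.3f}")
print(f"r = {r:.4f}, r^2 = {r_squared:.2%}, unexplained = {unexplained:.2%}")
print(f"predicted blood pressure at age 60 = {predicted_at_60:.2f}")

# Question 1(a): plug x = 2000 into the given house-price equation.
house_price = 50.729 * 2000 + 1004.50
print(f"predicted price for 2000 sq ft = ${house_price:,.2f}")
```

The prediction at age 60 uses the unrounded slope and intercept, which is why it gives 183.08 and not the 183.06 obtained from the rounded coefficients.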

Multiple Linear Regression


1. Below are data showing the carbon monoxide, tar, and nicotine content and the weight in milligrams of 13 brands of U.S. cigarettes. Source: FTC

Carbon Monoxide (x1)    Tar (x2)    Nicotine (x3)    Weight (x4)
        13.6            14.1          0.86            985.3
        16.6            16.0          1.06           1093.8
        10.2             8.0          0.67            928.0
         5.4             4.1          0.40            946.2
        15.0            15.0          1.04            888.5
         9.0             8.8          0.76           1026.7
        12.3            12.4          0.95            922.5
        16.3            16.6          1.12            937.2
        15.4            14.9          1.02            885.8
        13.0            13.7          1.01            964.3
        14.4            15.1          0.90            931.6
        10.0             7.8          0.57            970.5
        10.2            11.4          0.78           1124.0

(a) Which variable has the highest correlation with x1 and what is the coefficient? Answer: x2, r = 0.967
(b) Write the regression equation with x1 as the response variable and x2 as the explanatory variable. Answer: x1 = 2.365 + 0.827x2
(c) How much of the variation in x1 is explained by x2 using this model? Answer: 93.4%
(d) Write the regression equation with x1 as the response variable and x2, x3, and x4 as the explanatory variables. Answer: x1 = 6.318 + 0.822x2 + 0.031x3 - 0.004x4
(e) How much of the variation in x1 is explained by x2, x3, and x4? Answer: 94.2%
(f) Write a regression equation with x2 as the response variable and x4 as the explanatory variable. Answer: x2 = 14.885 - 0.003x4
(g) Using this last equation, predict the amount of tar produced by a cigarette weighing 950 milligrams. Answer: 12.202 mg
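
The fits in parts (b) through (g) would normally be done in a statistics package. As a rough cross-check, the sketch below solves the least-squares normal equations directly in plain Python. The helper names solve and fit are hypothetical, and centering the columns before solving is a design choice made here to keep the arithmetic well conditioned, not a step taken from the exercise.

```python
# Columns copied from the table: CO (x1), tar (x2), nicotine (x3), weight (x4).
co = [13.6, 16.6, 10.2, 5.4, 15.0, 9.0, 12.3, 16.3, 15.4, 13.0, 14.4, 10.0, 10.2]
tar = [14.1, 16.0, 8.0, 4.1, 15.0, 8.8, 12.4, 16.6, 14.9, 13.7, 15.1, 7.8, 11.4]
nicotine = [0.86, 1.06, 0.67, 0.40, 1.04, 0.76, 0.95, 1.12, 1.02, 1.01, 0.90, 0.57, 0.78]
weight = [985.3, 1093.8, 928.0, 946.2, 888.5, 1026.7, 922.5, 937.2, 885.8, 964.3,
          931.6, 970.5, 1124.0]


def mean(values):
    return sum(values) / len(values)


def solve(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit(y, predictors):
    """Least-squares fit of y on the predictors: (intercept, slopes, r_squared)."""
    y_bar = mean(y)
    p_bars = [mean(p) for p in predictors]
    # Center every column so the intercept drops out of the normal equations.
    yc = [v - y_bar for v in y]
    pc = [[v - bar for v in p] for p, bar in zip(predictors, p_bars)]
    a = [[sum(u * v for u, v in zip(pi, pj)) for pj in pc] for pi in pc]
    b = [sum(u * v for u, v in zip(pi, yc)) for pi in pc]
    slopes = solve(a, b)
    intercept = y_bar - sum(s * bar for s, bar in zip(slopes, p_bars))
    ss_total = sum(v * v for v in yc)
    ss_explained = sum(s * bi for s, bi in zip(slopes, b))
    return intercept, slopes, ss_explained / ss_total


# Parts (b) and (c): CO on tar alone.
a_simple, slopes_simple, r2_simple = fit(co, [tar])
b_simple = slopes_simple[0]

# Parts (d) and (e): CO on tar, nicotine, and weight.
a_multi, slopes_multi, r2_multi = fit(co, [tar, nicotine, weight])

# Parts (f) and (g): tar on weight, then a prediction at 950 mg.
a_tar, slopes_tar, _ = fit(tar, [weight])
tar_at_950 = a_tar + slopes_tar[0] * 950

print(f"x1 = {a_simple:.3f} + {b_simple:.3f}x2, R^2 = {r2_simple:.1%}")
print("x1 = {:.3f} + {:.3f}x2 + {:.3f}x3 + {:.3f}x4, R^2 = {:.1%}".format(
    a_multi, *slopes_multi, r2_multi))
print(f"x2 = {a_tar:.3f} + {slopes_tar[0]:.4f}x4, tar at 950 mg = {tar_at_950:.3f}")
```

Adding predictors can never lower R^2 in a least-squares fit with an intercept, so the three-variable value should come out at least as large as the one-variable value. The prediction in part (g) depends on carrying the unrounded slope, since the rounded equation 14.885 - 0.003(950) gives 12.035 instead.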