Hypothesis Testing 
(Large sample, single sample, testing of a population mean)

Statistical hypothesis testing is a process whereby one uses information from a sample to test a claim made about a population mean. The test involves setting up two opposing hypotheses so that each is the negation of the other. For example, to test the claim from an automobile battery manufacturer that his batteries have an average life of more than 48 months, you would set that claim against the opposing hypothesis that the average life of his batteries is less than or equal to 48 months. This way exactly one of the two hypotheses is true and the other is false. The two hypotheses involved in statistical hypothesis testing are called the null hypothesis and the alternative hypothesis.

The null hypothesis  Ho: This is usually a statement that the population mean has a certain value. It is called null because it is the starting or original point for testing.

The Alternative Hypothesis  Ha: This too is a statement about the population mean but is the opposite of the null hypothesis.

The null hypothesis is the one being tested. Rejecting the null hypothesis implies accepting the alternative hypothesis. The two possible conclusions reached in statistical hypothesis testing are:

1. Fail to reject the null hypothesis
2. Reject the null hypothesis  

By saying that we fail to reject the null hypothesis, we are saying that we have not found sufficient evidence to reject it. We are not saying that we accept the null hypothesis.

Since statistical hypothesis testing involves sampling from a population, we cannot be certain of our conclusion.  In statistical hypothesis testing there are two ways that we could make an incorrect decision.

Type I error: Rejecting a true null hypothesis.
Type II error: Failing to reject a false null hypothesis.

You can limit the probability of making a Type I error by building into the process your maximum allowable probability of making a Type I error. This maximum allowable probability is denoted by the lowercase Greek letter alpha and is called the level of significance. Many researchers set the alpha level at 0.05, meaning that they accept at most a 5% chance of rejecting a true null hypothesis (they want to be 95% confident that they do not make a Type I error). Other common values for alpha are 0.01 and 0.001.
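The meaning of alpha can be checked with a small simulation. The sketch below is illustrative only: the population values (mean 48, standard deviation 3.5, samples of 36), the number of trials, and the function name are assumptions chosen for the demonstration, and the population standard deviation is treated as known. The null hypothesis is true in every trial, so each rejection is a Type I error and the rejection rate should come out close to alpha.

```python
import random
from statistics import NormalDist

def type_one_error_rate(alpha=0.05, trials=2000, n=36, mu=48.0, sigma=3.5, seed=1):
    """Estimate how often a two-tail z test rejects a TRUE null hypothesis."""
    rng = random.Random(seed)                      # fixed seed so runs are repeatable
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = 0.05
    std_error = sigma / n ** 0.5
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population in which the null hypothesis really holds.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu) / std_error
        if abs(z) > z_crit:
            rejections += 1                        # a Type I error
    return rejections / trials

rate = type_one_error_rate()
print(f"Estimated Type I error rate: {rate:.3f}")
```

Lowering alpha to 0.01 in the call should lower the estimated rate accordingly, at the cost of making it harder to reject a null hypothesis that is in fact false.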

After stating the null and alternative hypotheses and setting the alpha level, the next step in the process is to draw a random sample from the population. From this point on there are two possible approaches to take. You could take either the Classical Approach or the Probability (P value) Approach.

The Classical Approach: A Six-Step Procedure

1. Identify the population parameter of interest (usually the population mean).
2. Set up the hypotheses, null and alternative.
3. Determine the level of significance, alpha.
4. Collect the sample data and calculate the test statistic (usually from the sample mean).
5. Determine the critical region, and determine whether or not the test statistic lies in the critical region.
6. State your decision regarding the null hypothesis.

Now let's take a closer look at each of these steps.

1. Usually the parameter of interest is the population mean; however, it could be the population standard deviation.

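2. There are three possible ways to set up the hypotheses, where "value" is the value stated in the claim:

The Null Hypothesis                        The Alternative Hypothesis     Type of Test
Greater than or equal to the value (>=)    Less than the value (<)        left tail
Less than or equal to the value (<=)       Greater than the value (>)     right tail
Equal to the value (=)                     Not equal to the value         two tail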
 

Some examples:

(a) An automobile manufacturer claims that the mean life of his battery is at least 48 months.
         Ho: u >= 48   Ha: u < 48
   Since the alternative hypothesis uses a less than sign, we will conduct a left tail test.

(b) A manufacturer claims that the mean weight of an item is 60 pounds.
       Ho: u = 60      Ha: u not equal to 60
   Since the alternative hypothesis uses the not equal to sign, we will conduct a two tail test.

(c) A truck manufacturer claims that her trucks use no more than 3 gallons of gasoline per mile.
       Ho: u <= 3      Ha: u > 3
   Since the alternative hypothesis uses a greater than sign, we will conduct a right tail test.

3. Set the level of significance. In classical testing alpha is usually selected as 0.05, 0.01, or 0.001.

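4. The sample mean is calculated and converted to a z score by subtracting the hypothesized population mean and dividing by the standard error (the standard deviation divided by the square root of the sample size). This z score is called the test statistic.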

5. The critical region is the set of z scores that will cause us to reject the null hypothesis. For example, if we were doing a two tail test with alpha set at 0.05, the critical region would consist of the z scores to the left of z = -1.96 and to the right of z = 1.96.

Notice that 2.5% of the total area under the curve lies in each tail of the critical region, making the total critical region area equal to 0.05, which equals the alpha level.

The critical region for a right tail test with alpha set equal to 0.05 is the set of z scores to the right of z = 1.645.

 

 

Now check to see whether the test statistic lies in the critical (rejection) region.

6. If the test statistic lies in the critical region, reject the null hypothesis in favor of the alternative hypothesis. If it does not, then you fail to reject the null hypothesis.

 

 

To save time and effort, the table below relates critical z values to alpha levels and type of test.

Alpha       Tails       Critical Z

 0.05        two         plus or minus 1.96
 0.05        right       1.645
 0.05        left        -1.645
 0.01        two         plus or minus 2.58
 0.01        right       2.33
 0.01        left        -2.33
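The critical values in the table can be reproduced from the standard normal distribution. The sketch below uses `NormalDist` from Python's standard `statistics` module; the function names `critical_z` and `in_critical_region` are hypothetical helpers introduced here for illustration, not part of any course material.

```python
from statistics import NormalDist

def critical_z(alpha, tail):
    """Return the critical z value(s) for a 'left', 'right', or 'two' tail test."""
    std_normal = NormalDist()                  # mean 0, standard deviation 1
    if tail == "left":
        return (std_normal.inv_cdf(alpha),)
    if tail == "right":
        return (std_normal.inv_cdf(1 - alpha),)
    if tail == "two":
        z = std_normal.inv_cdf(1 - alpha / 2)  # alpha is split between the two tails
        return (-z, z)
    raise ValueError("tail must be 'left', 'right', or 'two'")

def in_critical_region(z, alpha, tail):
    """Return True when the test statistic z falls in the rejection region."""
    crit = critical_z(alpha, tail)
    if tail == "left":
        return z < crit[0]
    if tail == "right":
        return z > crit[0]
    return z < crit[0] or z > crit[1]

for alpha in (0.05, 0.01):
    for tail in ("two", "right", "left"):
        values = ", ".join(f"{c:.3f}" for c in critical_z(alpha, tail))
        print(f"alpha={alpha}  tail={tail:<5}  critical z: {values}")
```

The computed values carry more decimal places than the table, which rounds the alpha = 0.01 entries to two places.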

Now let's work an example that illustrates the features of the classical approach to hypothesis testing.

Suppose a manufacturer of precision electronic equipment claims that a certain device will operate for exactly 48 months and then turn itself off. Suppose that we did not agree with the manufacturer's claim and believed that the device would not work for exactly 48 months. We are not sure whether it will work longer than 48 months or for less than 48 months; we just believe that it will not work for exactly 48 months. We have identified the parameter of interest, namely the mean time that the device will work. Now, let's set up the hypotheses.

       Ho: u = 48      Ha: u not equal to 48

Let's set the probability of making a Type I error at alpha = 0.05.

 Suppose we are able to draw a random sample of 36 (N=36) electronic devices from the manufacturer and we determined that the sample mean is 44 months and the sample standard deviation is 3.5 months. 

 With a two-tail test and an alpha level of 0.05 we know that the reject region lies to the left of z = -1.96 and to the right of z = 1.96. Now all that is left for us to do is to convert our sample mean of 44 months to a z score (the test statistic) and see if the z score lies in the rejection region. From the Central Limit Theorem we know that, for a large sample, the distribution of sample means is approximately normal with mean equal to the population mean and standard error equal to the sample standard deviation divided by the square root of the sample size. Using this data to convert the sample mean to a z score we find

 z = (44 - 48)/(3.5/sqrt(36)) = -6.857

 Clearly a z score of -6.857 lies in the reject region. So our decision is to: Reject the null hypothesis in favor of the alternative hypothesis.
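The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch using only the numbers given in the example; the variable names are chosen here for illustration.

```python
from math import sqrt

# Values taken from the worked example above.
mu_0 = 48.0      # mean claimed by the manufacturer (months)
x_bar = 44.0     # sample mean (months)
s = 3.5          # sample standard deviation (months)
n = 36           # sample size
z_crit = 1.96    # two-tail critical value for alpha = 0.05

std_error = s / sqrt(n)            # standard error of the sample mean
z = (x_bar - mu_0) / std_error     # test statistic
reject = abs(z) > z_crit           # two-tail test: reject in either tail

print(f"standard error = {std_error:.4f}")
print(f"z = {z:.3f}")
print("Reject the null hypothesis" if reject else "Fail to reject the null hypothesis")
```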

Hypothesis Testing
(Small sample, single sample, testing of a population mean)

In small sample testing of a population mean, things work essentially the same as in large sample testing, except that critical "t" values are used to determine the rejection region and the sample mean is converted to a t score, which serves as the test statistic.

Example: A company manufactures gas bottles for industrial use and claims that the average hours of use is 500 hours. A purchaser of these bottles doubts the claim and believes the use time is less than 500 hours. To test the manufacturer's claim, the purchasing agent randomly selects six of these gas bottles from the manufacturer and finds that the sample average is 493 hours with a sample standard deviation of 4 hours. Is the manufacturer's claim justified at the 0.05 alpha level using a left tail test?

 The hypotheses are Ho: u >= 500 and Ha: u < 500. We must assume that the population is normal in order to continue with the "t" test.
Using your "t" tables from Surfstat, entering the degrees of freedom as 5 (the sample size minus 1), and selecting the first graph (the one showing a left tail rejection region), we find the critical "t" value that corresponds to 0.05 in the left tail of the curve to be -2.015. The critical region under the t distribution is the region to the left of t = -2.015.


Calculating the test statistic results in the following:

 t = (493 - 500)/(4/sqrt(6)) = -4.29

We reject the null hypothesis since the test statistic clearly lies in the reject region.
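The test statistic for the gas bottle example can be checked as follows. This is a sketch only: the helper name `one_sample_t` is hypothetical, and the critical value -2.015 is copied from the t table lookup described above rather than computed.

```python
from math import sqrt

def one_sample_t(x_bar, mu_0, s, n):
    """Convert a sample mean to a t score using the sample standard deviation."""
    return (x_bar - mu_0) / (s / sqrt(n))

# Values taken from the gas bottle example.
t = one_sample_t(x_bar=493.0, mu_0=500.0, s=4.0, n=6)
t_crit = -2.015          # left tail, alpha = 0.05, 5 degrees of freedom (from the table)
reject = t < t_crit      # left tail test: reject when t is below the critical value

print(f"t = {t:.2f}")
print("Reject the null hypothesis" if reject else "Fail to reject the null hypothesis")
```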

The following template will help you organize your work when testing a claim.

The P Value Approach to Hypothesis Testing

In each of the examples so far, the alpha level was selected in advance of testing. It is easy to see that the decision rendered depends on the alpha level selected. Often a null hypothesis is rejected at the 0.05 level and we wonder just how far below 0.05 we could go and still reject the null hypothesis. Could we reject the null hypothesis if we wanted to be 99% confident that we will not make a Type I error (i.e., alpha = 0.01)? This situation brings up the question: what is the smallest alpha level at which the null hypothesis can be rejected? P values (probabilities) are used extensively in statistics and serve to answer this question.

Definition: Assuming that the null hypothesis is true, the p value is the probability of obtaining a sample mean as extreme or more extreme than the sample mean actually obtained. 

A small p value associated with a sample mean indicates a rare occurrence and would lead one to reject the null hypothesis.

 The p value can also be thought of as the smallest alpha for which the null hypothesis can be rejected. If the p value is 0.03, then one can reject the null hypothesis with a 97% confidence level of not making a Type I error. If the p value is 0.02, then one could reject the null hypothesis with a 98% confidence level against a Type I error. On the other hand, if the p value is 0.07, then the confidence level against a Type I error is only 93%, somewhat below the 95% confidence level.


Here is a  graphic which will help you make a decision in hypothesis testing
 

Decision Rule Based on P Values

 1. If p is less than or equal to alpha, then reject the null hypothesis.

2. If p is greater than alpha, then fail to reject the null hypothesis.

 How to calculate a p value

1. If using a left-tail test, p equals the area to the left of the test statistic.
2. If using a right-tail test, p equals the area to the right of the test statistic.
3. If using a two-tailed test, p equals 2 times the area in the tail beyond the test statistic (to the left of a negative test statistic, or to the right of a positive one).

You may use any applet such as the McClellen or Surfstat Applet to determine the p value.
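As an alternative to an applet, the three p value rules can be sketched in Python for a z test statistic. The function names `p_value_from_z` and `decide` are hypothetical, and the z scores in the usage lines are illustrative inputs rather than results from the text. For a small sample t test the areas would have to come from the t distribution instead, which this sketch does not cover.

```python
from statistics import NormalDist

def p_value_from_z(z, tail):
    """Return the p value for a z test statistic and a 'left', 'right', or 'two' tail test."""
    left_area = NormalDist().cdf(z)     # area under the standard normal curve left of z
    right_area = 1 - left_area          # area to the right of z
    if tail == "left":
        return left_area
    if tail == "right":
        return right_area
    if tail == "two":
        return 2 * min(left_area, right_area)   # double the smaller tail area
    raise ValueError("tail must be 'left', 'right', or 'two'")

def decide(p, alpha):
    """Apply the decision rule: reject when p is less than or equal to alpha."""
    return "reject" if p <= alpha else "fail to reject"

# Illustrative inputs (assumed for the demonstration).
p_two = p_value_from_z(-1.77, "two")
p_right = p_value_from_z(2.10, "right")
print(f"two tail,   z = -1.77: p = {p_two:.4f} -> {decide(p_two, 0.05)}")
print(f"right tail, z =  2.10: p = {p_right:.4f} -> {decide(p_right, 0.05)}")
```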

 

 

Hypothesis Testing Difference Between Two Means (Paired difference or matched pairs, dependent samples)

A sampling method from two populations is considered independent when the subjects selected in one sample do not specify how the subjects are selected in the other sample. A sampling method from two populations is considered to be dependent when the subjects selected in one sample determine the subjects selected in the other sample. For example if we are studying the driving habits of husbands and wives, then once a wife is selected for study, her husband is automatically selected for study.
We now consider an example where the subjects are involved in a pre-test, post-test situation. Since the mean score of subjects on the pre-test will be compared to the mean score of the same subjects on the post-test, it is clear that this is a dependent testing situation. The null hypothesis will be that the difference in the mean scores for the two tests is zero, that is, the mean score Before equals the mean score After.

The alternative hypothesis will be one of the statements below, where ud denotes the mean of the differences. (Always think of subtracting the Before score from the After score, as in the example that follows.)

Ha: ud > 0 (right tail test)
Ha: ud < 0 (left tail test)
Ha: ud not equal to 0 (two tail test)

Next we turn our attention to how we compute the test value. First, for each data pair, compute the difference between the after score and the before score (subtract the before score from the after score). Once you have the difference for each pair of scores, sum these differences and divide the sum by the number of pairs to obtain the mean difference. This mean is the test value. You will also need to calculate the standard deviation of these differences.

Had we carried out the above procedure a large number of times, we would have a distribution of sample mean differences. Assuming the null hypothesis is true, this distribution has:

(a) a "t" distribution (with degrees of freedom equal to the number of pairs minus 1)
(b) mean equal to zero.
(c) standard error equal to the sample standard deviation of the differences divided by the square root of the sample size.

So, all that is left for you to do is to determine the reject region and convert the test value to a "t" score and determine which region the "t" score lies in.

Example: A teacher wants to see if a PowerPoint presentation will change students' understanding of a certain concept in biology. Five students were randomly selected for the study and were given a pre-test on their understanding of the biological concept. The students then viewed the PowerPoint presentation and afterwards were given a post-test on their understanding of the same concept. The data are shown below.
Subject   Before Score   After Score   Difference (After - Before)
Bill      22             35            13
Joe       18             18             0
Jill      20             18            -2
Tom       20             41            21
Sam       25             35            10

The sum of the difference scores = 42. The mean difference score = 42/5 = 8.4
The standard deviation of the difference scores = 9.50

Converting the test value (8.4) to a "t" score: (8.4-0)/(9.5/sqrt(5)) = 1.98

Using a right tail test and alpha = 0.05 with 4 degrees of freedom, the critical "t" is 2.13

Since the test statistic (1.98) lies to the left of the critical "t" (2.13), we fail to reject the null hypothesis that the difference between pre-test score and post-test score is zero. We do not have sufficient evidence to conclude that the PowerPoint presentation makes a difference in understanding the biological concept.
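The paired difference calculation can be reproduced from the raw scores. The sketch below uses the data from the table, takes each difference as the After score minus the Before score to match the Difference column, and copies the critical value 2.13 from the text; the variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Scores from the example, in the order Bill, Joe, Jill, Tom, Sam.
before = [22, 18, 20, 20, 25]
after = [35, 18, 18, 41, 35]

# Difference for each subject: After score minus Before score.
differences = [a - b for b, a in zip(before, after)]

d_bar = mean(differences)       # mean difference (the test value)
s_d = stdev(differences)        # sample standard deviation of the differences
n = len(differences)

t = (d_bar - 0) / (s_d / sqrt(n))   # null hypothesis: mean difference is zero
t_crit = 2.13                       # right tail, alpha = 0.05, 4 degrees of freedom
reject = t > t_crit

print(f"differences = {differences}")
print(f"mean difference = {d_bar:.1f}, standard deviation = {s_d:.2f}")
print(f"t = {t:.2f}")
print("Reject the null hypothesis" if reject else "Fail to reject the null hypothesis")
```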

Hypothesis Testing Difference Between Two Means (Independent samples)

In this section you will be using a hypothesis test that states that there is no difference in two population means. To perform a z test for the difference between two population means requires that both samples be large samples. If this is not the case then a "t" test is usually performed provided both populations are normal. 

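If both samples are large, the sampling distribution for the difference between the two sample means is approximately normal with mean and standard error given below, where sigma1 and sigma2 are the population standard deviations.

 mean = U1 - U2
 standard error = sqrt(sigma1^2/n1 + sigma2^2/n2)

 The test statistic is the difference between the two sample means converted to a z score.

 z = ((x1 - x2) - (U1 - U2))/standard error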

 Note: if the samples are large you can use the sample standard deviations instead of the population standard deviations provided the populations are normally distributed.

Example: An advertiser claims there is a difference in the mean household income for card holders of Visa Gold and Gold Master Card. The results of a random sample of 100 customers from each card group are shown below:

Visa Gold            Gold Master Card

x1 = $60,900         x2 = $64,300
s1 = $12,000         s2 = $15,000
n1 = 100             n2 = 100

(x = sample mean, s = sample standard deviation, n = sample size)

The two samples are independent. Do the results support the advertiser's claim? Use alpha = .05

Ho: U1 = U2          Ha: U1 not equal to U2

Because this is a two tailed test with alpha = .05 the critical Z's are -1.96 and +1.96. Therefore the critical regions are Z<-1.96 and Z>1.96.

The standard error is calculated to be: sqrt(12,000^2/100 + 15,000^2/100) = 1921 (verify this)

The test statistic z = ((60,900 - 64,300) - 0)/1921 = -1.77

Because the test statistic is not in the critical region, you would fail to reject the null hypothesis.
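The standard error and test statistic for the credit card example can be verified with a short calculation. This sketch uses only the sample figures given above and, since both samples are large, the sample standard deviations in place of the population values; the variable names are chosen here for illustration.

```python
from math import sqrt

# Sample results from the example.
x1, s1, n1 = 60_900, 12_000, 100    # Visa Gold
x2, s2, n2 = 64_300, 15_000, 100    # Gold Master Card

# Standard error of the difference between two independent sample means.
std_error = sqrt(s1**2 / n1 + s2**2 / n2)

# Under the null hypothesis the difference in population means is 0.
z = ((x1 - x2) - 0) / std_error

z_crit = 1.96                       # two-tail critical value for alpha = 0.05
reject = abs(z) > z_crit

print(f"standard error = {std_error:.0f}")
print(f"z = {z:.2f}")
print("Reject the null hypothesis" if reject else "Fail to reject the null hypothesis")
```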

If the samples are not both large, but both populations are normally distributed and both population standard deviations are known, then you may use a "z" test.

However, if the samples are not both large, both populations are normally distributed, and the population standard deviations are not known, then you may use one of two "t" tests, depending upon whether the population variances are equal or not. In either case the standardized test statistic has the form given below; only the calculation of the standard error (and the degrees of freedom) differs between the two tests.

 t = ((x1 - x2) - (U1 - U2))/standard error

There is a test for equal variances, but it will not be covered in this course. You will be provided with information regarding the relationship between the variances of the two populations.