Hypothesis Testing
Statistical hypothesis testing is a process whereby one uses information from a sample to test a claim made about a population parameter, most often the population mean. The test involves setting up two opposing hypotheses so that each is the negation of the other. For example, to test an automobile battery manufacturer's claim that his batteries have an average life of more than 48 months, you would pair that claim with the opposing statement that the average life of his batteries is less than or equal to 48 months. This way exactly one of the two hypotheses is true and the other is false. The two hypotheses involved in statistical hypothesis testing are called the null hypothesis and the alternative hypothesis.
The null hypothesis Ho: This is usually a statement that the population mean has a certain value. It is called null because it is the starting or original point for testing.
The Alternative Hypothesis Ha: This too is a statement about the population mean but is the opposite of the null hypothesis.
The null hypothesis is the one being tested. Rejecting the null hypothesis means that the sample provides enough evidence to support the alternative hypothesis. The two possible conclusions reached in statistical hypothesis testing are:
1. Fail to reject the null hypothesis
2. Reject the null hypothesis
By saying that we fail to reject the null hypothesis, we mean that we have not found sufficient evidence to reject it; we are not saying that we accept the null hypothesis as true.
Since statistical hypothesis testing involves sampling from a population, we cannot be certain of our conclusion. There are two ways that we could make an incorrect decision:
Type I error: Rejecting a true null hypothesis.
Type II error: Failing to reject a false null hypothesis.
You control the probability of making a Type I error by building into the process your maximum allowable probability of making one. This maximum allowable probability of making a Type I error is called the level of significance and is denoted by the lowercase Greek letter alpha (α). Many researchers set alpha at 0.05, meaning that if the null hypothesis is true they accept at most a 5% chance of wrongly rejecting it; in the language used here, they want to be 95% confident that they do not make a Type I error. Other common values for alpha are 0.01 and 0.001.
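The meaning of alpha can be checked by simulation. The sketch below is a minimal illustration, not part of the original procedure: it repeatedly draws samples from a normal population whose mean really equals the hypothesized value μ0 (so the null hypothesis is true) and counts how often a two-tail z test rejects it. The population values (μ0 = 48, σ = 3.5, n = 36), the number of trials, and the function name type_one_error_rate are hypothetical choices, and σ is treated as known to keep the example simple.

```python
import random
from math import sqrt
from statistics import NormalDist

def type_one_error_rate(alpha=0.05, mu0=48.0, sigma=3.5, n=36, trials=10_000, seed=1):
    """Estimate how often a two-tail z test rejects a null hypothesis that is TRUE.

    Every sample is drawn from a normal population whose mean really is mu0,
    so every rejection counted here is a Type I error.
    """
    rng = random.Random(seed)                      # seeded for reproducibility
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = 0.05
    standard_error = sigma / sqrt(n)
    rejections = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(mu0, sigma) for _ in range(n)) / n
        z = (sample_mean - mu0) / standard_error
        if abs(z) > z_crit:                        # z falls in the critical region
            rejections += 1
    return rejections / trials

print(type_one_error_rate(alpha=0.05))  # close to 0.05
print(type_one_error_rate(alpha=0.01))  # close to 0.01
```

The estimated rejection rates land near the chosen alpha, which is exactly what "maximum allowable probability of a Type I error" promises.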
After stating the null and alternative hypotheses and setting the alpha level, the next step is to draw a random sample from the population. From this point on there are two possible approaches: the Classical Approach or the Probability (P value) Approach.
The Classical Approach: A Six-Step Procedure
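1. Identify the population parameter of interest (usually the population mean).
2. State the null and alternative hypotheses.
3. Set the level of significance (alpha).
4. Calculate the test statistic.
5. Determine the critical (rejection) region.
6. Decide whether to reject the null hypothesis.
Now let's take a closer look at each of these steps.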
1. Usually the parameter of interest is the population mean; however, it could be the population standard deviation.
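2. There are three possible ways to set up the hypotheses, where μ denotes the population mean and μ0 the value stated in the claim:
| The Null Hypothesis | The Alternative Hypothesis | Type of Test |
| Greater than or equal to: μ ≥ μ0 | Less than: μ < μ0 | Left tail |
| Less than or equal to: μ ≤ μ0 | Greater than: μ > μ0 | Right tail |
| Equal to: μ = μ0 | Not equal to: μ ≠ μ0 | Two tail |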
Some examples:
(a) An automobile manufacturer claims that the life of his battery is 48 months.
Ho: μ = 48
Ha: μ ≠ 48
(b) A manufacturer claims that the mean weight of an item is 60 pounds.
Ho: μ = 60
Ha: μ ≠ 60
Since the alternative hypothesis uses the not-equal sign, we will conduct a two-tail test.
(c) A truck manufacturer claims that her trucks use no more than 3 gallons of gasoline per mile.
Ho: μ ≤ 3
Ha: μ > 3
Since the alternative hypothesis uses a greater-than sign, we will conduct a right-tail test.
3. Set the level of significance. In classical testing alpha is usually chosen to be 0.05, 0.01, or 0.001.
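4. Calculate the sample mean and convert it to a z score using z = (x̄ - μ0)/(s/sqrt(n)), where x̄ is the sample mean, μ0 the hypothesized population mean, s the sample standard deviation, and n the sample size. This z score is called the test statistic.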
5. The critical region is the set of z scores that will cause us to reject the null hypothesis. For example, for a two-tail test with alpha set at 0.05, the critical region consists of all z scores less than -1.96 or greater than 1.96. Notice that 2.5% of the total area under the standard normal curve lies in each part of the critical region, making the total critical region area equal to 0.05, which equals the alpha level. For a right-tail test with alpha set at 0.05, the critical region consists of all z scores greater than 1.645. Next, check whether the test statistic lies in the critical (rejection) region.
6. If the test statistic lies in the critical region, reject the null hypothesis in favor of the alternative hypothesis; if it does not, fail to reject the null hypothesis.
To save time and effort, the table below relates critical z values to alpha levels and type of test.
| Alpha | Tails | Critical z |
| 0.05 | two | ±1.96 |
| 0.05 | right | 1.645 |
| 0.05 | left | -1.645 |
| 0.01 | two | ±2.58 |
| 0.01 | right | 2.33 |
| 0.01 | left | -2.33 |
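The critical values in the table come from the standard normal distribution. As a sketch (the function name critical_z is hypothetical), Python's standard-library statistics.NormalDist can reproduce them: a two-tail test splits alpha between the two tails, while a one-tail test puts all of alpha in a single tail.

```python
from statistics import NormalDist

def critical_z(alpha, tail):
    """Return the critical z value(s) for a given alpha and type of test.

    tail is "two", "right", or "left".
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        z = std_normal.inv_cdf(1 - alpha / 2)  # alpha split between both tails
        return (-z, z)
    if tail == "right":
        return std_normal.inv_cdf(1 - alpha)   # all of alpha in the right tail
    if tail == "left":
        return std_normal.inv_cdf(alpha)       # all of alpha in the left tail
    raise ValueError("tail must be 'two', 'right', or 'left'")
```

For instance, critical_z(0.05, "two") returns approximately (-1.96, 1.96), and critical_z(0.01, "right") returns approximately 2.326, which the table rounds to 2.33.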
Now let's work an example that illustrates the features of the classical approach to hypothesis testing.
Suppose a manufacturer of precision electronic equipment claims that a certain device will operate for exactly 48 months and then turn itself off. Suppose that we disagree with the manufacturer's claim and believe that the device will not work for exactly 48 months. We are not sure whether it will work for more or for less than 48 months; we just believe that it will not work for exactly 48 months. The parameter of interest is the mean time μ that the device will work. Now let's set up the hypotheses:
Ho: μ = 48
Ha: μ ≠ 48
Let's set the probability of making a Type I error at alpha = 0.05.
Suppose we are able to draw a random sample of 36 (n = 36) electronic devices from the manufacturer, and we find that the sample mean is 44 months and the sample standard deviation is 3.5 months.
With a two-tail test and an alpha level of 0.05, we know that the rejection region lies to the left of z = -1.96 and to the right of z = 1.96. All that is left is to convert the sample mean of 44 months to a z score (the test statistic) and see whether it lies in the rejection region. From the Central Limit Theorem we know that the distribution of sample means is approximately normal, with mean equal to the population mean and standard error equal to the population standard deviation divided by the square root of the sample size; for a large sample like this one, the sample standard deviation can be used in its place. Using these data to convert the sample mean to a z score, we find
z = (44 - 48)/(3.5/sqrt(36)) = -4/0.5833 = -6.857
Clearly a z score of -6.857 lies in the rejection region, so our decision is to reject the null hypothesis in favor of the alternative hypothesis.
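The six steps can be collected into one small function. This is a minimal sketch assuming a large sample, so that the sample standard deviation s stands in for the population standard deviation; the function name classical_z_test and its parameter names are hypothetical. Applied to the device example, it reproduces z ≈ -6.857 and the decision to reject.

```python
from math import sqrt
from statistics import NormalDist

def classical_z_test(x_bar, mu0, s, n, alpha=0.05, tail="two"):
    """Large-sample z test using the classical (critical-region) approach.

    Returns the test statistic and True if the null hypothesis is rejected.
    """
    z = (x_bar - mu0) / (s / sqrt(n))         # step 4: standard error = s / sqrt(n)
    std_normal = NormalDist()
    if tail == "two":                         # step 5: locate the critical region
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    elif tail == "right":
        reject = z > std_normal.inv_cdf(1 - alpha)
    elif tail == "left":
        reject = z < std_normal.inv_cdf(alpha)
    else:
        raise ValueError("tail must be 'two', 'right', or 'left'")
    return z, reject                          # step 6: the decision

z, reject = classical_z_test(x_bar=44, mu0=48, s=3.5, n=36, alpha=0.05, tail="two")
print(round(z, 3), reject)  # → -6.857 True
```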
Hypothesis Testing
In small-sample testing of a population mean, things work essentially the same as in large-sample testing, except that critical t values are used to determine the rejection region and the sample mean is converted to a t score (the test statistic).
Example: A company manufactures gas bottles for industrial use and claims that the average time of use is 500 hours. A purchaser of these bottles doubts the claim and believes the use time is less than 500 hours. To test the manufacturer's claim, the purchasing agent randomly selects six of these gas bottles and finds that the sample average is 493 hours with a sample standard deviation of 4 hours. Is the manufacturer's claim justified at the 0.05 alpha level using a left-tail test?
Ho: μ ≥ 500
Ha: μ < 500
We must assume that the population is normal in order to continue with the t test. Using the t tables from Surfstat, entering the degrees of freedom as n - 1 = 6 - 1 = 5, and selecting the first graph (the one showing a left-tail rejection region), we find that the critical t value corresponding to an area of 0.05 in the left tail of the curve is -2.015. The critical region under the t distribution is therefore the region to the left of t = -2.015.

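Calculating the test statistic results in the following:
t = (493 - 500)/(4/sqrt(6)) = -7/1.633 = -4.29
Since -4.29 lies to the left of -2.015, the test statistic lies in the rejection region and we reject the null hypothesis. At the 0.05 level, the sample does not support the manufacturer's claim.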
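The same calculation for the gas bottle example can be sketched in a few lines. Python's standard library has no t distribution, so the critical value -2.015 is taken from the text rather than computed; the function name one_sample_t is hypothetical.

```python
from math import sqrt

def one_sample_t(x_bar, mu0, s, n):
    """Return the one-sample t statistic and its degrees of freedom (n - 1)."""
    return (x_bar - mu0) / (s / sqrt(n)), n - 1

t, df = one_sample_t(x_bar=493, mu0=500, s=4, n=6)
T_CRIT = -2.015  # left-tail critical t for df = 5, alpha = 0.05 (value from the text)
print(round(t, 2), df, t < T_CRIT)  # → -4.29 5 True
```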
The following template will help you organize your work when testing a claim.
In each of the examples so far, the alpha level was selected in advance of testing, and the decision rendered depends on the alpha level selected. Often a null hypothesis is rejected at the 0.05 level, and we wonder just how far below 0.05 we could go and still reject it. Could we reject the null hypothesis if we wanted to be 99% confident that we will not make a Type I error (that is, alpha = 0.01)? This raises the question: what is the smallest alpha level at which the null hypothesis can be rejected? P values (probabilities) are used extensively in statistics and answer exactly this question.
Definition: Assuming that the null hypothesis is true, the p value is the probability of obtaining a sample mean at least as extreme as the sample mean actually obtained.
A small p value indicates that the observed sample mean would be a rare occurrence if the null hypothesis were true, and would lead one to reject the null hypothesis.
The p value can also be thought of as the smallest alpha for which the null hypothesis can be rejected. If the p value is 0.03, one can reject the null hypothesis with a 97% confidence level against making a Type I error. If the p value is 0.02, one can reject it with a 98% confidence level. On the other hand, if the p value is 0.07, the confidence level against a Type I error is only 93%, somewhat below the usual 95% level, so the null hypothesis would not be rejected at alpha = 0.05.
Decision Rule Based on P Values
1. If p is less than or equal to alpha, then reject the null hypothesis
2. If p is greater than alpha, then fail to reject the null hypothesis.
How to calculate a p value
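1. If using a left-tail test, p equals the area to the left of the test statistic.
2. If using a right-tail test, p equals the area to the right of the test statistic.
3. If using a two-tail test, p equals twice the area in the tail beyond the test statistic.
You may use an applet such as the McClellen or Surfstat applet to determine the p value.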
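The p-value rules and the decision rule can be sketched for z test statistics with statistics.NormalDist. The function names are hypothetical, and the sketch covers only z tests: p values for t tests require the t distribution, which Python's standard library does not provide, so an applet or a statistics package is needed for those.

```python
from statistics import NormalDist

def z_p_value(z, tail):
    """p value for a z test statistic, assuming a standard normal null distribution."""
    cdf = NormalDist().cdf
    if tail == "left":
        return cdf(z)              # area to the left of z
    if tail == "right":
        return 1 - cdf(z)          # area to the right of z
    if tail == "two":
        return 2 * cdf(-abs(z))    # twice the tail area beyond |z|
    raise ValueError("tail must be 'two', 'right', or 'left'")

def decide(p, alpha):
    """Decision rule: reject H0 when p is less than or equal to alpha."""
    return "reject H0" if p <= alpha else "fail to reject H0"
```

For example, a two-tail test with z = -1.77 gives p ≈ 0.077, so decide(z_p_value(-1.77, "two"), 0.05) returns "fail to reject H0", while the same z would lead to rejection at alpha = 0.10.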
Hypothesis Testing Difference Between Two Means
(Paired difference or matched pairs, dependent samples)
A sampling method from two populations is considered independent when the subjects selected in one sample do not determine the subjects selected in the other sample.
A sampling method from two populations is considered dependent when the subjects selected in one sample determine the subjects selected in the other sample. For example, if we are studying the driving habits of husbands and wives, then once a wife is selected for study, her husband is automatically selected as well.
We now consider an example in which the subjects take part in a pre-test, post-test situation. Since the mean score of subjects on the pre-test will be compared to the mean score of the same subjects on the post-test, this is clearly a dependent testing situation. The null hypothesis is that the mean difference between the two test scores is zero, that is, the mean score Before equals the mean score After:
Ho: μd = 0
where μd is the population mean of the differences.
The alternative hypothesis will be one of the statements below (always think of subtracting the Before score from the After score, d = After - Before):
Ha: μd < 0 (left tail)
Ha: μd > 0 (right tail)
Ha: μd ≠ 0 (two tail)
Next we turn our attention to computing the test value. First, for each data pair, compute the difference between the after score and the before score (subtract the before score from the after score). Once you have the difference for each pair of scores, sum these differences and divide by the number of pairs to get the mean difference. This mean is the test value. You will also need to calculate the standard deviation of these differences.
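If we carried out this procedure a large number of times, we would obtain a distribution of sample mean differences. Assuming the null hypothesis is true and the differences are normally distributed, this distribution has:
(a) mean equal to zero;
(b) standard error equal to the sample standard deviation of the differences divided by the square root of the sample size;
(c) after standardizing, a t distribution with n - 1 degrees of freedom.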
So all that is left is to determine the rejection region, convert the test value to a t score, and determine which region the t score lies in.
Example: A teacher wants to see whether a PowerPoint presentation will change students' understanding of a certain concept in biology. Five students were randomly selected for the study and given a pre-test on their understanding of the concept; they then viewed the PowerPoint presentation and afterwards were given a post-test on the same concept. The data are shown below.
| Subject | Before Score | After Score | Difference (After - Before) |
| Bill | 22 | 35 | 13 |
| Joe | 18 | 18 | 0 |
| Jill | 20 | 18 | -2 |
| Tom | 20 | 41 | 21 |
| Sam | 25 | 35 | 10 |
The sum of the difference scores is 42, so the mean difference score is 42/5 = 8.4.
The sample standard deviation of the differences is 9.50.
Converting the test value (8.4) to a t score: t = (8.4 - 0)/(9.50/sqrt(5)) = 1.98
Using a right-tail test with alpha = 0.05 and 5 - 1 = 4 degrees of freedom, the critical t is 2.13.
Since the test statistic (1.98) lies to the left of the critical t (2.13), we fail to reject the null hypothesis that the mean difference between the pre-test and post-test scores is zero. We do not find enough evidence to conclude that the PowerPoint presentation makes a difference in understanding the biological concept.
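The paired-difference calculation can be checked with a short sketch. It takes d = After - Before, as in the table, and uses statistics.stdev, which divides by n - 1; the function name paired_t is hypothetical. It reproduces the mean difference of 8.4, the standard deviation of about 9.50, and t ≈ 1.98 with 4 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Paired-difference t statistic using d = after - before, as in the table."""
    diffs = [a - b for b, a in zip(before, after)]
    d_bar = mean(diffs)                  # test value: mean of the differences
    s_d = stdev(diffs)                   # sample standard deviation (n - 1 divisor)
    n = len(diffs)
    t = (d_bar - 0) / (s_d / sqrt(n))    # Ho: mean difference is zero
    return diffs, d_bar, s_d, t, n - 1

before = [22, 18, 20, 20, 25]
after = [35, 18, 18, 41, 35]
diffs, d_bar, s_d, t, df = paired_t(before, after)
print(diffs, d_bar, round(s_d, 2), round(t, 2), df)
# → [13, 0, -2, 21, 10] 8.4 9.5 1.98 4
```

The critical value 2.13 still has to come from a t table, since the standard library cannot compute t quantiles.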
Hypothesis Testing Difference Between Two Means
(Independent samples)
In this section you will use a hypothesis test of the claim that there is no difference between two population means. A z test for the difference between two population means requires that both samples be large. If this is not the case, a t test is usually performed, provided both populations are normal.
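If both samples are large, the sampling distribution of the difference between the two sample means is approximately normal, with mean μ1 - μ2 and standard error sqrt(σ1²/n1 + σ2²/n2), where σ1 and σ2 are the population standard deviations and n1 and n2 are the sample sizes.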
The test statistic is the difference between the two sample means converted to a z score: z = ((x̄1 - x̄2) - (μ1 - μ2))/(standard error), where μ1 - μ2 = 0 under the null hypothesis.
Note: if both samples are large (commonly at least 30 each), you can use the sample standard deviations in place of the population standard deviations.
Example: An advertiser claims there is a difference in the mean household income of card holders of Visa Gold and Gold Master Card. The results of random samples of 100 customers from each card group are shown below:
| | Visa Gold | Gold Master Card |
| Sample size | 100 | 100 |
| Sample mean household income | $60,900 | $64,300 |
The two samples are independent. Do the results support the advertiser's claim? Use alpha = 0.05.
H0: μ1 = μ2    Ha: μ1 ≠ μ2
Because this is a two tailed test with alpha = .05 the critical Z's are -1.96 and +1.96. Therefore the critical regions are Z<-1.96 and Z>1.96.
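The standard error is calculated to be 1921 (verify this).
The test statistic is z = ((60,900 - 64,300) - 0)/1921 = -1.77
Because the test statistic is not in the critical region (-1.77 is not less than -1.96), you would fail to reject the null hypothesis: at alpha = 0.05 there is not enough evidence to support the advertiser's claim.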
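The credit card calculation can be sketched the same way. Because the sample standard deviations are not given in the text, the example passes the standard error of 1921 directly; se_diff_means shows how that standard error would be computed from s1, n1, s2, n2 under the large-sample formula. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def se_diff_means(s1, n1, s2, n2):
    """Large-sample standard error of x̄1 - x̄2 for independent samples."""
    return sqrt(s1**2 / n1 + s2**2 / n2)

def two_sample_z(x1_bar, x2_bar, se, hypothesized_diff=0.0):
    """Standardized test statistic for the difference between two means."""
    return ((x1_bar - x2_bar) - hypothesized_diff) / se

z = two_sample_z(60_900, 64_300, se=1921)
crit = NormalDist().inv_cdf(1 - 0.05 / 2)   # about 1.96 for a two-tail test
print(round(z, 2), abs(z) > crit)  # → -1.77 False
```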
If the samples are not both large, but both populations are normally distributed and both population standard deviations are known, then you may use a z test.
However, if the samples are not both large, both populations are normally distributed, and the population standard deviations are unknown, then you may use one of two t tests, depending on whether the population variances are equal or not. In either case the standardized test statistic has the form t = ((x̄1 - x̄2) - (μ1 - μ2))/(standard error), where the standard error and the degrees of freedom depend on which case applies.
There is a test for equal variances, but it will not be covered in this course. You will be provided with information about the relationship between the variances of the two populations.
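For the small-sample case, the two standard errors are commonly computed as in the sketch below. It rests on stated assumptions: the null hypothesis is μ1 = μ2, the pooled version assumes equal population variances, and the unpooled version uses the conservative degrees of freedom min(n1, n2) - 1, a convention in some introductory texts (Welch's approximation is a common alternative). The function names are hypothetical.

```python
from math import sqrt

def t_equal_variances(x1_bar, s1, n1, x2_bar, s2, n2):
    """Pooled two-sample t statistic (assumes equal population variances)."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    se = sqrt(pooled_var) * sqrt(1 / n1 + 1 / n2)
    return (x1_bar - x2_bar) / se, n1 + n2 - 2   # (t, degrees of freedom)

def t_unequal_variances(x1_bar, s1, n1, x2_bar, s2, n2):
    """Unpooled two-sample t statistic (does not assume equal variances).

    Degrees of freedom use the conservative choice min(n1, n2) - 1.
    """
    se = sqrt(s1**2 / n1 + s2**2 / n2)
    return (x1_bar - x2_bar) / se, min(n1, n2) - 1
```

When n1 = n2 and s1 = s2 the two statistics coincide and only the degrees of freedom differ, which is one way to see that the choice matters most when the sample sizes or spreads are unequal.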