Confidence Intervals

This section is concerned with estimating a population mean. We could estimate a population mean by drawing a random sample from the population and using the mean of the sample as our estimate of the population mean. We don't expect the sample mean to be exactly equal to the population mean, because the sample uses only a small fraction of the information contained in the population. This method of estimating a population mean therefore leaves us with a great deal of uncertainty, because we do not know how close our sample mean is to the population mean. A better idea is to construct an interval, centered on the sample mean, that we are fairly confident contains the population mean. Suppose our sample mean was 48. A significant amount of uncertainty could be removed if we could say that we are 95% confident that the population mean lies in the interval 48 plus or minus 3 units, that is, in the interval [45, 51]. An interval such as this is called a 95% confidence interval (CI) estimate of the population mean. In practice, statisticians usually construct 90%, 95%, or 99% confidence intervals for estimating a population mean.

Constructing a 95% Confidence Interval For a Population Mean

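The central limit theorem tells us that for large samples the distribution of sample means is approximately normal. The empirical rule for normal curves tells us that approximately 95% of all sample means lie within 2 standard deviations of the grand mean, which is the same thing as the population mean (the more precise multiplier is 1.96). The standard deviation that applies here is the standard deviation of the sample means, sigma/sqrt(n), where sigma is the population standard deviation and n is the sample size. Thus 95% of all sample means will lie in the interval:

                                 mu - 1.96 * sigma/sqrt(n)  <  sample mean  <  mu + 1.96 * sigma/sqrt(n)

where mu is the population mean. By applying an algebraic transformation, we can write the inequality as:

                                 sample mean - 1.96 * sigma/sqrt(n)  <  mu  <  sample mean + 1.96 * sigma/sqrt(n)

Don't worry if you cannot follow this.

The quantity that is added to and subtracted from the sample mean is called the margin of error, E.
                                 E = 1.96 * sigma/sqrt(n)

The 95% confidence interval for a population mean has upper and lower bounds given by:

                                 Upper bound = sample mean + E = sample mean + 1.96 * sigma/sqrt(n)
                                 Lower bound = sample mean - E = sample mean - 1.96 * sigma/sqrt(n)
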
For a 90% confidence interval estimation of the population mean, replace 1.96 with 1.645. For a 99% confidence interval use 2.575.
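
The multipliers quoted above can be recovered from the standard normal curve. The following is a minimal sketch using Python's standard library; the function name z_critical is a hypothetical helper introduced only for illustration.

```python
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Return the two-sided critical value z for a given confidence level.

    For a 95% interval, 2.5% of the area is left in each tail, so we look up
    the point that has 97.5% of the standard normal curve's area to its left.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    tail_area = (1.0 - confidence) / 2.0
    return NormalDist().inv_cdf(1.0 - tail_area)


if __name__ == "__main__":
    for level in (0.90, 0.95, 0.99):
        print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```

Computed this way, the 99% multiplier rounds to 2.576 at three decimals; 2.575 is the figure commonly read from printed z tables, and the difference is negligible in practice.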

Often we do not know the population standard deviation (sigma). We can substitute the sample standard deviation (s) for the population standard deviation, provided we select a large sample (30 or greater).

Below are the steps required to make a 95% confidence interval estimation of a population mean.

Constructing a Large Sample 95% Confidence Interval for a Population Mean

The computation below uses a sample mean of 870, a standard deviation of 12, and a sample size of 100.

Upper bound of the confidence interval:
870 + (1.96)(12)/sqrt(100)
= 870 + (1.96)(12)/10
= 870 + 23.52/10
= 870 + 2.352
= 872.352, or about 872.4
Lower bound of the confidence interval:
870 - (1.96)(12)/sqrt(100)
= 870 - (1.96)(12)/10
= 870 - 23.52/10
= 870 - 2.352
= 867.648, or about 867.6
Thus, we are 95% confident that the mean combined SAT score for Bainbridge College students is in the interval [867.6, 872.4].
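
The same arithmetic can be checked with a few lines of Python. This is only a sketch: the function name large_sample_ci and its parameter names are hypothetical, and the inputs are the values used in the computation above (mean 870, standard deviation 12, sample size 100).

```python
import math


def large_sample_ci(sample_mean, std_dev, n, z=1.96):
    """Return (lower, upper) bounds of a large-sample confidence interval.

    z defaults to 1.96, the multiplier for a 95% interval.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    margin_of_error = z * std_dev / math.sqrt(n)
    return sample_mean - margin_of_error, sample_mean + margin_of_error


lower, upper = large_sample_ci(sample_mean=870, std_dev=12, n=100)
print(f"margin of error: {(upper - lower) / 2:.3f}")
print(f"95% CI: [{lower:.3f}, {upper:.3f}]")
```

Passing z=1.645 or z=2.575 instead gives the 90% or 99% interval from the same sample.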

What do we mean when we say we are 95% confident that the population mean lies in a certain interval? We cannot, of course, be certain that the resulting interval contains the population mean. Our confidence comes from the fact that if we repeated this experiment a very large number of times, about 95% of the resulting 95% confidence intervals would contain the population mean and about 5% would not. We don't know whether the interval we computed is in the 95% group that contains the mean or in the 5% group that does not. It is incorrect to say that there is a 95% probability that the population mean falls in our interval, because the population mean is a fixed number, not a random quantity. What varies from sample to sample is the interval, so the correct interpretation is: "there is a 95% probability that an interval constructed this way contains the population mean."
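
The repeated-experiment idea can be illustrated with a small simulation. The sketch below assumes a hypothetical normal population with mean 870 and standard deviation 12 (values chosen only for illustration), draws many samples of size 100, builds a 95% interval from each using the sample standard deviation, and counts how often the interval contains the true mean. The function name coverage_rate is an assumption made for this example.

```python
import math
import random
import statistics


def coverage_rate(true_mean, true_sd, n, trials, z=1.96, seed=0):
    """Estimate how often a large-sample 95% CI contains the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        s = statistics.stdev(sample)  # sample standard deviation
        margin = z * s / math.sqrt(n)
        if x_bar - margin <= true_mean <= x_bar + margin:
            hits += 1
    return hits / trials


rate = coverage_rate(true_mean=870, true_sd=12, n=100, trials=2000)
print(f"fraction of intervals containing the true mean: {rate:.3f}")
```

The printed fraction should land close to 0.95, though not exactly on it, since it is itself an estimate from a finite number of trials.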

Constructing a Small Sample 95% Confidence Interval for a Population Mean

In the previous section we were able to compute confidence intervals even when the population standard deviation was unknown by selecting a large sample and substituting the sample standard deviation for the population standard deviation. In most real-life situations the population standard deviation is unknown. In addition, circumstances often prevent us from selecting a large sample. For example, a doctor may have only 6 patients available for a study, or there may be only 10 skeletal remains of some extinct animal available for study. If we must select a small sample and have no knowledge of the population standard deviation, then we must use a distribution called "Student's t-distribution", or simply a "t-distribution", provided the population is essentially normal. Student's t-distribution was developed by William S. Gosset (1876-1937), who published it in 1908. Gosset worked for the Guinness Brewing Company in Dublin, Ireland. The brewing company frowned upon employees publishing their research, and so Gosset published under the pen name "Student".
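
As a rough preview of the small-sample case, the sketch below builds a 95% interval for a sample of 6 using the usual t-based form, sample mean plus or minus t * s/sqrt(n), with n - 1 degrees of freedom. The measurements are made-up values for illustration only, and the critical value 2.571 is taken from a standard t table for 5 degrees of freedom rather than computed.

```python
import math
import statistics

# Hypothetical measurements for 6 patients (illustrative values only).
measurements = [98.2, 99.1, 97.8, 98.6, 98.9, 98.4]

n = len(measurements)
x_bar = statistics.fmean(measurements)
s = statistics.stdev(measurements)  # sample standard deviation

# From a standard t table: 95% confidence, n - 1 = 5 degrees of freedom.
T_CRITICAL_95_DF5 = 2.571

margin = T_CRITICAL_95_DF5 * s / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin
print(f"sample mean = {x_bar:.2f}, s = {s:.3f}")
print(f"95% CI (t-based): [{lower:.2f}, {upper:.2f}]")
```

Because 2.571 is larger than 1.96, the t-based interval is wider than the large-sample formula would give for the same data, which reflects the extra uncertainty of estimating the standard deviation from only 6 observations.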

Characteristics of the t-Distribution