The Central Limit Theorem

Prior to discussing the central limit theorem, we must understand the concept of sampling distributions and in particular the sampling distribution of sample means.

Suppose we have a population consisting of the numbers {1,2,3,4,5}, and we randomly select two numbers from the population and calculate their mean. For example, we might select the numbers 1 and 5, whose mean is 3. Suppose we repeated this experiment (sampling with replacement) many, many times. We would then have a large collection of sample means (millions of them), and we could construct a frequency distribution of these sample means. The resulting distribution is called the sampling distribution of sample means. Having this distribution, we could proceed to calculate the mean of all the sample means (the grand mean) and their standard deviation (called the standard error).
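Because this population is so small, we do not need millions of repetitions to see the sampling distribution: we can list every possible ordered sample of size two. The sketch below is only an illustration (the variable names are our own, not part of the original lesson). It enumerates all 25 equally likely samples, tabulates the frequency distribution of their means, and computes the grand mean and the standard error.

```python
from collections import Counter
from itertools import product
from statistics import mean, pstdev

population = [1, 2, 3, 4, 5]
sample_size = 2

# Every ordered sample of size 2 drawn with replacement: 5 * 5 = 25 samples,
# each equally likely under random selection.
sample_means = [sum(sample) / sample_size
                for sample in product(population, repeat=sample_size)]

# Frequency distribution of the sample means.
frequency = Counter(sample_means)
for value in sorted(frequency):
    print(f"mean {value:.1f}: {frequency[value]} sample(s)")

# Grand mean and standard error of the sampling distribution.
grand_mean = mean(sample_means)
standard_error = pstdev(sample_means)

print("grand mean:", grand_mean)
print("standard error:", standard_error)
print("population mean:", mean(population))
print("population std dev:", pstdev(population))
```

The frequencies rise to a peak at the middle value and fall off symmetrically, even though each number in the population itself is equally likely.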

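There are two important properties of the distribution of sample means:

1. The mean of the sample means (the grand mean) equals the population mean.
2. The standard deviation of the sample means (the standard error) equals the population standard deviation divided by the square root of the sample size.

In mathematical terms:

μ_x̄ = μ
σ_x̄ = σ / √n

where μ and σ are the population mean and standard deviation, and n is the sample size.
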
An Example: Suppose it is known that Georgians watch an average of 5 hours of TV per day. Also suppose it is known that the standard deviation for this group is 2 hours. If a random sample of 49 Georgians is selected and the mean of the sample is calculated, we can think of this mean as being one mean among a distribution of sample means. The mean of the distribution of sample means (the grand mean) would equal 5 hours (the population mean), and the standard deviation of the distribution would be 2 (the population standard deviation) divided by the square root of 49, i.e. 2/7 ≈ 0.2857 hours. What we don't know yet is the shape of the distribution of sample means.
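The arithmetic in this example can be checked with a few lines of Python. The helper name `standard_error` is our own choice for illustration. The loop also shows, for a few other sample sizes that are not part of the original example, how the standard error shrinks as the sample grows.

```python
import math

def standard_error(population_sd, sample_size):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return population_sd / math.sqrt(sample_size)

population_mean = 5.0   # hours of TV per day
population_sd = 2.0     # hours

# The grand mean equals the population mean regardless of sample size.
grand_mean = population_mean
se_49 = standard_error(population_sd, 49)
print(f"grand mean = {grand_mean} hours, standard error (n=49) = {se_49:.4f} hours")

# Illustrative extra sample sizes (assumed, not from the example).
for n in (1, 4, 49, 100):
    print(f"n = {n:>3}: standard error = {standard_error(population_sd, n):.4f}")
```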

The central limit theorem provides a third property of the sampling distribution of sample means. This third property concerns the shape of the distribution of sample means.

The Central Limit Theorem (CLT):
As the size of the sample increases, the sampling distribution of the mean approaches a normal distribution.

If the population from which samples are drawn is normally distributed, then the sampling distribution of sample means will be normally distributed regardless of the size of the sample, and the CLT is not needed. But if the population is not normal, the CLT tells us that the sampling distribution of sample means will be approximately normal, provided the sample size is sufficiently large.
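A small simulation can make this concrete. The sketch below assumes, purely for illustration, a strongly right-skewed population (an exponential distribution with mean 1) and uses skewness as a rough measure of how far the sampling distribution is from the symmetric shape of a normal curve. The population, the sample sizes, the number of repetitions, and the seed are all our own choices.

```python
import random

def skewness(values):
    """Population skewness: the average cubed z-score (0 for a symmetric shape)."""
    n = len(values)
    m = sum(values) / n
    sd = (sum((v - m) ** 2 for v in values) / n) ** 0.5
    return sum(((v - m) / sd) ** 3 for v in values) / n

def simulate_sample_means(sample_size, repetitions, rng):
    """Draw many samples from an exponential population; return their means."""
    return [sum(rng.expovariate(1.0) for _ in range(sample_size)) / sample_size
            for _ in range(repetitions)]

rng = random.Random(0)  # fixed seed so the run is repeatable
skew_by_size = {}
for n in (2, 10, 50):
    means = simulate_sample_means(n, 5000, rng)
    skew_by_size[n] = skewness(means)
    print(f"n = {n:>2}: skewness of sample means = {skew_by_size[n]:.2f}")
```

The skewness should move toward zero as the sample size increases, which is what we expect if the distribution of sample means is approaching a normal shape.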

How large must the sample size be so that the sampling distribution of the mean becomes approximately normal? A common rule of thumb is that a sample size of 30 or more is sufficiently large.

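It is easy to become confused by the different symbols used to represent means and standard deviations. The table below should help you avoid this confusion.

                                     Mean     Standard deviation
Population                           μ        σ
Sample                               x̄        s
Sampling distribution of the mean    μ_x̄      σ_x̄ (standard error)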

Three Practical Applications of the Central Limit Theorem

 1. The weights of garbage collected in a residential section of a city are normally distributed with a mean weight of 68 pounds and a standard deviation of 9 pounds. Find the probability that the garbage of a randomly selected customer weighs between 65 pounds and 71 pounds.

Solution: From the utilities section of the web page, open the normal distribution calculator by McClelland. Enter the following information:

mean = 68, standard deviation = 9/√1 = 9 (don't forget to divide by the square root of the sample size, which here is 1), starting point = 65, end point = 71
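If the calculator is not at hand, the same area under the normal curve can be found with Python's standard library. This is only a sketch of the same calculation, using the problem's numbers with a sample size of 1. The variable names are our own.

```python
from statistics import NormalDist

# One customer is selected, so n = 1 and the standard error is 9 / sqrt(1) = 9.
weights = NormalDist(mu=68, sigma=9)

# Area under the normal curve between the starting point and the end point.
probability = weights.cdf(71) - weights.cdf(65)
print(f"P(65 < weight < 71) = {probability:.4f}")
```

The result is a probability of roughly 0.26, so about one customer in four has garbage weighing between 65 and 71 pounds.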

 

2. Adult males have waist sizes with a mean of 36 inches and a standard deviation of 2 inches. A random sample of 36 males is taken and their waist sizes are measured. What is the probability that the mean waist size of the sampled group will be between 38.5 and 40 inches?

Solution: Once again, open the McClelland normal calculator and enter the following data. We see that the probability is essentially zero.

mean = 36, std dev = 2/√36 = 2/6 = 0.3333 (don't forget to divide by the square root of the sample size), start point = 38.5, end point = 40
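To see why the probability comes out as zero, it helps to convert the two end points to z-scores, that is, to count how many standard errors each one lies above the mean. The sketch below does this with the problem's numbers. The variable names are our own.

```python
import math
from statistics import NormalDist

mu, sigma, n = 36, 2, 36
standard_error = sigma / math.sqrt(n)   # 2 / 6

# Distance of each end point from the mean, measured in standard errors.
z_low = (38.5 - mu) / standard_error
z_high = (40 - mu) / standard_error
print(f"z-scores: {z_low:.1f} to {z_high:.1f}")

# Area between the two z-scores under the standard normal curve.
standard_normal = NormalDist()
probability = standard_normal.cdf(z_high) - standard_normal.cdf(z_low)
print(f"P(38.5 < sample mean < 40) = {probability:.2e}")
```

Both end points lie more than seven standard errors above the mean, far out in the tail of the normal curve, so the area between them is zero for all practical purposes.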




3. Using the same information from problem #2, but this time with a sample size of 25, what is the probability that the mean waist size of the group will be between 38.5 and 40 inches?

Solution: We cannot use the central limit theorem, since the shape of the population distribution is unknown and the sample is small.