Often we are interested in measuring the strength and direction of a linear relationship between two variables x and y. If y tends to increase when x increases and to decrease when x decreases, we say the direction is positive. If y tends to decrease when x increases and to increase when x decreases, we say the direction is negative. We measure the strength and direction of the relationship between two variables by calculating a number between -1 and +1 called the coefficient of correlation, r (that is, -1 ≤ r ≤ 1). The correlation coefficient r measures how close to a straight line the set of points (x, y) would fall if plotted. The closer r is to zero, the less the points fall along a straight line; the closer r is to +1 or -1, the more nearly they fall along one.


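The correlation coefficient can be calculated in four steps:
1. Calculate the mean of x and the mean of y.
2. Calculate the standard deviation of x and the standard deviation of y.
3. Calculate the covariance between x and y.
4. Calculate the correlation coefficient by dividing the covariance by the product of the two standard deviations.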
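
Written out as formulas, the four steps might look as follows. This is a sketch that assumes the sample (n - 1) divisor, the same divisor the worked example uses for the covariance. The symbols \bar{x}, s_x and s_{xy} are notation introduced here for illustration and do not come from the original text.

```latex
\begin{align*}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \\
s_x &= \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad
s_y = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2} \\
s_{xy} &= \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \\
r &= \frac{s_{xy}}{s_x \, s_y}
\end{align*}
```

Because the same divisor appears in the covariance and in both standard deviations, it cancels in r, so dividing by n instead of n - 1 throughout gives the same correlation coefficient.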

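For example, take the five points (x, y) that appear in the covariance terms below: (2, 2), (3, 3), (3, 2), (3, 1), (4, 2).
1. mean of x = 3, mean of y = 2
2. std dev of x = 0.707, std dev of y = 0.707 (dividing by n - 1 = 4, as in step 3)
3. covariance = [(2-3)(2-2) + (3-3)(3-2) + (3-3)(2-2) + (3-3)(1-2) + (4-3)(2-2)] / 4 = 0
4. r = 0 / [(0.707)(0.707)] = 0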
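
The same four steps can be sketched in Python. The five points are read off the covariance terms in the worked example. The variable names are illustrative, and the n - 1 divisor is an assumption made to match the covariance step; with that divisor each standard deviation comes out at about 0.707, and the choice of divisor does not change r.

```python
import math

# Five (x, y) points taken from the covariance terms in the worked example.
points = [(2, 2), (3, 3), (3, 2), (3, 1), (4, 2)]
xs = [x for x, _ in points]
ys = [y for _, y in points]
n = len(points)

# Step 1: means of x and y.
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Step 2: sample standard deviations (assumed n - 1 divisor).
sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))

# Step 3: sample covariance, using the same n - 1 divisor.
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)

# Step 4: correlation coefficient.
r = cov_xy / (sd_x * sd_y)

print("means:", mean_x, mean_y)
print("std devs:", round(sd_x, 3), round(sd_y, 3))
print("covariance:", cov_xy)
print("r:", r)
```

Here r is zero because every point has either x equal to its mean or y equal to its mean, so each product in the covariance sum vanishes.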
The Question of Causation
A strong correlation between two variables does not necessarily mean that changes in one variable cause changes in the other variable. There is a strong positive correlation between the number of priests in Boston and the number of murders in Boston. Does this mean that priests are going around murdering people in Boston? Of course not. A more likely explanation is that both numbers grow as the city's population grows. This is sometimes called a "nonsense correlation". There is, however, a high positive correlation between smoking and cancer. Does this mean that smoking causes cancer? The government and medical science think so. What do you think?
The picture below shows in outline form how a variety of underlying links between variables can explain association.

Correlation Matrix
Often several variables are involved in a study and we would like to show all of their mutual correlations. A convenient way of summarizing a large number of correlation coefficients is to arrange the coefficients in a square array called a correlation matrix. The matrix below shows the mutual correlations of four variables, X1, X2, X3, and X4.
One can see from the table that each variable has a perfect positive correlation of 1 with itself (the diagonal running from upper left to lower right). The correlation coefficient between X1 and X3 is 0.9239, while the correlation coefficient between X4 and X3 is -0.9009.
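
A correlation matrix can be built by computing r for every pair of variables. The sketch below uses three short, made-up columns purely for illustration. They are not the data behind the matrix discussed above, and the function names `pearson_r` and `correlation_matrix` are hypothetical helpers, not part of any library.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # The 1/(n - 1) factors of covariance and variances cancel, so they are omitted.
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Square array whose (i, j) entry is r between column i and column j."""
    return [[pearson_r(a, b) for b in columns] for a in columns]

# Hypothetical data, invented for this example only.
data = {
    "X1": [1, 2, 3, 4, 5],
    "X2": [2, 1, 4, 3, 5],
    "X3": [5, 4, 3, 2, 1],
}

names = list(data)
matrix = correlation_matrix([data[name] for name in names])

# Print one row per variable, four decimal places per coefficient.
for name, row in zip(names, matrix):
    print(name, " ".join(f"{value:7.4f}" for value in row))
```

The result has 1 on the diagonal and is symmetric, since the correlation between X1 and X2 is the same as the correlation between X2 and X1.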