How to Calculate Pearson's Correlation Coefficient: A Step-by-Step Guide
- Emily Carter
- 20 Oct, 2024
Pearson’s Correlation Coefficient is a statistical measure of the strength and direction of the linear relationship between two variables. It is widely used in data analysis to assess whether a linear relationship exists between variables such as height and weight, age and income, or hours studied and exam scores.
In this guide, we’ll cover the steps to calculate Pearson’s Correlation Coefficient using a worked example.
What is Pearson’s Correlation Coefficient?
Pearson’s Correlation Coefficient (often denoted as r) ranges from -1 to 1:
- r = 1: Perfect positive correlation (as one variable increases, the other increases).
- r = -1: Perfect negative correlation (as one variable increases, the other decreases).
- r = 0: No linear relationship between the two variables (a nonlinear relationship may still exist).
The formula for Pearson’s correlation coefficient is:
\[ r = \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{[n \sum x^2 - (\sum x)^2][n \sum y^2 - (\sum y)^2]}} \]
Where:
- \(n\) = number of data points
- \(x\) and \(y\) = the two variables
- \(\sum xy\) = sum of the products of the paired scores
- \(\sum x\) and \(\sum y\) = sums of the individual scores for variables \(x\) and \(y\)
- \(\sum x^2\) and \(\sum y^2\) = sums of the squared scores for variables \(x\) and \(y\)
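The formula translates almost term for term into code. The following is a minimal sketch in Python, not a reference implementation: the function name `pearson_r` and its error handling are choices made for this illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from the raw-sums formula (pearson_r is a name chosen for this sketch)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of values")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        # One of the variables is constant, so r is undefined.
        raise ValueError("r is undefined when either variable has zero variance")
    return numerator / denominator


# Illustrative check: three points that lie exactly on y = 2x.
print(pearson_r([1, 2, 3], [2, 4, 6]))  # prints 1.0
```

The zero-denominator guard matters in practice: if every value of one variable is identical, the formula divides by zero and the coefficient has no meaning.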
Example Data
Let’s calculate Pearson’s correlation coefficient for the following example dataset:
| x | y |
|---|---|
| 1 | 2 |
| 2 | 3 |
| 3 | 5 |
| 4 | 7 |
| 5 | 11 |
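Laying the data out with one extra column per term makes the sums in the calculation easy to audit. The sketch below is one way to do that in Python; the variable names `xs`, `ys`, `rows`, and `totals` are choices made for this illustration.

```python
# The example dataset as two parallel lists (same order as the table rows).
xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 7, 11]

# One row per data point: x, y, x^2, y^2, x*y.
rows = [(x, y, x * x, y * y, x * y) for x, y in zip(xs, ys)]

print(f"{'x':>4}{'y':>4}{'x^2':>5}{'y^2':>5}{'xy':>5}")
for row in rows:
    print("{:>4}{:>4}{:>5}{:>5}{:>5}".format(*row))

# Column totals, in the same order as the columns above.
totals = [sum(column) for column in zip(*rows)]
print("totals:", totals)
```

Each entry in `totals` corresponds to one of the five sums the formula needs, so a hand calculation can be compared against it column by column.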
Steps to Calculate Pearson’s Correlation Coefficient
1. Calculate the Sums
Start by calculating the sums of \(x\), \(y\), \(x^2\), \(y^2\), and the product \(xy\).
\[ \sum x = 1 + 2 + 3 + 4 + 5 = 15 \]
\[ \sum y = 2 + 3 + 5 + 7 + 11 = 28 \]
\[ \sum x^2 = 1^2 + 2^2 + 3^2 + 4^2 + 5^2 = 55 \]
\[ \sum y^2 = 2^2 + 3^2 + 5^2 + 7^2 + 11^2 = 208 \]
\[ \sum xy = (1 \cdot 2) + (2 \cdot 3) + (3 \cdot 5) + (4 \cdot 7) + (5 \cdot 11) = 106 \]
2. Insert the Values into the Formula
We now have all the values needed to calculate the correlation coefficient \(r\). Let’s apply them to the formula:
\[ r = \frac{5 \cdot 106 - (15 \cdot 28)}{\sqrt{[5 \cdot 55 - 15^2][5 \cdot 208 - 28^2]}} \]
3. Simplify the Equation
Simplify the numerator:
\[ 5 \cdot 106 = 530 \]
\[ 15 \cdot 28 = 420 \]
\[ 530 - 420 = 110 \]
Now, simplify the denominator. For \(x\):
\[ 5 \cdot 55 = 275 \]
\[ 15^2 = 225 \]
\[ 275 - 225 = 50 \]
For \(y\):
\[ 5 \cdot 208 = 1040 \]
\[ 28^2 = 784 \]
\[ 1040 - 784 = 256 \]
Now, compute the square root:
\[ \sqrt{50 \cdot 256} = \sqrt{12800} \approx 113.14 \]
Finally, calculate \(r\):
\[ r = \frac{110}{113.14} \approx 0.972 \]
4. Interpret the Result
A Pearson’s correlation coefficient of \(r \approx 0.972\) indicates a strong positive correlation between \(x\) and \(y\). This means that as \(x\) increases, \(y\) tends to increase as well, but the relationship is not perfectly linear.
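A hand calculation like this one is easy to get wrong by a single product or sum, so it can be worth cross-checking the result by a different route. The sketch below uses the mean-centered form of Pearson's r, which is algebraically equivalent to the raw-sums formula but is not the form used in this guide; the names `s_xy`, `s_xx`, and `s_yy` are assumptions made for this illustration.

```python
import math

# The example dataset from this guide.
xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 7, 11]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Sums of products and squares of deviations from the means.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
s_xx = sum((x - mean_x) ** 2 for x in xs)
s_yy = sum((y - mean_y) ** 2 for y in ys)

r = s_xy / math.sqrt(s_xx * s_yy)
print(round(r, 3))
```

If the printed value disagrees with the hand calculation, the sums from step 1 are the first place to look.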
Conclusion
Pearson’s correlation coefficient is a powerful tool for measuring the strength and direction of a linear relationship between two variables. In this example, we found a positive correlation between the two variables. Understanding how to calculate and interpret this coefficient allows for deeper insights into relationships in your data.
To summarize:
- Calculate sums: \(\sum x\), \(\sum y\), \(\sum x^2\), \(\sum y^2\), and \(\sum xy\).
- Plug the values into the Pearson correlation formula.
- Simplify the expression to find \(r\).
- Interpret the result to understand the strength and direction of the relationship.