How to Calculate Pearson's Correlation Coefficient: A Step-by-Step Guide

Pearson’s Correlation Coefficient is a statistical measure used to determine the strength and direction of the linear relationship between two variables. It is widely used in data analysis to assess whether a relationship exists between variables like height and weight, age and income, or hours studied and exam scores.

In this guide, we’ll cover the steps to calculate Pearson’s Correlation Coefficient with a worked example.

What is Pearson’s Correlation Coefficient?

Pearson’s Correlation Coefficient (often denoted as r) ranges from -1 to 1:

  • r = 1: Perfect positive correlation (as one variable increases, the other increases).
  • r = -1: Perfect negative correlation (as one variable increases, the other decreases).
  • r = 0: No linear relationship between the two variables.

The formula for Pearson’s correlation coefficient is:

[ r = \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{[n \sum x^2 - (\sum x)^2][n \sum y^2 - (\sum y)^2]}} ]

Where:

  • n = number of data points
  • x and y = the paired values of the two variables
  • (\sum xy) = sum of the products of the paired values
  • (\sum x) and (\sum y) = sum of the individual scores for variables (x) and (y)
  • (\sum x^2) and (\sum y^2) = sum of the squared scores for variables (x) and (y)
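
As an illustration, the formula above can be written as a short Python function. This is a minimal sketch rather than a reference implementation: the name `pearson_r` and the decision to raise an error when either variable is constant (which makes the denominator zero and r undefined) are assumptions made for this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r computed from the raw sums used in the formula above."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        # Zero variance in x or y: the coefficient is undefined.
        raise ValueError("r is undefined when a variable is constant")
    return numerator / denominator


print(pearson_r([1, 2, 3], [3, 5, 7]))                # y = 2x + 1  -> 1.0
print(pearson_r([1, 2, 3], [7, 5, 3]))                # y = 9 - 2x  -> -1.0
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # y = x^2     -> 0.0
```

The last call is a reminder that r measures only linear association: y is completely determined by x there, yet r is 0 because the relationship is not a straight line.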

Example Data

Let’s calculate Pearson’s correlation coefficient for the following example dataset:

x    y
1    2
2    3
3    5
4    7
5    11

Steps to Calculate Pearson’s Correlation Coefficient

1. Calculate the Sums

Start by calculating the sums of (x), (y), (x^2), (y^2), and the product (xy).
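
As a cross-check on the hand arithmetic in this step, the working table can also be built in a few lines of Python. This is only a sketch; the variable names (`xs`, `ys`, `rows`, `totals`) are chosen for illustration and are not part of the original guide.

```python
# Example dataset from this guide
xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 7, 11]

# One row per data point: (x, y, x^2, y^2, xy)
rows = [(x, y, x * x, y * y, x * y) for x, y in zip(xs, ys)]

header = ("x", "y", "x^2", "y^2", "xy")
print("".join(f"{h:>6}" for h in header))
for row in rows:
    print("".join(f"{v:>6}" for v in row))

# Column totals, in order: sum x, sum y, sum x^2, sum y^2, sum xy
totals = tuple(sum(col) for col in zip(*rows))
print("".join(f"{t:>6}" for t in totals))
```

The final printed row holds the five sums that go into the formula, in the same order as the column headers.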

[ \sum x = 1 + 2 + 3 + 4 + 5 = 15 ]
[ \sum y = 2 + 3 + 5 + 7 + 11 = 28 ]
[ \sum x^2 = 1^2 + 2^2 + 3^2 + 4^2 + 5^2 = 55 ]
[ \sum y^2 = 2^2 + 3^2 + 5^2 + 7^2 + 11^2 = 208 ]
[ \sum xy = (1 \cdot 2) + (2 \cdot 3) + (3 \cdot 5) + (4 \cdot 7) + (5 \cdot 11) = 2 + 6 + 15 + 28 + 55 = 106 ]

2. Insert the Values into the Formula

With (n = 5), we now have all the values needed to calculate the correlation coefficient (r). Substituting them into the formula:

[ r = \frac{5 \cdot 106 - 15 \cdot 28}{\sqrt{[5 \cdot 55 - 15^2][5 \cdot 208 - 28^2]}} ]

3. Simplify the Equation

Simplify the numerator:

[ 5 \cdot 106 = 530 ] [ 15 \cdot 28 = 420 ] [ 530 - 420 = 110 ]

Now, simplify the denominator, starting with the (x) term:

[ 5 \cdot 55 = 275 ] [ 15^2 = 225 ] [ 275 - 225 = 50 ]

Then the (y) term:

[ 5 \cdot 208 = 1040 ] [ 28^2 = 784 ] [ 1040 - 784 = 256 ]

Now, multiply the two terms and take the square root:

[ \sqrt{50 \cdot 256} = \sqrt{12800} \approx 113.14 ]

Finally, calculate (r):

[ r = \frac{110}{113.14} \approx 0.972 ]

4. Interpret the Result

A Pearson’s correlation coefficient of (r \approx 0.972) indicates a strong positive correlation between (x) and (y). This means that as (x) increases, (y) tends to increase as well, and the points lie close to a straight line, though not exactly on one.

Conclusion

Pearson’s correlation coefficient is a powerful tool for measuring the strength and direction of a linear relationship between two variables. In this example, we found a strong positive correlation. Knowing how to calculate and interpret this coefficient helps you judge how closely two variables move together in your data.

To summarize:

  1. Calculate sums: (\sum x), (\sum y), (\sum x^2), (\sum y^2), and (\sum xy).
  2. Plug the values into the Pearson correlation formula.
  3. Simplify the expression to find (r).
  4. Interpret the result to understand the strength and direction of the relationship.
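
The four steps summarized above can be sketched end to end in Python for the example dataset. Treat this as an illustrative sketch under stated assumptions: the helper `describe` and its cutoffs (0.7 for "strong", 0.3 for "moderate") are a rough rule of thumb introduced here for the example, not a standard, and conventions vary by field.

```python
import math

xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 7, 11]

# Step 1: calculate the sums
n = len(xs)
sum_x, sum_y = sum(xs), sum(ys)
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))

# Step 2: plug the sums into the formula
numerator = n * sum_xy - sum_x * sum_y
x_term = n * sum_x2 - sum_x ** 2
y_term = n * sum_y2 - sum_y ** 2

# Step 3: simplify to get r
r = numerator / math.sqrt(x_term * y_term)


# Step 4: interpret (cutoffs are an assumed rule of thumb, not a standard)
def describe(r, strong=0.7, moderate=0.3):
    if r == 0:
        return "no linear relationship"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size >= strong:
        strength = "strong"
    elif size >= moderate:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction}"


print(f"r = {r:.3f} ({describe(r)})")
```

Keeping the numerator and the two denominator terms in separate variables mirrors the hand calculation, so each intermediate value can be compared against the worked steps.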