What is the Correlation Coefficient?

Correlation Coefficient    Correlation Type                  Example
 0                         No Correlation                    A change in one parameter will not impact the other parameter.
-1                         Perfectly Negative Correlation    An increase in one parameter will decrease the other.
+1                         Perfectly Positive Correlation    An increase in one parameter will increase the other.

The correlation coefficient measures the strength and direction of the relationship between two variables. In other words, it tells us how closely a change in one variable is accompanied by a change in the other. Its value lies between -1.0 and +1.0.
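To make the -1.0 to +1.0 range concrete, here is a minimal sketch using made-up data. The helper name `pearson_r` and the three sample series are assumptions for illustration; the function uses the mean-centered form of the Pearson coefficient.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]
rising = [2 * v for v in x]      # y increases with x
falling = [-2 * v for v in x]    # y decreases as x increases
symmetric = [4, 1, 0, 1, 4]      # (x - 3)^2: no linear trend

print(pearson_r(x, rising))      # close to +1
print(pearson_r(x, falling))     # close to -1
print(pearson_r(x, symmetric))   # close to 0
```

The third series is fully determined by x yet has a coefficient near 0, which shows that the Pearson coefficient only captures linear relationships.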

Correlation Example

  • If a student’s performance shows no relation to the type of diet the student follows, the performance and diet parameters are not correlated.
  • Stock-A and stock-B are correlated if a change in one stock’s price tends to accompany a change in the other’s.

A correlation between two variables does not establish cause and effect; it does not tell us whether a change in one variable (variable-1) is caused by a change in the other (variable-2).

Application Example

  • We can use the Pearson coefficient to measure the relation between stocks and inform investment decisions.
  • We can use it in statistical hypothesis testing to determine whether two variables are related.
  • We use the correlation coefficient in machine learning for feature selection.

Types of Correlation Coefficient

We can find the correlation coefficient between two variables using one of the following methods.

  1. Pearson Coefficient
  2. Spearman Coefficient

Pearson Correlation Coefficient

The Pearson coefficient is also known as Pearson’s r or the Pearson product-moment correlation coefficient (PPMCC). It represents the strength and direction of the linear relation between two quantitative variables (X and Y), each measured on an interval or ratio scale.

On a scatter plot, the closer the points fall to a straight line, the closer the Pearson correlation coefficient is to -1 or +1.

This image shows Pearson correlation coefficient between two variables
Pearson Coefficient    Relationship
.5 to 1                Strongly Positive
.3 to .5               Moderate Positive
0 to .3                Weak Positive
0                      No Correlation
-.3 to 0               Weak Negative
-.5 to -.3             Moderate Negative
-1 to -.5              Strongly Negative
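The interpretation table can be sketched as a small lookup function. The name `describe_strength` is hypothetical, and because the table's ranges share their endpoints, this sketch assumes that a boundary value (for example .5) belongs to the stronger category.

```python
def describe_strength(r):
    """Map a Pearson coefficient to the labels used in the table above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    if r == 0:
        return "No Correlation"
    magnitude = abs(r)
    # Assumption: shared boundaries (.3, .5) go to the stronger category.
    if magnitude >= 0.5:
        level = "Strongly"
    elif magnitude >= 0.3:
        level = "Moderate"
    else:
        level = "Weak"
    direction = "Positive" if r > 0 else "Negative"
    return f"{level} {direction}"

print(describe_strength(0.86))   # Strongly Positive
print(describe_strength(-0.4))   # Moderate Negative
print(describe_strength(0.1))    # Weak Positive
```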

Prior Conditions to calculate the Pearson Correlation Coefficient

  1. Both variables are quantitative.
  2. Both variables are approximately normally distributed.
  3. The data has no outliers.
  4. The relationship between both variables is linear.

Formula for Pearson Correlation Coefficient

This image shows Pearson Correlation coefficient formula
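
The formula image is not reproduced in the text. As a sketch consistent with the worked calculation in Step-4 of the next section, the computational form of the Pearson coefficient for n paired observations can be written as follows, where the symbols x_i and y_i (the individual observations) are notation assumed here:

```latex
r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}
         {\sqrt{\left[\,n\sum x_i^2 - \left(\sum x_i\right)^2\right]
                \left[\,n\sum y_i^2 - \left(\sum y_i\right)^2\right]}}
```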

How to calculate Pearson Correlation Coefficient

We will do hands-on calculations to understand the relationship between a person’s age and earnings, using data for people working in the IT industry.

Step-1: Get the Data

Age    Earnings (thousands of $)
23     40
37     70
37     100
31     41
24     35
38     110

Step-2: Make sure the data meets the prior conditions for the Pearson coefficient

Step-3: Calculate the inputs

         x      y      x²      y²       x*y
         23     40     529     1600     920
         37     70     1369    4900     2590
         37     100    1369    10000    3700
         31     41     961     1681     1271
         24     35     576     1225     840
         38     110    1444    12100    4180
Total    190    396    6248    31506    13501
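The column totals can be double-checked with a few lines of Python. This is only a sketch; the variable names are chosen here for illustration, and the data is the age and earnings table from Step-1.

```python
ages = [23, 37, 37, 31, 24, 38]          # x
earnings = [40, 70, 100, 41, 35, 110]    # y, in thousands of $

n = len(ages)
sum_x = sum(ages)
sum_y = sum(earnings)
sum_x2 = sum(x * x for x in ages)
sum_y2 = sum(y * y for y in earnings)
sum_xy = sum(x * y for x, y in zip(ages, earnings))

print(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy)
# 6 190 396 6248 31506 13501
```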

Step-4: Place all values in the formula and calculate the coefficient

r = (6 * 13501 - 190 * 396) / √([6 * 6248 - 190²] * [6 * 31506 - 396²])

r = (81006 - 75240) / √([37488 - 36100] * [189036 - 156816])

r = 5766 / 6687.4 = 0.86
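The same arithmetic can be reproduced in Python as a check. This sketch starts from the totals in the Step-3 table; the variable names are assumptions made for illustration.

```python
import math

# Totals from the Step-3 table.
n, sum_x, sum_y = 6, 190, 396
sum_x2, sum_y2, sum_xy = 6248, 31506, 13501

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(numerator)      # 5766
print(round(r, 2))    # 0.86
```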

Step-5: Results interpretation (reporting the Pearson correlation coefficient in APA style)

High correlation between a person’s earnings and age, r = .86

Note: In APA style, the leading “0” is omitted when reporting the Pearson coefficient, because its value cannot exceed 1.

Significance Testing for Pearson Coefficient

Further, we can use the Pearson Correlation Coefficient to test whether the relationship between two variables is significant.

The sample Pearson correlation (r) is an estimate of the population Pearson correlation, rho (ρ).

Problem Statement: We can use the sample Pearson correlation to infer whether the population Pearson correlation is significantly different from zero.

Null hypothesis: ρ = 0

Alternative hypothesis: ρ ≠ 0

Step 1: Calculate the t value

The t value is calculated as t = r / √([1 - r²] / [n - 2]):

t = 0.86 / √([1 - 0.86²] / [6 - 2]) = 0.86 / 0.255 = 3.372
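As a quick check of this step, the t value can be computed directly. The function name `t_statistic` is hypothetical; it takes the sample correlation r and the sample size n.

```python
import math

def t_statistic(r, n):
    """t value for testing whether a Pearson correlation differs from zero."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

t = t_statistic(0.86, 6)
print(round(t, 2))   # 3.37
```

The small difference from 3.372 comes from rounding the denominator to 0.255 in the hand calculation.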

Step 2: Find the critical t-value

Degrees of freedom (for Pearson correlation tests) = n – 2 = 4

Significance level (alpha) = 0.05

Critical t-value for two-tailed test = 2.78

Step 3: Compare the t value to the critical value

If the absolute value of the t-statistic > critical t-value:

The relationship is statistically significant (p < alpha). We can reject the null hypothesis.

If the absolute value of the t-statistic ≤ critical t-value:

The relationship is not statistically significant (p ≥ alpha). We can’t reject the null hypothesis.
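
The decision rule can be sketched as a small helper. The name `is_significant` is hypothetical, and the critical value is passed in rather than computed, since it normally comes from a t-table. Because the test is two-tailed, the sketch compares the absolute value of t.

```python
def is_significant(t_value, critical_t):
    """Two-tailed decision: reject the null hypothesis when |t| exceeds the critical value."""
    return abs(t_value) > critical_t

# Values from this article: t = 3.372, critical t = 2.78 (df = 4, alpha = 0.05).
print(is_significant(3.372, 2.78))    # True
print(is_significant(-3.372, 2.78))   # True
print(is_significant(1.5, 2.78))      # False
```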

Since 3.372 (t-value) > 2.78 (critical t-value), we can reject the null hypothesis that the population Pearson correlation coefficient (ρ) is 0. In other words, age and earnings have a statistically significant relationship (p < 0.05).

There is a high correlation between a person’s earnings and age, r = .86, and the relationship between the two variables is also statistically significant.

Spearman Correlation Coefficient

The Spearman correlation coefficient is a nonparametric measure of correlation. We represent it by ρ (rho). It measures the strength and direction of the relation between two ranked variables.

Spearman correlation measures the strength of the monotonic relation between paired data. Its value lies between -1 and +1.

Prior Conditions for Spearman Correlation Coefficient

  • Both variables are at least ordinal.
  • The data does not need to be normally distributed.
  • The relationship between the variables is monotonic; it does not have to be linear.
  • Data may have outliers.

What is a monotonic relationship?

A monotonic function either never decreases or never increases as the independent variable increases. In other words, when the x variable increases, the y variable either consistently increases or consistently decreases, though not necessarily at a constant rate.
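
A monotonic relationship can be checked on sample data with a short sketch. The function name `is_monotonic` is hypothetical, and the sketch assumes the x values are distinct.

```python
def is_monotonic(xs, ys):
    """True if y never decreases, or never increases, as x increases.

    Assumes the x values are distinct.
    """
    # Order the y values by increasing x.
    y_sorted = [y for _, y in sorted(zip(xs, ys))]
    steps = list(zip(y_sorted, y_sorted[1:]))
    non_decreasing = all(a <= b for a, b in steps)
    non_increasing = all(a >= b for a, b in steps)
    return non_decreasing or non_increasing

print(is_monotonic([1, 2, 3, 4], [1, 4, 9, 16]))          # True: non-linear but always rising
print(is_monotonic([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))   # False: falls, then rises
```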

Spearman Correlation Coefficient Calculation Formula
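
The formula itself is not shown in the text at this point. Consistent with the calculation in Step-3 of the hands-on section, and assuming there are no tied ranks, it can be written as follows, where d_i is the difference between the two ranks of observation i and n is the number of pairs (notation assumed here):

```latex
\rho = 1 - \frac{6\sum d_i^2}{n\left(n^2 - 1\right)}
```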

Hands-on Calculations for Spearman Correlation coefficient

We will do Spearman Correlation hands-on calculations to understand the correlation between students’ math and science marks.

Step-1: Get the Data

Student    Math Rank    Science Rank
1          5            4
2          10           9
3          2            1
4          9            10
5          1            2
6          7            5
7          3            6
8          8            7
9          4            3
10         6            8

Step-2: Calculate d and d², where d is the math rank minus the science rank

Student    Math Rank    Science Rank    d     d²
1          5            4               1     1
2          10           9               1     1
3          2            1               1     1
4          9            10              -1    1
5          1            2               -1    1
6          7            5               2     4
7          3            6               -3    9
8          8            7               1     1
9          4            3               1     1
10         6            8               -2    4
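
The rank differences can be reproduced with a short sketch. The variable names are chosen for illustration, and d is computed here as math rank minus science rank, so some differences are negative; only d² enters the Spearman formula.

```python
math_rank = [5, 10, 2, 9, 1, 7, 3, 8, 4, 6]
science_rank = [4, 9, 1, 10, 2, 5, 6, 7, 3, 8]

d = [m - s for m, s in zip(math_rank, science_rank)]
d2 = [v * v for v in d]

print(d)         # [1, 1, 1, -1, -1, 2, -3, 1, 1, -2]
print(d2)        # [1, 1, 1, 1, 1, 4, 9, 1, 1, 4]
print(sum(d2))   # 24
```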

Step-3: Calculate the Spearman's rank correlation coefficient

Spearman Correlation Coefficient = 1 - [(6 * 24) / (10 * (10² - 1))]
= 1 - [144 / 990]
= 1 - 0.1455 = 0.8545
Spearman's Rank Correlation is 0.8545.
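
The calculation above can be sketched in Python. The function name `spearman_rho` is hypothetical, and the sketch assumes the inputs are already ranks with no ties, as in this example.

```python
def spearman_rho(rank_x, rank_y):
    """Spearman rank correlation from two lists of untied ranks."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

math_rank = [5, 10, 2, 9, 1, 7, 3, 8, 4, 6]
science_rank = [4, 9, 1, 10, 2, 5, 6, 7, 3, 8]

rho = spearman_rho(math_rank, science_rank)
print(round(rho, 4))   # 0.8545
```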

Step-4: Results Interpretation

A high correlation (ρ = .8545) is observed between students’ math and science ranks.

Note: The leading “0” is omitted when reporting the Spearman coefficient, as with the Pearson coefficient.
