Independent Two Sample t-test for Hypothesis Testing

The independent two sample t-test is a type of t-test used in machine learning for hypothesis testing. This article covers the independent two sample t-test with examples and shows how to implement it in Python.

What is independent two sample t-test?

An independent two-sample t-test is a statistical test used to determine whether there is a statistically significant difference between the means of two unrelated groups.

[Figure: Independent two sample t-test representation]

We use a t-test for independent samples when we need to make an inference about two populations based on two independent samples. The test compares the means of the two groups. The groups are considered significantly different when the difference between their means is large relative to the variability within the samples.

Application Examples of unpaired t-test

Application Example in Sociology

We can use an independent two-sample t-test to determine if the mean weight of two groups (male and female) is statistically different. We can make decisions based on sample data using a t-test. There is no need to collect complete population data.

Application Example in Manufacturing

As a plastic part manufacturer, you can use a t-test to determine if the weight of products manufactured with two different machine settings is different. We can weigh samples from each machine to calculate the t-value.

How does the independent two sample t-test work?

The independent two sample t-test compares the difference between the means of two samples to the standard error of that difference. The standard error indicates how much the difference between the sample means can vary from sample to sample through chance alone.

It is more likely the difference in the mean of two samples is by chance if:

  • The difference in the means of the two groups is small.
  • The standard error of the mean is high.

It is less likely the difference in the mean of two samples is due to chance if:

  • The difference in the means of the two groups is high.
  • The standard error of the mean is small.
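
The relationship described in the two lists above can be sketched in a few lines of Python. This is only an illustration: the helper name `t_ratio` and all of the numbers are hypothetical and are not taken from the worked example later in the article. It shows that the same difference in means gives a large t-value when the standard error is small and a small t-value when the standard error is large.

```python
import math


def t_ratio(mean_diff, pooled_sd, n1, n2):
    """Return the difference in means divided by its standard error."""
    standard_error = pooled_sd * math.sqrt(1 / n1 + 1 / n2)
    return mean_diff / standard_error


# Same difference in means, different amounts of spread (hypothetical numbers)
low_noise = t_ratio(mean_diff=0.5, pooled_sd=0.5, n1=20, n2=20)
high_noise = t_ratio(mean_diff=0.5, pooled_sd=2.0, n1=20, n2=20)

print(f"t with small standard error: {low_noise:.2f}")
print(f"t with large standard error: {high_noise:.2f}")
```

The larger the t-value in absolute terms, the less plausible it is that the observed difference arose by chance alone.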

Prior Requirements and Assumptions for Unpaired T-test

Make sure the data meets the following prior requirements before running an unpaired t-test.

  1. Samples from two groups are independent (Data values in one sample must not influence the values in another sample). For example, Males in Group 1 and Females in Group 2 are two independent groups.
  2. Data in both groups is normally distributed. We can check this using the Shapiro-Wilk Test.
  3. The variance within each group is similar. We can do Levene’s Test for Equality of variance.

Independent two sample t-test Formula for Equal Variance
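
Written in LaTeX notation, the standard pooled (equal-variance) form of the test statistic is given below. The symbols are assumptions chosen to match the worked calculation in Step 5: $\bar{x}_i$, $s_i$ and $n_i$ are the sample mean, sample standard deviation and sample size of group $i$, and $s_p$ is the pooled standard deviation (written Sp in Step 5).

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}},
\qquad
s_p = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}},
\qquad
\mathrm{df} = n_1 + n_2 - 2
```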

Independent t-test Formula for Unequal Variance
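
When the variances cannot be treated as equal, the usual choice is Welch's form of the test, shown below in LaTeX notation with the same symbols ($\bar{x}_i$, $s_i$, $n_i$ for the mean, standard deviation and size of group $i$). The degrees of freedom are approximated with the Welch-Satterthwaite expression and are generally not a whole number.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}},
\qquad
\mathrm{df} \approx
\frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}
     {\frac{\left(s_1^2 / n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2 / n_2\right)^2}{n_2 - 1}}
```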

Steps to calculate the t-value for Independent sample t-test

We will walk through the steps to calculate the t-value for an independent two sample t-test by comparing two plastic part manufacturing processes. We will test statistically whether the weight of parts manufactured on machine-1 differs from the weight of parts manufactured on machine-2.

Step-1: Get the Data

Plastic part Weight in Grams from Machine-1

1 15
2 15.01
3 15.1
4 14.9
5 14.95
6 15
7 15.15
8 15
9 15.01
10 14.8
11 14.9
12 14.95
13 15.12
14 15
15 14.95

Plastic part Weight in Grams from Machine-2

1 15.02
2 15.01
3 15.1
4 15.05
5 15.1
6 15
7 15.15
8 15
9 15.01
10 15.1
11 15.15
12 15
13 15.15
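
The summary statistics needed in the later steps (sample size, mean, variance and standard deviation) can be computed directly from these readings. The sketch below uses only Python's standard `statistics` module; the variable names `machine_1` and `machine_2` are illustrative.

```python
import statistics

# Part weights in grams, copied from the two tables above
machine_1 = [15, 15.01, 15.1, 14.9, 14.95, 15, 15.15, 15,
             15.01, 14.8, 14.9, 14.95, 15.12, 15, 14.95]
machine_2 = [15.02, 15.01, 15.1, 15.05, 15.1, 15, 15.15,
             15, 15.01, 15.1, 15.15, 15, 15.15]

for name, weights in [("Machine-1", machine_1), ("Machine-2", machine_2)]:
    n = len(weights)
    mean = statistics.mean(weights)
    variance = statistics.variance(weights)  # sample variance (divides by n - 1)
    std_dev = statistics.stdev(weights)      # sample standard deviation
    print(f"{name}: n={n}, mean={mean:.4f}, variance={variance:.5f}, sd={std_dev:.4f}")
```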

Step 2: Ensure Data is meeting the Prior Conditions
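
One possible way to check the normality and equal-variance requirements is with SciPy's `shapiro` and `levene` functions, the two tests named in the requirements list. This is a minimal sketch, not a definitive procedure: the 0.05 threshold for the checks and the variable names are assumptions, and the results should be read together with plots of the data, especially for small samples.

```python
from scipy import stats

# Part weights in grams from Step-1
machine_1 = [15, 15.01, 15.1, 14.9, 14.95, 15, 15.15, 15,
             15.01, 14.8, 14.9, 14.95, 15.12, 15, 14.95]
machine_2 = [15.02, 15.01, 15.1, 15.05, 15.1, 15, 15.15,
             15, 15.01, 15.1, 15.15, 15, 15.15]

check_alpha = 0.05  # assumed threshold for the assumption checks

# Shapiro-Wilk: the null hypothesis is that the sample comes from a normal distribution
for name, weights in [("Machine-1", machine_1), ("Machine-2", machine_2)]:
    w_stat, w_p = stats.shapiro(weights)
    verdict = "no evidence against normality" if w_p > check_alpha else "normality is doubtful"
    print(f"Shapiro-Wilk {name}: W={w_stat:.3f}, p={w_p:.3f} ({verdict})")

# Levene: the null hypothesis is that both groups have equal variance
lev_stat, lev_p = stats.levene(machine_1, machine_2)
verdict = "no evidence against equal variances" if lev_p > check_alpha else "variances may differ"
print(f"Levene: statistic={lev_stat:.3f}, p={lev_p:.3f} ({verdict})")
```

A p-value above the threshold does not prove that an assumption holds; it only means the test found no evidence against it.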

Step 3: Write down Null and Alternate Hypothesis

Null Hypothesis

There is no difference in part weight when parts are manufactured from different machines.

Alternate Hypothesis

Part weight is different when manufactured from two different machines.

Step 4: Finalize the Significance Value

After discussion with the internal team, you finalize the significance level (alpha) as 5% or 0.05.

Step 5: Calculate the t-value for unpaired t-test

Group     | Sample Size | Sample Mean | Variance | Standard Deviation
Machine-1 | 15          | 14.989      | 0.00806  | 0.0897
Machine-2 | 13          | 15.0646     | 0.00386  | 0.06213
For this example, we assume that both groups have equal variance, so the equal-variance (pooled) formula applies.

Difference in the means of the two groups = 14.989 - 15.0646 = -0.0756 (absolute difference = 0.0756)
Degrees of freedom = 15 + 13 - 2 = 26
Sp = √ [ { (15-1)*0.0897² + (13-1)*0.06213² } / (15+13-2) ]
Sp = √ [ { 14*0.008046 + 12*0.003860 } / 26 ] = √ (0.006114) = 0.0782

Standard error of the difference in means = Sp * √ (1/n1 + 1/n2)
= 0.0782 * √ (0.0667 + 0.0769) = 0.02963

t-value (test statistic) = -0.0756 / 0.02963 = -2.551, so |t| = 2.551
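
The same calculation can be sketched in Python from the raw readings in Step-1. This is a minimal sketch using only the standard library, with illustrative variable names. Because it keeps full precision, its result can differ somewhat from a hand calculation that rounds intermediate values.

```python
import math
import statistics

# Part weights in grams from Step-1
machine_1 = [15, 15.01, 15.1, 14.9, 14.95, 15, 15.15, 15,
             15.01, 14.8, 14.9, 14.95, 15.12, 15, 14.95]
machine_2 = [15.02, 15.01, 15.1, 15.05, 15.1, 15, 15.15,
             15, 15.01, 15.1, 15.15, 15, 15.15]

n1, n2 = len(machine_1), len(machine_2)
mean_diff = statistics.mean(machine_1) - statistics.mean(machine_2)

# Pooled standard deviation Sp (assumes equal variance in both groups)
pooled_variance = (
    (n1 - 1) * statistics.variance(machine_1)
    + (n2 - 1) * statistics.variance(machine_2)
) / (n1 + n2 - 2)
pooled_sd = math.sqrt(pooled_variance)

# Standard error of the difference in means
standard_error = pooled_sd * math.sqrt(1 / n1 + 1 / n2)

t_value = mean_diff / standard_error
print(f"Sp = {pooled_sd:.4f}, SE = {standard_error:.5f}, t = {t_value:.3f}")
```

The sign of the t-value only reflects which group is subtracted from which; a two-tailed test uses its absolute value.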

Step 6: Find the Critical t-value from t-table

You can refer to the t-table to get the critical t-value for DOF = 26, significance level = 0.05, and a two-tailed test.

Critical t-value = 2.056

Step-7: Results Evaluation

We can reject the null hypothesis because the absolute value of the calculated t-score (test statistic) is greater than the critical t-value. In other words, the sample data provide statistically significant evidence, at the 5% significance level, that the weight of parts manufactured on the two machines is not equal.

Python code to implement Independent Two-sample t-test

We will implement the two-sample independent t-test on sample data using the Python SciPy Library.

# Sample weight data in grams
# Note: this list holds 16 readings for machine-1, one more than the 15-row table in Step-1,
# so the results below differ slightly from the manual calculation.
sample_weight_mac1 = [15, 15.01, 15.1, 14.9, 14.95, 15, 15.15, 15.01, 15, 14.8, 14.9, 14.95, 15.12, 14.95, 15, 14.95]

sample_weight_mac2 = [15.02, 15.01, 15.1, 15.05, 15.1, 15, 15.15, 15, 15.01, 15.1, 15.15, 15, 15.15]

Define Null and Alternate Hypothesis

Null Hypothesis: There is no difference in part weight when parts are manufactured from different machines.

Alternative Hypothesis: Part weight is different when manufactured from two different machines.

# Significance level (alpha)
alpha = 0.05

# Degrees of freedom: n1 + n2 - 2
dof = len(sample_weight_mac1) + len(sample_weight_mac2) - 2

Python code to calculate t-statistic for two sample independent t-test

import scipy.stats as stats

# Perform the independent t-test (ttest_ind assumes equal variances by default)
t_statistic, p_value = stats.ttest_ind(sample_weight_mac1, sample_weight_mac2)

# Print the results
print(f'T-statistic: {t_statistic}')
print(f'P-value: {p_value}')

Output:
T-statistic: -2.698970434744361
P-value: 0.011850160446312454

Calculate the critical t-value

# Critical t-value for a two-tailed test; the same value can be read from a t-table
critical_t_value = stats.t.ppf(1 - alpha / 2, df=dof)
print("Critical T-Value:", round(critical_t_value, 4))

Output:
Critical T-Value: 2.0518

Results Interpretation

if abs(t_statistic) > critical_t_value:
    print("Results: We can Reject the null hypothesis.")
else:
    print("Results: Fail to reject the null hypothesis.")

Output:
Results: We can Reject the null hypothesis.
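
For a two-tailed test, comparing the p-value with alpha leads to the same decision as comparing the absolute t-statistic with the critical t-value. The sketch below repeats the test on the same two lists used in the code above and applies the p-value rule; the variable name `decision` is illustrative.

```python
import scipy.stats as stats

# Same sample data as in the code above
sample_weight_mac1 = [15, 15.01, 15.1, 14.9, 14.95, 15, 15.15, 15.01,
                      15, 14.8, 14.9, 14.95, 15.12, 14.95, 15, 14.95]
sample_weight_mac2 = [15.02, 15.01, 15.1, 15.05, 15.1, 15, 15.15,
                      15, 15.01, 15.1, 15.15, 15, 15.15]
alpha = 0.05

t_statistic, p_value = stats.ttest_ind(sample_weight_mac1, sample_weight_mac2)

# Reject the null hypothesis when the p-value is below the significance level
if p_value < alpha:
    decision = "Reject the null hypothesis."
else:
    decision = "Fail to reject the null hypothesis."

print(f"p-value = {p_value:.4f}, alpha = {alpha}: {decision}")
```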

FAQ on Independent t-test

Independent t-test and A/B testing are related concepts, but they are not exactly the same thing.

We can use the independent t-test under the following conditions:

  1. Need to compare two groups.
  2. Sample data from two groups are independent. 
  3. Data in both groups is normally distributed.
  4. The variance within each group is similar.

We can use the independent t-test to compare two groups. To compare more than two groups, we can use the ANOVA test.

If the two groups have unequal variances, you can still run an independent t-test, but you need to use the formula for the two sample t-test with unequal variance.
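
In SciPy, the unequal-variance version (commonly called Welch's t-test) is available through the `equal_var=False` argument of `ttest_ind`. A minimal sketch on the same sample lists used in the Python section of this article:

```python
import scipy.stats as stats

# Same sample data as in the Python section above
sample_weight_mac1 = [15, 15.01, 15.1, 14.9, 14.95, 15, 15.15, 15.01,
                      15, 14.8, 14.9, 14.95, 15.12, 14.95, 15, 14.95]
sample_weight_mac2 = [15.02, 15.01, 15.1, 15.05, 15.1, 15, 15.15,
                      15, 15.01, 15.1, 15.15, 15, 15.15]

# equal_var=False switches ttest_ind from the pooled test to Welch's t-test
welch_t, welch_p = stats.ttest_ind(sample_weight_mac1, sample_weight_mac2, equal_var=False)

print(f"Welch t-statistic: {welch_t:.3f}")
print(f"Welch p-value: {welch_p:.4f}")
```

Welch's test does not pool the two variances, so its t-statistic and degrees of freedom generally differ from the equal-variance result.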
