The independent two-sample t-test is a type of t-test used in machine learning for hypothesis testing. This article covers the independent two-sample t-test with examples and shows how to implement it in Python.
What is an independent two-sample t-test?
An independent two-sample t-test is a statistical test used to determine whether the means of two unrelated groups differ.
We use the t-test for independent samples when we need to draw an inference about two populations based on two independent samples. The test compares the means of the two groups. The two groups are considered different if the difference in their means is large relative to the variability within the groups.
Application Examples of unpaired t-test
Application Example in Sociology
We can use an independent two-sample t-test to determine if the mean weight of two groups (male and female) is statistically different. We can make decisions based on sample data using a t-test. There is no need to collect complete population data.
Application Example in Manufacturing
As a plastic part manufacturer, you can use a t-test to determine if the weight of parts manufactured with two different machine settings is different. We can weigh samples from each machine to calculate the t-value.
How does the independent two-sample t-test work?
The independent two-sample t-test compares the difference in the means of two samples to the standard error of that difference. The standard error indicates how much a sample mean can deviate from the population mean.
It is more likely that the difference in the means of two samples is due to chance if:
- The difference in the means of the two groups is small.
- The standard error of the mean is high.
It is less likely the difference in the mean of two samples is due to chance if:
- The difference in the means of the two groups is high.
- The standard error of the mean is small.
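As a rough illustration of the points above, the test statistic can be thought of as the difference in means divided by the standard error. The sketch below uses made-up numbers and a hypothetical helper `t_ratio` (neither comes from the article's data) to show how the same difference gives a larger or smaller ratio depending on the standard error.

```python
def t_ratio(mean_difference, standard_error):
    """Return the difference in means divided by its standard error."""
    return mean_difference / standard_error


# Hypothetical values, chosen only for illustration
mean_difference = 0.5

# Same difference in means, evaluated against a small and a large standard error
ratio_small_se = t_ratio(mean_difference, standard_error=0.25)
ratio_large_se = t_ratio(mean_difference, standard_error=2.0)

print("Ratio with small standard error:", ratio_small_se)
print("Ratio with large standard error:", ratio_large_se)
```

A larger absolute ratio makes it less likely that the observed difference is due to chance alone.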
Prior Requirements and Assumptions for Unpaired T-test
Make sure the data meets the following requirements before running an unpaired t-test.
- Samples from the two groups are independent (data values in one sample must not influence the values in the other sample). For example, males in Group 1 and females in Group 2 are two independent groups.
- Data in both groups is normally distributed. We can check this using the Shapiro-Wilk test.
- The variance within each group is similar. We can check this using Levene’s test for equality of variances.
Independent two sample t-test Formula for Equal Variance
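For reference, the standard pooled-variance form of the statistic, which is consistent with the Step 5 calculation in this article, can be written as follows. The symbols are assumptions of this sketch: the sample means are \bar{x}_1 and \bar{x}_2, the sample standard deviations are s_1 and s_2, and the sample sizes are n_1 and n_2.

```latex
\[
t = \frac{\bar{x}_1 - \bar{x}_2}{S_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
S_p = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}},
\qquad
\text{degrees of freedom} = n_1 + n_2 - 2
\]
```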
Independent t-test Formula for Un-equal Variance
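When the two variances cannot be assumed equal, the commonly used form is Welch's t-test, sketched below with the same symbols (sample means \bar{x}_1, \bar{x}_2, sample standard deviations s_1, s_2, sample sizes n_1, n_2). The degrees of freedom \nu are approximated with the Welch-Satterthwaite expression rather than n_1 + n_2 - 2.

```latex
\[
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^{2}}
{\dfrac{\left(s_1^2 / n_1\right)^{2}}{n_1 - 1} + \dfrac{\left(s_2^2 / n_2\right)^{2}}{n_2 - 1}}
\]
```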
Steps to calculate the t-value for Independent sample t-test
We will walk through the steps to calculate the t-value for the independent two-sample t-test using a comparison of two plastic part manufacturing machines. We will test whether the weight of parts manufactured on machine-1 is statistically different from the weight of parts manufactured on machine-2.
Step 1: Get the Data
Plastic part Weight in Grams from Machine-1
Sample No. | Weight (g) |
---|---|
1 | 15 |
2 | 15.01 |
3 | 15.1 |
4 | 14.9 |
5 | 14.95 |
6 | 15 |
7 | 15.15 |
8 | 15 |
9 | 15.01 |
10 | 14.8 |
11 | 14.9 |
12 | 14.95 |
13 | 15.12 |
14 | 15 |
15 | 14.95 |
Plastic part Weight in Grams from Machine-2
Sample No. | Weight (g) |
---|---|
1 | 15.02 |
2 | 15.01 |
3 | 15.1 |
4 | 15.05 |
5 | 15.1 |
6 | 15 |
7 | 15.15 |
8 | 15 |
9 | 15.01 |
10 | 15.1 |
11 | 15.15 |
12 | 15 |
13 | 15.15 |
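The sample size, mean, variance, and standard deviation of each group can be computed directly from the two tables. The sketch below uses only Python's standard `statistics` module; the names `machine_1`, `machine_2`, and `summarize` are illustrative and not part of the original article.

```python
import statistics

# Weights in grams, copied from the two tables above
machine_1 = [15, 15.01, 15.1, 14.9, 14.95, 15, 15.15, 15, 15.01,
             14.8, 14.9, 14.95, 15.12, 15, 14.95]
machine_2 = [15.02, 15.01, 15.1, 15.05, 15.1, 15, 15.15, 15, 15.01,
             15.1, 15.15, 15, 15.15]


def summarize(sample):
    """Return sample size, mean, sample variance and sample standard deviation."""
    return {
        "n": len(sample),
        "mean": statistics.mean(sample),
        "variance": statistics.variance(sample),  # uses the n - 1 denominator
        "std_dev": statistics.stdev(sample),
    }


summary_1 = summarize(machine_1)
summary_2 = summarize(machine_2)

for name, summary in [("Machine-1", summary_1), ("Machine-2", summary_2)]:
    print(name, {key: round(value, 5) for key, value in summary.items()})
```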
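Step 2: Ensure the Data Meets the Prior Conditions
Before calculating the t-value, check that the two samples are independent, approximately normally distributed, and have similar variance, as described in the prior requirements section.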
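One way to run these checks on the two samples is sketched below, assuming SciPy is installed. It uses `scipy.stats.shapiro` for normality and `scipy.stats.levene` for equality of variance; the variable names and the 0.05 threshold for the checks are assumptions made for illustration. Independence cannot be tested this way and has to follow from how the samples were collected.

```python
from scipy import stats

# Weights in grams, copied from the tables in Step 1
machine_1 = [15, 15.01, 15.1, 14.9, 14.95, 15, 15.15, 15, 15.01,
             14.8, 14.9, 14.95, 15.12, 15, 14.95]
machine_2 = [15.02, 15.01, 15.1, 15.05, 15.1, 15, 15.15, 15, 15.01,
             15.1, 15.15, 15, 15.15]

# Shapiro-Wilk test: the null hypothesis is that the sample is normally distributed
shapiro_stat_1, shapiro_p_1 = stats.shapiro(machine_1)
shapiro_stat_2, shapiro_p_2 = stats.shapiro(machine_2)

# Levene's test: the null hypothesis is that both groups have equal variance
levene_stat, levene_p = stats.levene(machine_1, machine_2)

check_alpha = 0.05  # threshold assumed for the assumption checks
checks = [
    ("Shapiro-Wilk, Machine-1", shapiro_p_1),
    ("Shapiro-Wilk, Machine-2", shapiro_p_2),
    ("Levene, both machines", levene_p),
]
for label, p in checks:
    if p > check_alpha:
        verdict = "no evidence against the assumption"
    else:
        verdict = "assumption may be violated"
    print(f"{label}: p = {p:.4f} ({verdict})")
```

If the equal-variance check fails, the unequal-variance form of the test is the safer choice.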
Step 3: Write down Null and Alternate Hypothesis
Null Hypothesis
There is no difference in part weight when parts are manufactured from different machines.
Alternate Hypothesis
Part weight is different when manufactured from two different machines.
Step 4: Finalize the Significance Level
After discussion with the internal team, you finalize the significance level (alpha) as 5% or 0.05.
Step 5: Calculate the t-value for unpaired t-test
Group | Sample Size | Sample Mean | Variance | Standard Deviation |
---|---|---|---|---|
Machine-1 | 15 | 14.989 | 0.00806 | 0.0897 |
Machine-2 | 13 | 15.0646 | 0.00386 | 0.06213 |
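Degrees of Freedom = n1 + n2 - 2 = 15 + 13 - 2 = 26
Pooled standard deviation:
Sp = √ [ { (15-1) * 0.0897² + (13-1) * 0.06213² } / (15+13-2) ]
Sp = √ [ { 14 * 0.008046 + 12 * 0.003860 } / 26 ] = √ (0.006114) = 0.0782
Standard error of the difference in means = Sp * √ (1/n1 + 1/n2)
= 0.0782 * √ (0.0667 + 0.0769) = 0.02963
t-value = (Mean of Machine-1 - Mean of Machine-2) / Standard error of the difference in means
= (14.989 - 15.0646) / 0.02963 = -2.55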
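The same calculation can be sketched in plain Python from the raw weights. The helper name `pooled_t_statistic` is hypothetical, and the function assumes the equal-variance (pooled) form of the test used in this step.

```python
import math
import statistics

# Weights in grams, copied from the tables in Step 1
machine_1 = [15, 15.01, 15.1, 14.9, 14.95, 15, 15.15, 15, 15.01,
             14.8, 14.9, 14.95, 15.12, 15, 14.95]
machine_2 = [15.02, 15.01, 15.1, 15.05, 15.1, 15, 15.15, 15, 15.01,
             15.1, 15.15, 15, 15.15]


def pooled_t_statistic(sample_a, sample_b):
    """Return the equal-variance two-sample t-statistic and its degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(sample_b)
    dof = n_a + n_b - 2
    # Pooled standard deviation Sp
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / dof)
    # Standard error of the difference in means
    std_error = pooled_sd * math.sqrt(1 / n_a + 1 / n_b)
    t_value = (statistics.mean(sample_a) - statistics.mean(sample_b)) / std_error
    return t_value, dof


t_value, dof = pooled_t_statistic(machine_1, machine_2)
print(f"t = {t_value:.3f}, degrees of freedom = {dof}")
```

Because the code works from unrounded means and variances, its result can differ slightly in the second decimal place from a hand calculation that rounds intermediate values.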
Step 6: Find the Critical t-value from t-table
You can refer to the t-table to get the critical t-value for DOF = 26, significance level = 0.05, and a two-tailed test.
Critical t-value = 2.056
Step 7: Results Evaluation
We can reject the null hypothesis because the absolute value of the calculated t-statistic (about 2.55) is greater than the critical t-value (2.056). In other words, the sample data provide evidence at the 5% significance level that the mean part weight from the two machines is not equal.
Python code to implement Independent Two-sample t-test
We will implement the two-sample independent t-test on sample data using the Python SciPy Library.
# Import the required libraries
import pandas as pd
import numpy as np
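# Get the sample weight data
# Note: the Machine-1 list below has 16 values (one more 14.95 than the 15-row table above),
# so the results differ slightly from the hand calculation.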
sample_weight_mac1 = [15, 15.01, 15.1, 14.9, 14.95, 15, 15.15, 15.01, 15, 14.8, 14.9, 14.95, 15.12, 14.95, 15, 14.95]
sample_weight_mac2 = [15.02, 15.01, 15.1, 15.05, 15.1, 15, 15.15, 15, 15.01, 15.1, 15.15, 15, 15.15]
Define Null and Alternate Hypothesis
Null Hypothesis: There is no difference in part weight when parts are manufactured from different machines.
Alternative Hypothesis: Part weight is different when manufactured from two different machines.
# Significance level (alpha)
alpha = 0.05
# Degrees of freedom: n1 + n2 - 2
dof = len(sample_weight_mac1) + len(sample_weight_mac2) - 2
Python code to calculate t-statistic for two sample independent t-test
# Calculate the t-statistic
import scipy.stats as stats
# Perform independent t-test (equal variances are assumed by default)
t_statistic, p_value = stats.ttest_ind(sample_weight_mac1, sample_weight_mac2)
# Print the results
print(f'T-statistic: {t_statistic}')
print(f'P-value: {p_value}')
T-statistic: -2.698970434744361
P-value: 0.011850160446312454
Calculate the critical t-value
# Critical t-value for a two-tailed test; the same value can be read from a t-table
critical_t_value = stats.t.ppf(1 - alpha / 2, df=dof)
print(f"Critical T-Value: {critical_t_value:.4f}")
Critical T-Value: 2.0518
Results Interpretation
if abs(t_statistic) > critical_t_value:
    print("Results: We can Reject the null hypothesis.")
else:
    print("Results: Fail to reject the null hypothesis.")
Results: We can Reject the null hypothesis.
FAQ on Independent t-test
Independent t-test and A/B testing are related concepts, but they are not exactly the same thing.
We can use the independent t-test under the following conditions:
- Need to compare two groups.
- Sample data from two groups are independent.
- Data in both groups is normally distributed.
- The variance within each group is similar.
We can use the independent t-test to compare two groups. To compare more than two groups, we can use the ANOVA test.
If the two groups have unequal variances, we can still run an independent t-test, but we need to use the formula for the two-sample t-test with unequal variance.
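In SciPy, the unequal-variance version (Welch's t-test) is available through the `equal_var=False` argument of `scipy.stats.ttest_ind`. The sketch below reuses the two lists from the Python section of this article and assumes SciPy is installed.

```python
from scipy import stats

# Sample weights in grams, copied from the Python section above
sample_weight_mac1 = [15, 15.01, 15.1, 14.9, 14.95, 15, 15.15, 15.01, 15,
                      14.8, 14.9, 14.95, 15.12, 14.95, 15, 14.95]
sample_weight_mac2 = [15.02, 15.01, 15.1, 15.05, 15.1, 15, 15.15, 15, 15.01,
                      15.1, 15.15, 15, 15.15]

# equal_var=False switches ttest_ind from the pooled test to Welch's t-test
t_statistic, p_value = stats.ttest_ind(
    sample_weight_mac1, sample_weight_mac2, equal_var=False
)

print(f"Welch T-statistic: {t_statistic:.4f}")
print(f"Welch P-value: {p_value:.4f}")
```

The decision rule stays the same: reject the null hypothesis when the p-value is below the chosen significance level.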