If you work in web optimization, chances are you have a tool that runs all the statistical significance calculations for you. However, you will most likely face situations where you are asked what impact an experiment has on metrics further down the funnel, such as revenue.
Going beyond the A/B test platform you love and leveraging your data warehouse to calculate that impact is super important. I am not going to show you how to extract the right data from your data warehouse; I will just assume you have what you need. Once you have the data, the next step is to run the appropriate statistical test.
In this case I will show you how to run a t-test in Python. It compares a continuous metric such as revenue between two groups and determines whether the difference between their means is statistically significant. Let’s do it!
Let’s import some packages
# Import
import pandas as pd
import numpy as np
from scipy import stats
We create a dataset to play with
The t-test will require the mean, standard deviation and size for both the control and the variant. This is what I am emulating below:
# Let's manually create some data
control = {"Mean": 33.0,
           "Stdv": 5.0,
           "Size": 2390}
variant = {"Mean": 32.7,
           "Stdv": 3.4,
           "Size": 2245}
Let’s run the t-test!
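There are several packages out there that can help you compute a t-test. Here I am simply going to use the ttest_ind_from_stats function from scipy.stats. Setting equal_var=False runs Welch's t-test, which does not assume that the two groups have equal variances: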
result = stats.ttest_ind_from_stats(control["Mean"],
                                    control["Stdv"],
                                    control["Size"],
                                    variant["Mean"],
                                    variant["Stdv"],
                                    variant["Size"],
                                    equal_var=False)
print("T statistic: {} \nP-value: {}".format(result[0], result[1]))
T statistic: 2.401193475685192
P-value: 0.016384605058160074
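If you have the raw per-user values rather than summary statistics, the same test can be run directly on the arrays. The sketch below is only an illustration: the data is simulated (the names control_raw and variant_raw and the random seed are my own assumptions, not real experiment data), and it checks that stats.ttest_ind on the raw values agrees with stats.ttest_ind_from_stats on the mean, sample standard deviation (ddof=1) and size computed from those same values.

```python
import numpy as np
from scipy import stats

# Simulated per-user revenue, for illustration only (not real experiment data)
rng = np.random.default_rng(42)
control_raw = rng.normal(loc=33.0, scale=5.0, size=2390)
variant_raw = rng.normal(loc=32.7, scale=3.4, size=2245)

# Welch's t-test computed directly from the raw observations
from_raw = stats.ttest_ind(control_raw, variant_raw, equal_var=False)

# Same test computed from summary statistics of the same observations.
# ddof=1 gives the sample standard deviation, which is what the test expects.
from_summary = stats.ttest_ind_from_stats(
    control_raw.mean(), control_raw.std(ddof=1), control_raw.size,
    variant_raw.mean(), variant_raw.std(ddof=1), variant_raw.size,
    equal_var=False,
)

# Both routes should agree up to floating-point rounding
print(np.isclose(from_raw.statistic, from_summary.statistic))
print(np.isclose(from_raw.pvalue, from_summary.pvalue))
```

As for reading the result printed earlier: a p-value of about 0.016 is below the commonly used 0.05 threshold, so if that is the significance level you chose before the experiment, you would call the difference statistically significant.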
Here is a bonus: below is an implementation of the t-test that gives you an idea of what is going on under the hood:
from scipy.stats import t
def ttest(control, variant):
    # Assigning the input to variables
    mean_c = control["Mean"]
    sigma_c = control["Stdv"]
    size_c = control["Size"]
    mean_v = variant["Mean"]
    sigma_v = variant["Stdv"]
    size_v = variant["Size"]
    # Compute the t-value
    tvalue = (mean_c - mean_v) / np.sqrt((np.power(sigma_v,2)/size_v) + (np.power(sigma_c,2)/size_c))
    # Degrees of freedom (Welch-Satterthwaite approximation)
    df = np.power((np.power(sigma_c,2) / size_c) + (np.power(sigma_v,2) / size_v), 2) / (np.power((np.power(sigma_c,2) / size_c), 2) / (size_c - 1) + np.power((np.power(sigma_v,2) / size_v), 2) / (size_v - 1))
    # Two-sided p-value
    pvalue = (1 - t.cdf(np.absolute(tvalue), df)) * 2
    return [tvalue, pvalue]
# Call the function once and unpack the two values
statistic, pvalue = ttest(control, variant)
print("Statistic: {}\nP-Value: {}".format(statistic, pvalue))
Statistic: 2.401193475685192
P-Value: 0.016384605058160195
We obtain the same result as before, up to floating-point rounding in the last digits of the p-value!
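For reference, here is the hand-written function expressed as formulas. The symbols are my own notation, not something from scipy: the sample means are written as x-bar, the sample standard deviations as s, and the sizes as n, with subscripts c and v for control and variant. F with subscript nu stands for the cumulative distribution function of the t distribution with nu degrees of freedom.

```latex
% Welch's t statistic
t = \frac{\bar{x}_c - \bar{x}_v}{\sqrt{\dfrac{s_c^2}{n_c} + \dfrac{s_v^2}{n_v}}}

% Welch-Satterthwaite degrees of freedom
\nu = \frac{\left(\dfrac{s_c^2}{n_c} + \dfrac{s_v^2}{n_v}\right)^2}
           {\dfrac{\left(s_c^2 / n_c\right)^2}{n_c - 1} + \dfrac{\left(s_v^2 / n_v\right)^2}{n_v - 1}}

% Two-sided p-value
p = 2 \left(1 - F_{\nu}\left(|t|\right)\right)
```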
Next, let’s see how we can calculate a confidence interval for the difference between the two means. First we need to compute the degrees of freedom so we can find the t-critical value needed to define the lower and upper bounds of our interval.
mean_c = control["Mean"]
sigma_c = control["Stdv"]
size_c = control["Size"]
mean_v = variant["Mean"]
sigma_v = variant["Stdv"]
size_v = variant["Size"]
# We need to compute the degrees of freedom first
df = np.power((np.power(sigma_c,2) / size_c) + (np.power(sigma_v,2) / size_v), 2) / (np.power((np.power(sigma_c,2) / size_c), 2) / (size_c - 1) + np.power((np.power(sigma_v,2) / size_v), 2) / (size_v - 1))
scipy.stats will give us the t-critical value we need. We want a 95% confidence interval, which is why we look up the t-critical value at a cumulative probability of 0.975: in other words we want the “central” 95% of the area under the curve, leaving out what lies above the 97.5th percentile or below the 2.5th percentile.
import scipy.stats as st
tscore = st.t.ppf(0.975, df)
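As a quick sanity check of the "central 95%" idea, the sketch below looks up the critical value and then measures the area between minus and plus that value. The degrees of freedom used here (4000) is an arbitrary illustrative number I picked, not the df computed in this post.

```python
from scipy import stats

# Illustrative degrees of freedom only (an assumption, not the value from the post)
df_example = 4000

# Critical value that leaves 2.5% of the area in the upper tail
tscore_example = stats.t.ppf(0.975, df_example)

# Area under the t distribution between -tscore and +tscore
central_area = stats.t.cdf(tscore_example, df_example) - stats.t.cdf(-tscore_example, df_example)

print("t-critical: {}".format(tscore_example))
print("Central area: {}".format(central_area))
```

With degrees of freedom this large, the critical value is very close to the familiar 1.96 of the normal distribution.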
Finally we can calculate the lower and upper bounds of our confidence interval.
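# Interval for the signed difference (control minus variant).
# Taking the absolute value of the difference would misplace the interval whenever the variant mean is higher.
Lower = (mean_c - mean_v) - tscore * np.power((np.power(sigma_c, 2) / size_c) + (np.power(sigma_v, 2) / size_v), 0.5)
Upper = (mean_c - mean_v) + tscore * np.power((np.power(sigma_c, 2) / size_c) + (np.power(sigma_v, 2) / size_v), 0.5)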
print("tscore: {}".format(tscore))
print("Lower - Upper: {} - {}".format(Lower, Upper))
tscore: 1.960525101212849
Lower - Upper: 0.05505616839308525 - 0.5449438316069091
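To avoid repeating these steps for every experiment, they can be wrapped in a single function. This is a minimal sketch rather than a definitive implementation: the function name welch_confidence_interval and its confidence parameter are my own, and it assumes the same Mean / Stdv / Size dictionaries used throughout this post.

```python
import numpy as np
from scipy import stats

def welch_confidence_interval(control, variant, confidence=0.95):
    """Confidence interval for (control mean - variant mean), unequal variances."""
    # Squared standard error contributed by each group
    var_c = control["Stdv"] ** 2 / control["Size"]
    var_v = variant["Stdv"] ** 2 / variant["Size"]
    diff = control["Mean"] - variant["Mean"]
    std_err = np.sqrt(var_c + var_v)
    # Welch-Satterthwaite degrees of freedom
    df = (var_c + var_v) ** 2 / (
        var_c ** 2 / (control["Size"] - 1) + var_v ** 2 / (variant["Size"] - 1)
    )
    # Two-sided critical value, e.g. the 0.975 quantile for a 95% interval
    tcrit = stats.t.ppf(1 - (1 - confidence) / 2, df)
    return diff - tcrit * std_err, diff + tcrit * std_err

# Same summary statistics as in the post
control = {"Mean": 33.0, "Stdv": 5.0, "Size": 2390}
variant = {"Mean": 32.7, "Stdv": 3.4, "Size": 2245}

lower, upper = welch_confidence_interval(control, variant)
print("Lower - Upper: {} - {}".format(lower, upper))
```

Note that the interval does not contain zero, which is consistent with the p-value being below 0.05.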