Learning the central limit theorem
Today, A/B tests have become a fundamental tool for evaluating ideas in software development. Running an A/B test on your idea is not very difficult. The problem is that it isn't cheap: we need real user activity, and the amount of real user activity is always limited.
Back to basics: why do we have to collect a certain amount of data to see whether a hypothesis is correct? For example, does our new feature improve the percentage of customers who buy something on our site by at least 1%? How many samples of A and B are actually needed? Real user data is very limited, so the less data we need, the happier we are.
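Just to get a feel for the scale before going into the theory, here is a rough sketch of the kind of calculation statisticians use for this question. This example is my addition, not part of the original post: the 5% baseline conversion rate, the 5% significance level, and the 80% power are assumptions; only the 1-point lift comes from the example above.

```python
# Hedged sketch: rough sample size needed per variant to detect a 1-point
# absolute lift in conversion rate (baseline 5% -> 6% is an assumed number),
# using the standard two-proportion sample-size formula.
from math import sqrt, ceil
from statistics import NormalDist

def samples_per_variant(p_a, p_b, alpha=0.05, power=0.80):
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_a + p_b) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p_a * (1 - p_a) + p_b * (1 - p_b))) ** 2
    return ceil(numerator / (p_b - p_a) ** 2)

print(samples_per_variant(0.05, 0.06))  # roughly 8,000+ users per variant
```

Even under these friendly assumptions, thousands of users per variant are needed, which is exactly why limited user activity hurts.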
Traditional statistical theory answers this question (the formula sketched above is one product of it). You may have heard of the "central limit theorem". Honestly, I'm not very familiar with statistical theory. According to Wikipedia:
In probability theory, the central limit theorem (CLT) establishes that, in some situations, when independent random variables are added, their properly normalized sum tends toward a normal distribution (informally a "bell curve") even if the original variables themselves are not normally distributed.
Very roughly, it says we can trust the average value observed in an A/B test if we collect a large enough sample of user activity. With a large sample, we can put a bound on the error of our observation using the theory of the normal distribution.
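For example, suppose 540 out of 10,000 visitors converted (these numbers are made up for illustration, not from the post): the normal approximation attaches an error bound to the observed rate. A minimal sketch:

```python
# Hedged illustration (my own example): with a large sample, the CLT lets us
# attach a normal-approximation error bound to an observed conversion rate.
from math import sqrt

conversions, visitors = 540, 10_000        # made-up A/B test data
p_hat = conversions / visitors             # observed rate: 0.054
std_error = sqrt(p_hat * (1 - p_hat) / visitors)
margin = 1.96 * std_error                  # ~95% margin of error

print(f"observed rate {p_hat:.3f} +/- {margin:.3f}")
# -> observed rate 0.054 +/- 0.004, i.e. the true rate is plausibly 5.0%-5.8%
```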
To understand intuitively what the theorem is saying, I wrote a simple Python script that draws the probability distribution of the average of samples drawn from a uniform distribution.
In the following video, we can see that the distribution of the average of uniform samples converges to a normal distribution as the sample size increases (sample sizes are 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, and 1024).
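The original script isn't reproduced here, so the following is only a rough reconstruction of the idea under my own choices (static subplots instead of a video, 10,000 sample means per histogram): for each sample size n, draw many averages of n uniform random numbers and histogram them.

```python
# Rough reconstruction of the experiment described above (not the author's
# original script): histogram the mean of n uniform samples for growing n
# and watch the shape approach a bell curve.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
sample_sizes = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]
trials = 10_000  # number of sample means per histogram

fig, axes = plt.subplots(3, 4, figsize=(12, 8))
for ax, n in zip(axes.flat, sample_sizes):
    means = rng.uniform(0, 1, size=(trials, n)).mean(axis=1)
    ax.hist(means, bins=50, density=True)  # each panel auto-scales its x-axis
    ax.set_title(f"n = {n}")
axes.flat[-1].axis("off")  # 11 sample sizes, 12 subplot slots
plt.tight_layout()
plt.show()
```

At n = 1 the histogram is flat (the uniform distribution itself), at n = 2 it is triangular, and by the larger sample sizes it looks unmistakably like a bell curve, which is the convergence the video shows.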