
Central Limit Theorem with Python

You know what's a slap in the face? When theoretical models don't work well in real life, and that happens almost every time. So I decided to put one of them, the central limit theorem, to the test.


What is the central limit theorem?


The central limit theorem (CLT) is a statistical theorem that states that, given a sufficiently large sample size from a population with finite variance, the sample means will be approximately normally distributed.


In other words, the mean of a large sample of independent observations drawn from any distribution with a finite mean and variance will tend to be normally distributed. This holds regardless of the shape of the original distribution, and is a key result in probability theory and statistics.


In simple terms...


Say I have 100 samples. I find the mean of each sample. Then I plot the sample means on the X axis and the frequency, that is, the number of times each mean value occurred, on the Y axis. The stats wizards claim that this plot will look approximately normal.


Taking the most 'ab-normal' distribution ever


Here is a dataset of 1000 random integers, each drawn uniformly between 200 and 400.



import random

# 1000 random integers, each between 200 and 400 (inclusive)
num = []
for i in range(1, 1001):
    k = random.randint(200, 400)
    num.append(k)
len(num)

Out[4]:1000

We then find the mean and standard deviation just for the heck of it.
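The code for this step isn't shown here. A minimal sketch of how it might look, using Python's built-in statistics module (the data is regenerated so the snippet runs on its own, and the names mean_num and std_num are illustrative, not from the original):

```python
import random
import statistics

# Rebuild the dataset as above: 1000 integers drawn uniformly from 200..400.
num = [random.randint(200, 400) for _ in range(1000)]

mean_num = statistics.mean(num)    # arithmetic mean of the data
std_num = statistics.pstdev(num)   # standard deviation, treating num as the whole population

print(f"mean = {mean_num:.2f}, std = {std_num:.2f}")
```

For integers spread evenly over 200 to 400, theory puts the mean at 300 and the standard deviation at roughly 58, so the printed values should land close to those. The exact numbers change on every run.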





Then we plot the dataset:



import seaborn as sns
import matplotlib.pyplot as plt

sns.histplot(num)
plt.title("Plot Distribution")
plt.show()




Clearly not a normal distribution!


Now, taking 1,000 samples, each with more than 30 values, we plot the sample means against their frequency to get a plot like this:



# means holds one mean per sample
sns.displot(means, kde=True)
plt.show()
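How the list of sample means was built isn't shown. One way it might be done, assuming each sample is drawn with replacement from num and holds 40 values (the size 40 and the helper name sample_means are my assumptions; the text only says "more than 30"):

```python
import random
import statistics

def sample_means(data, n_samples, sample_size):
    """Draw n_samples random samples (with replacement) and return their means."""
    return [
        statistics.fmean(random.choices(data, k=sample_size))
        for _ in range(n_samples)
    ]

# Same kind of dataset as before: 1000 integers between 200 and 400.
num = [random.randint(200, 400) for _ in range(1000)]

# 1000 samples of 40 values each (40 is an assumed size, just above 30).
means = sample_means(num, n_samples=1000, sample_size=40)
print(len(means), min(means), max(means))
```

Sampling with replacement keeps every draw independent, which is the setting the theorem talks about. Sampling without replacement via random.sample would probably give a very similar picture at these sizes.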


Alright, it's kind of normal but why is that bar taller than the peak?


I'll give you one more chance to prove yourself, 'central limit theorem'!


So, this time I take 10,000 samples. If you are trying this as well, just pray that your computer doesn't crash, because mine didn't work for an hour after this :)
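For what it's worth, drawing 10,000 samples doesn't have to be heavy. A sketch that builds the sample means in one pass and times it, again assuming samples of 40 values drawn with replacement (the sample size, the seed, and the name means_10k are my assumptions, not the original code):

```python
import random
import statistics
import time

random.seed(0)  # fixed seed so reruns give the same draw (my addition)

num = [random.randint(200, 400) for _ in range(1000)]

start = time.perf_counter()
# 10,000 samples of 40 values each; 40 is an assumed sample size.
means_10k = [statistics.fmean(random.choices(num, k=40)) for _ in range(10_000)]
elapsed = time.perf_counter() - start

print(f"{len(means_10k)} sample means computed in {elapsed:.2f} s")
```

Written this way, the sampling is only 400,000 random draws. If a run like this is slow, the cost is probably somewhere else, for example plotting inside the loop.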



That's way more 'normal' than the first one. Okay, I can say it works but what if I just increase it to 20,000?


That's normal, alright. Hence, proved? Can't say no.
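Eyeballing the plot is one test, but the theory also comes with numbers you can check. It predicts that the sample means spread out with a standard deviation of about sigma / sqrt(n), and that roughly 68% of them fall within one such deviation of the population mean. A rough sketch of that check, assuming samples of 40 values drawn directly from the 200 to 400 range (the sample size, the seed, and all variable names are my assumptions):

```python
import math
import random
import statistics

random.seed(42)  # fixed seed for reproducibility (my addition)

n_samples, sample_size = 20_000, 40  # 40 is an assumed sample size

# Each sample mean comes from sample_size fresh draws of randint(200, 400).
means = [
    statistics.fmean([random.randint(200, 400) for _ in range(sample_size)])
    for _ in range(n_samples)
]

# Exact mean and standard deviation of a uniform integer on 200..400.
pop_mean = 300
pop_std = math.sqrt((201**2 - 1) / 12)

predicted_se = pop_std / math.sqrt(sample_size)  # spread the theory predicts
observed_se = statistics.pstdev(means)           # spread we actually got

# Share of sample means within one predicted standard error of the mean.
within_one = sum(abs(m - pop_mean) <= predicted_se for m in means) / n_samples

print(f"predicted spread: {predicted_se:.2f}, observed spread: {observed_se:.2f}")
print(f"share within one standard error: {within_one:.3f}")
```

If the two spreads agree closely and the share sits near 0.68, the numbers back up what the plot suggests.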



Conclusion


It works! The CLT isn't just another mathematical model that exists for no reason. One might argue that models are imperfect and only try to mimic the real world based on assumptions. I don't agree with that all the time. Just because there's a pattern doesn't mean you should force-fit a model and make it seem like you found the panacea. I do think there are plenty of models that exist for no reason at all, and we end up studying them because they sound intellectually 'cool'. It's always been about theory vs practice and how one can't live without the other... or can they?




