How to Compute the Variance in Python using Numpy




In this article, we show how to compute the variance in Python.

Variance measures how far a set of numbers is spread out from its average value (the mean).

In Python, we can calculate the variance using the numpy module.

With numpy, the var() function calculates the variance for a given data set. By default, var() returns the population variance, which divides by the number of values, n.

In the code below, we show how to calculate the variance for a data set.
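The original listing is not reproduced here, so the following is a minimal sketch reconstructed from the walkthrough that follows. The variable names dataset and variance are taken from that description; the exact layout of the original code is an assumption.

```python
import numpy as np

# Data set used throughout this article
dataset = [2, 6, 8, 12, 18, 24, 28, 32]

# np.var() returns the population variance by default (it divides by n)
variance = np.var(dataset)

print(variance)  # 105.4375
```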





So let's break down this code.

We import the numpy module as np. This means that we reference the numpy module with the alias np.

We then create a variable, dataset, which holds the list [2, 6, 8, 12, 18, 24, 28, 32].

We then get the variance of this data set by using the np.var() function. Inside of this np.var() function, we specify the variable, dataset.

We then print out the variance, which in this case is 105.4375.

So let's go over the formula for variance to check that this calculated value is correct.

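So the formula for variance is $\sigma^2 = \frac{\sum (x - \bar{x})^2}{n}$, where $x$ is each value in the data set, $\bar{x}$ is the mean of the data set, and $n$ is the number of values.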

So this means that in order to calculate the variance, we must first calculate the mean of the data set. The mean in this case is (2+6+8+12+18+24+28+32)/8 = 130/8 = 16.25.

So we now take each x value and subtract 16.25 from it.

This gives us, (2-16.25)= -14.25; (6-16.25)= -10.25; (8-16.25)= -8.25; (12-16.25)= -4.25; (18-16.25)= 1.75; (24-16.25)= 7.75; (28-16.25)= 11.75; (32-16.25)= 15.75.

We then square all of these numbers and add them up to get $(-14.25)^2 + (-10.25)^2 + (-8.25)^2 + (-4.25)^2 + 1.75^2 + 7.75^2 + 11.75^2 + 15.75^2$ = 203.0625 + 105.0625 + 68.0625 + 18.0625 + 3.0625 + 60.0625 + 138.0625 + 248.0625 = 843.5

We now take this value and divide it by n.

This gives us, 843.5/8= 105.4375
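The manual steps above can be sketched in plain Python, without numpy. The names n, mean, deviations, squared, total, and manual_variance are chosen here for illustration and are not part of the original article.

```python
dataset = [2, 6, 8, 12, 18, 24, 28, 32]

n = len(dataset)                           # number of values: 8
mean = sum(dataset) / n                    # 130 / 8 = 16.25

# Subtract the mean from each value
deviations = [x - mean for x in dataset]   # -14.25, -10.25, ..., 15.75

# Square each deviation and add the squares together
squared = [d ** 2 for d in deviations]
total = sum(squared)                       # 843.5

# Divide the sum of squared deviations by n
manual_variance = total / n

print(mean, total, manual_variance)  # 16.25 843.5 105.4375
```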

This value of 105.4375 is the variance.

So the value returned by np.var() matches the value calculated by hand.
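As a quick cross-check, the formula can be compared directly with np.var() in code. This is a minimal sketch; the names manual and sample_variance are illustrative. The last two lines show the ddof parameter of np.var(), which changes the divisor from n to n - ddof. This is useful if you want the sample variance rather than the population variance computed in this article.

```python
import numpy as np

dataset = [2, 6, 8, 12, 18, 24, 28, 32]

# Population variance from the formula: sum of squared deviations divided by n
mean = np.mean(dataset)
manual = sum((x - mean) ** 2 for x in dataset) / len(dataset)

# Compare with numpy's result (default ddof=0, so the divisor is n)
print(np.isclose(manual, np.var(dataset)))  # True

# With ddof=1 the divisor becomes n - 1, which gives the sample variance
sample_variance = np.var(dataset, ddof=1)
print(sample_variance)  # 120.5
```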

And this is how to compute the variance of a data set in Python using the numpy module.




