How to Get Percentiles in Python with the Numpy Module





In this article, we show how to get percentile data in Python with the numpy module.

Let's say we have a data set made up of several different values, and we want to get the value of the 50th percentile, the 75th percentile, or the 90th percentile.

To see what this means, say we have a data set of people's salaries. The value in the middle, the median value, is $40,000 a year in salary. So 50% of people make $40,000 a year or less. Now let's say we go 3/4 of the way up: at the 75th percentile, people are earning $90,000. What this means is that only 25% of people earn this amount or higher. Let's now go to the 99th percentile; these are the 1 percenters. Let's say the 1 percenters earn $600,000 in annual salary. This means only the top 1% of people make this amount or higher.

Using the numpy module in Python, we can find the data for any percentile between 0 and 100.
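
To make the 0 to 100 range concrete, here is a minimal sketch on a small hand-made array. The numbers in data are hypothetical and chosen only for illustration. With numpy's default settings, the 0th percentile should match the minimum, the 100th percentile the maximum, and the 50th percentile the median.

```python
import numpy as np

# Small hand-made data set (hypothetical values, for illustration only)
data = np.array([3, 1, 4, 1, 5, 9, 2, 6])

lowest = np.percentile(data, 0)     # 0th percentile: the smallest value
highest = np.percentile(data, 100)  # 100th percentile: the largest value
middle = np.percentile(data, 50)    # 50th percentile: the median

print(lowest, highest, middle)
```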

So below we create a data set with a normal distribution with a mean centered at the value of 20 and a standard deviation of 2.5. We create 10,000 random data points. We then get various percentiles from the data.
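
The steps just described can be sketched as follows. This is a minimal version based on the description in this article; the np.random.seed(0) call and the variable names p10, p50, p90, and p99 are assumptions added for illustration, so the printed numbers will not match the values quoted later exactly.

```python
import numpy as np

# Assumption: fix the seed so repeated runs print the same numbers
np.random.seed(0)

# 10,000 random points from a normal distribution
# with a mean of 20 and a standard deviation of 2.5
values = np.random.normal(20, 2.5, 10000)

# Value at each percentile of interest
p10 = np.percentile(values, 10)
p50 = np.percentile(values, 50)
p90 = np.percentile(values, 90)
p99 = np.percentile(values, 99)

print("10th percentile:", p10)
print("50th percentile:", p50)
print("90th percentile:", p90)
print("99th percentile:", p99)
```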





So let's break down this code.

We import the numpy module as np. This means that we reference the numpy module with the alias np.

We then create a variable, values, and assign it the result of np.random.normal(20, 2.5, 10000).

What this line does is create a data set with a mean centered at the value of 20 and a standard deviation of 2.5. We create 10,000 random data points in this data set, so we have a lot of data. Remember that this is normally distributed data.
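
As a quick sanity check, the sample's own mean and standard deviation should land close to the requested 20 and 2.5. The sketch below assumes a fixed seed for repeatability; sample_mean and sample_std are illustrative names.

```python
import numpy as np

np.random.seed(0)  # assumption: fixed seed for repeatable output
values = np.random.normal(20, 2.5, 10000)

sample_mean = values.mean()  # should be close to 20
sample_std = values.std()    # should be close to 2.5

print(len(values), sample_mean, sample_std)
```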

We then use the np.percentile() function to find the value for a given percentile.
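
The np.percentile() function takes the data as its first argument and the percentile (from 0 to 100) as its second. The second argument can also be a list, in which case one value is returned per percentile. A short sketch, again assuming a fixed seed and illustrative variable names:

```python
import numpy as np

np.random.seed(0)  # assumption: fixed seed for repeatable output
values = np.random.normal(20, 2.5, 10000)

# A single percentile returns a single number
single = np.percentile(values, 75)

# A list of percentiles returns an array with one value per entry
several = np.percentile(values, [10, 50, 90, 99])

print(single)
print(several)
```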

First, we look at the 10th percentile. This means that the lowest 10% of values fall below this value (and 90% are at or above it). This 10th percentile value is 16.780774503639915.

The 50th percentile has a value of 19.939851436401284. This means that 50% of the values are under this level and 50% are at or above this level.
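
This interpretation can be checked directly by counting how much of the data falls on each side of the percentile value. The sketch below assumes a fixed seed; fraction_below and fraction_at_or_above are illustrative names.

```python
import numpy as np

np.random.seed(0)  # assumption: fixed seed for repeatable output
values = np.random.normal(20, 2.5, 10000)

p50 = np.percentile(values, 50)

# Share of the data strictly below, and at or above, the 50th percentile
fraction_below = np.mean(values < p50)
fraction_at_or_above = np.mean(values >= p50)

print(p50, fraction_below, fraction_at_or_above)
```

Both fractions should come out at roughly 0.5.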

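The 90th percentile has a value of approximately 23.2, which is what a normal distribution with a mean of 20 and a standard deviation of 2.5 gives in theory (20 + 1.2816 × 2.5 ≈ 23.2). The exact figure varies slightly from run to run.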

The 99th percentile has a value of 25.633231120341421.
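
Because the data is drawn from a known normal distribution, the sample percentiles can be compared against the theoretical ones. The sketch below uses NormalDist from Python's standard statistics module (available in Python 3.8 and later) for the theoretical values. The fixed seed and the dictionary names sample and theory are assumptions for illustration.

```python
from statistics import NormalDist

import numpy as np

np.random.seed(0)  # assumption: fixed seed for repeatable output
values = np.random.normal(20, 2.5, 10000)

# Theoretical distribution the data was drawn from
dist = NormalDist(mu=20, sigma=2.5)

percentiles = (10, 50, 90, 99)
sample = {q: np.percentile(values, q) for q in percentiles}
theory = {q: dist.inv_cdf(q / 100) for q in percentiles}

for q in percentiles:
    print(q, "sample:", sample[q], "theory:", theory[q])
```

With 10,000 points the sample values should sit close to the theoretical ones, with the extreme percentiles (such as the 99th) showing the most variation between runs.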

And this is how you can get percentile data in Python with the numpy module.


Related Resources

How to Randomly Select From or Shuffle a List in Python


