Quickly Compute Percentiles


Computing a few percentiles is often the fastest way to get a feel for a distribution. Here is the quick (and dirty) way I do it all the time.

First, let’s import the packages we need:

# Import
import pandas as pd
import numpy as np

Next, we load a dataset to play with:

# Here I am loading the Iris Dataset saved locally on my computer
df = pd.read_csv('Iris.csv')
df.head()
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa

Let’s get some percentiles & statistics

In this example we will compute various distribution statistics for the petal width.

# Select the column once, then compute each statistic on it
petal_width = df['petal_width']

print("Total count of flowers: {}".format(len(df)))
print("Average petal width: %f" % np.mean(petal_width))
print("Petal width - 25th percentile: %f" % np.percentile(petal_width, 25))
print("Petal width - Median: %f" % np.percentile(petal_width, 50))
print("Petal width - 90th percentile: %f" % np.percentile(petal_width, 90))
print("Petal width - 99th percentile: %f" % np.percentile(petal_width, 99))
Total count of flowers: 150
Average petal width: 1.198667
Petal width - 25th percentile: 0.300000
Petal width - Median: 1.300000
Petal width - 90th percentile: 2.200000
Petal width - 99th percentile: 2.500000
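If you need several percentiles at once, both NumPy and pandas accept a list. The sketch below uses a small made-up series (not the Iris data) that includes one missing value, to show a gotcha worth knowing: `np.percentile` returns NaN as soon as any input is NaN, while `np.nanpercentile`, `Series.quantile` and `Series.describe` skip missing values. Note also that pandas expects fractions between 0 and 1 rather than NumPy's 0 to 100 scale.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration only (not the Iris data); the last one is missing.
petal_width = pd.Series([0.2, 0.2, 0.3, 1.3, 1.5, 1.8, 2.3, 2.5, np.nan])
qs = [25, 50, 90, 99]  # percentiles on NumPy's 0-100 scale

# np.percentile takes a list and returns one value per percentile,
# but a single NaN in the input turns every result into NaN.
with_nan = np.percentile(petal_width, qs)
print(with_nan)

# np.nanpercentile ignores the missing value instead.
ignoring_nan = np.nanpercentile(petal_width, qs)
print(ignoring_nan)

# pandas works with fractions in [0, 1] and skips NaN by default.
quantiles = petal_width.quantile([q / 100 for q in qs])
print(quantiles)

# describe() returns count, mean, std, min and max plus the requested
# percentiles in a single call.
summary = petal_width.describe(percentiles=[q / 100 for q in qs])
print(summary)
```

By default all of these interpolate linearly between neighbouring data points. NumPy's `method` argument (called `interpolation` before NumPy 1.22) and the `interpolation` argument of `Series.quantile` let you pick alternatives such as `nearest`.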

Et voila!
