Create a Correlation Matrix


A correlation matrix shows at a glance how the variables in a DataFrame relate to one another. In this post I will show you how to build one. Let’s have a look.

Let’s import some packages

# Import
import pandas as pd
import numpy as np
import seaborn as sns
%matplotlib inline

We load a dataset to play with

# Here I am loading the Iris dataset
df = pd.read_csv("/Iris.csv")
df.head()
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa

Let’s transform the data

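Here I am going to subselect the numerical columns I am interested in. Note that iloc excludes the end position, so df.iloc[:,1:4] keeps the columns at positions 1 to 3 (sepal_width, petal_length and petal_width); use df.iloc[:,0:4] to include sepal_length as well.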

df_subselected = df.iloc[:,1:4]
df_subselected.head()
   sepal_width  petal_length  petal_width
0          3.5           1.4          0.2
1          3.0           1.4          0.2
2          3.2           1.3          0.2
3          3.1           1.5          0.2
4          3.6           1.4          0.2
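
Positional slicing depends on the column order of the file. As an alternative, here is a minimal sketch that selects columns by dtype instead. The small DataFrame is only a stand-in built from the five rows printed by df.head() above (an assumption made so the snippet runs without the CSV file), and df_numeric is a name introduced for illustration.

```python
import pandas as pd

# Stand-in for the loaded file: the five rows shown by df.head() above
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.6],
    "petal_length": [1.4, 1.4, 1.3, 1.5, 1.4],
    "petal_width": [0.2, 0.2, 0.2, 0.2, 0.2],
    "species": ["setosa"] * 5,
})

# Keep every numeric column, wherever it sits in the DataFrame
df_numeric = df.select_dtypes(include="number")
print(df_numeric.columns.tolist())
```

With this approach the species column is dropped because it holds strings, and all four measurement columns are kept.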

Now I can simply apply the corr method, which computes pairwise Pearson correlations by default:

df_subselected.corr()
              sepal_width  petal_length  petal_width
sepal_width      1.000000     -0.420516    -0.356544
petal_length    -0.420516      1.000000     0.962757
petal_width     -0.356544      0.962757     1.000000
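
To make it clearer what corr computes, here is a small sketch on toy data. The toy DataFrame and its values are hypothetical (they are not Iris measurements). It compares the default Pearson result with NumPy's corrcoef and also shows the rank-based method="spearman" option.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the Iris values): y rises with x, z falls with x
toy = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.1, 3.9, 6.2, 8.1, 9.8],
    "z": [5.0, 4.2, 2.9, 2.1, 1.0],
})

pearson = toy.corr()                    # default method is "pearson"
spearman = toy.corr(method="spearman")  # rank-based alternative

# The Pearson matrix should agree with NumPy's corrcoef on the same columns
matches_numpy = np.allclose(
    pearson.to_numpy(),
    np.corrcoef(toy.to_numpy(), rowvar=False),
)

print(pearson.round(3))
print(spearman.round(3))
print(matches_numpy)
```

The matrix is symmetric with ones on the diagonal, since each variable correlates perfectly with itself. Spearman only looks at ranks, so it can be a better fit when a relationship is monotonic but not linear.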

Finally, I can also display the result as a heatmap, which makes it easier to assess the strength of the correlations.

sns.heatmap(df_subselected.corr())

[Figure: heatmap of the correlation matrix]
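
The default heatmap can be made easier to read with a few optional arguments. This is only a sketch of one possible styling: the matrix is rebuilt from the values printed by corr() above so that the snippet runs on its own, and the colormap, figure size and title are my own choices.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

labels = ["sepal_width", "petal_length", "petal_width"]

# Correlation values copied from the corr() output above
corr = pd.DataFrame(
    [[1.000000, -0.420516, -0.356544],
     [-0.420516, 1.000000, 0.962757],
     [-0.356544, 0.962757, 1.000000]],
    index=labels,
    columns=labels,
)

fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(
    corr,
    annot=True,       # write each coefficient inside its cell
    fmt=".2f",        # two decimals keep the cells readable
    vmin=-1, vmax=1,  # fix the color scale to the full correlation range
    cmap="coolwarm",  # diverging colormap: opposite signs get opposite colors
    square=True,
    ax=ax,
)
ax.set_title("Correlation matrix")
fig.tight_layout()
```

Fixing vmin and vmax to -1 and 1 keeps the colors comparable between different heatmaps, and a diverging colormap makes the sign of each correlation visible immediately.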

Et voilà!
