Long to Wide Format in Python


In this post I will explain how we can transform a DataFrame with a long format into a DataFrame with a wide format. As an example let’s say we have a dataset with three columns: one column containing a list of marketing channels, a second one containing a list of metrics for each channel (either the average duration of the sessions or the number of converted sessions belonging to the respective channels) and finally a third one reporting the values for each metric.

Now what we want to do is “ungroup” the metrics contained in columns 2 (metric definition) and 3 (values). In other words, we want to separate the average duration and the number of converted sessions metrics and create a column for each. This is summarised as follows:

Wide to Long Format

So let’s see how we do it!

First thing first: importing packages

# Import
import pandas as pd
import numpy as np

We create the dataset in a long format

# Create some data
df = pd.DataFrame({"Channel": ["Social","Social","Paid Search","Paid Search","Email","Email"],
                  "Metric": ["Conversions","Avg Duration","Conversions","Avg Duration","Conversions","Avg Duration"],
                  "Values": [10, 2.34, 13, 2.54, 8, 1.56]})
Channel Metric Values
0 Social Conversions 10.00
1 Social Avg Duration 2.34
2 Paid Search Conversions 13.00
3 Paid Search Avg Duration 2.54
4 Email Conversions 8.00
5 Email Avg Duration 1.56

Let’s transform it into a wide format

We will be using the pivot() function which is available in the pandas library. We will pass three parameters to the function: index - which is the column to use to create the new index: in our case Channel; columns - for the new DataFrame’s columns and finally values - that will populate the values for the new DataFrame.

# Reshape the data using pandas
df2=df.pivot(index='Channel', columns='Metric', values='Values')
Metric Avg Duration Conversions
Email 1.56 8.0
Paid Search 2.54 13.0
Social 2.34 10.0
# Reshape the data using pandas and format the DataFrame properly
df3=df.pivot(index='Channel', columns='Metric', values='Values').reset_index()
df3.columns.name = ""
Channel Avg Duration Conversions
0 Email 1.56 8.0
1 Paid Search 2.54 13.0
2 Social 2.34 10.0

Et voila!

Share this post: