Wide to Long Format in Python


In this post I will try to explain how we can transition from a DataFrame with a wide format to a DataFrame with a long format. Let’s say we have a dataset with three columns (the definitions does not really matter here but it just helps better understand what we are trying to do): one column containing a list of marketing channels, a second containing the number of converted sessions and finally the average sessions duration related to these channels.

Now what we want to do is group the two columns containing the average duration and the number of converted sessions together under one column called Value. And finally have another column called Metric which will help us differentiate between the two measures like so:

Wide to Long Format

So let’s have a look at how we can do it.

Importing some packages

# Import
import pandas as pd
import numpy as np

We create a dataset with a wide format

# Create some data
df = pd.DataFrame({"Channel": ["Social","Paid Search","Email"],
                  "Avg Duration": [2.34,2.54,1.56],
                  "Conversions": [10, 13, 8]})
Avg Duration Channel Conversions
0 2.34 Social 10
1 2.54 Paid Search 13
2 1.56 Email 8

Let’s transform it into a long format

We will be using the melt() function which is available in the pandas library. We pass on two parameters. The first one being id_vars - which is the column(s) to use as an identifier; in our case Channel. The second parameter is value_vars - column(s) to unpivot; in our case, these are the two columns that include the metrics we want to group together: Avg Duration and Conversions.

# Reshape the data using pandas
df2 = pd.melt(df, id_vars = ['Channel'], value_vars = ['Avg Duration', 'Conversions'])
Channel variable value
0 Social Avg Duration 2.34
1 Paid Search Avg Duration 2.54
2 Email Avg Duration 1.56
3 Social Conversions 10.00
4 Paid Search Conversions 13.00
5 Email Conversions 8.00

Et voila!

Share this post: