This one is pretty straightforward. Let's say you have a column in a DataFrame that contains string data whose format is all over the place: a mix of lowercase, uppercase and camel case. Here is how you can go through all the values in that column and convert them, for example, to lowercase so that they are consistent. Let's have a look!
Let’s import some packages
# Import
import pandas as pd
import numpy as np
We create a dataset to play with
# Create some data
df = pd.DataFrame({"String Data": ["all lower","ALL UPPER","mix of lower AND UPPER"]})
df
| | String Data |
|---|---|
| 0 | all lower |
| 1 | ALL UPPER |
| 2 | mix of lower AND UPPER |
Let’s transform the data
Here we use the apply method together with a lambda function to go over every element of the String Data column and call lower() on it. We store the result in a new column of the original DataFrame so it can be compared side by side with the original data.
df["String Data Formatted"] = df["String Data"].apply(lambda x: x.lower())
df
| | String Data | String Data Formatted |
|---|---|---|
| 0 | all lower | all lower |
| 1 | ALL UPPER | all upper |
| 2 | mix of lower AND UPPER | mix of lower and upper |
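As an aside, pandas also exposes vectorised string methods through the .str accessor, so Series.str.lower() gives the same result without a lambda. A practical difference shows up with missing values: lambda x: x.lower() raises an AttributeError on NaN (which is a float), whereas .str.lower() simply leaves NaN in place. The sketch below assumes the same data as above plus one added missing value, purely for illustration.

```python
import numpy as np
import pandas as pd

# Same data as above, plus a missing value (added here for illustration only)
df = pd.DataFrame({"String Data": ["all lower", "ALL UPPER", "mix of lower AND UPPER", np.nan]})

# Vectorised string method: lowercases every string and keeps NaN as NaN
df["String Data Formatted"] = df["String Data"].str.lower()

# The apply/lambda approach needs an explicit guard, otherwise NaN raises AttributeError
df["Via Apply"] = df["String Data"].apply(lambda x: x.lower() if isinstance(x, str) else x)

print(df)
```

Both new columns end up with the same values; the .str version is usually shorter and handles missing data without extra code.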
Just in case you were wondering, changing the lambda above so that it converts the values in the String Data column to uppercase is just as simple:
df["String Data Formatted"] = df["String Data"].apply(lambda x: x.upper())
df
| | String Data | String Data Formatted |
|---|---|---|
| 0 | all lower | ALL LOWER |
| 1 | ALL UPPER | ALL UPPER |
| 2 | mix of lower AND UPPER | MIX OF LOWER AND UPPER |
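Lowercase and uppercase are not the only options. The .str accessor also provides title(), capitalize() and swapcase(), and if the goal is simply to keep the original column consistent, you can overwrite it instead of adding a new one. A minimal sketch, reusing the same example data (the extra column names Title, Capitalized and Swapped are just illustrative):

```python
import pandas as pd

df = pd.DataFrame({"String Data": ["all lower", "ALL UPPER", "mix of lower AND UPPER"]})

# Other case conversions available through the .str accessor
df["Title"] = df["String Data"].str.title()              # first letter of each word uppercase
df["Capitalized"] = df["String Data"].str.capitalize()   # first letter uppercase, rest lowercase
df["Swapped"] = df["String Data"].str.swapcase()         # invert the case of every character

# Overwrite the original column in place to make it consistent
df["String Data"] = df["String Data"].str.lower()

print(df)
```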