Dealing with JSON Files


It is very likely that sooner or later you will have to parse a JSON file rather than a well-organised CSV file. This post gathers a few Python snippets to help you get started making sense of the JSON format. Let’s take a quick look.

Let’s import some packages

# Import
import json
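# json_normalize lives at the top level of pandas (>= 1.0);
# the old pandas.io.json import path is deprecated
from pandas import json_normalize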

We create a dataset to play with

# Let's manually create some JSON string
json_string = """
{
    "person": {
        "name": "John",
        "age": 31,
        "city": "San Francisco",
        "relatives": [
            {
                "name": "Jane",
                "age": 34,
                "city": "Los Angeles"
            }
        ]
    }
}
"""
data = json.loads(json_string)
data
{'person': {'age': 31,
  'city': 'San Francisco',
  'name': 'John',
  'relatives': [{'age': 34, 'city': 'Los Angeles', 'name': 'Jane'}]}}
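Before flattening anything, it helps to see what json.loads actually returns. The sketch below reuses the same sample record (written more compactly) and shows that JSON objects become Python dicts and JSON arrays become lists, so nested values are reached by chaining keys and indices. The variable name first_relative is just an illustrative choice.

```python
import json

# Same sample record as above, written on fewer lines
json_string = """
{"person": {"name": "John", "age": 31, "city": "San Francisco",
            "relatives": [{"name": "Jane", "age": 34, "city": "Los Angeles"}]}}
"""
data = json.loads(json_string)

# JSON objects -> dict, JSON arrays -> list: chain keys and indices to drill down
first_relative = data["person"]["relatives"][0]
print(first_relative["name"])

# json.dumps goes the other way (Python objects -> JSON string);
# indent=2 pretty-prints the nested structure
print(json.dumps(data, indent=2))
```

Walking the structure by hand like this is handy for a quick look, but it gets tedious for deeply nested data, which is where json_normalize comes in.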

Let’s parse the data!

We can easily transform JSON data into a more readable format, such as a DataFrame, using the json_normalize function. Nested objects are flattened into dotted column names:

data_parsed = json_normalize(data)
data_parsed
   person.age    person.city person.name                                   person.relatives
0          31  San Francisco        John  [{'name': 'Jane', 'age': 34, 'city': 'Los Ange...

The person.relatives column still holds a list of dictionaries, so we can normalize it in turn:

json_normalize(data_parsed["person.relatives"][0])

   age         city  name
0   34  Los Angeles  Jane
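Instead of normalizing twice, json_normalize can also walk into the nested list directly through its record_path and meta parameters. The sketch below, using the same sample record, emits one row per relative and copies the parent's name onto each row. The nested meta path ["person", "name"] becomes the column person.name, which keeps it from clashing with the relative's own name field.

```python
import json
from pandas import json_normalize

data = json.loads("""
{"person": {"name": "John", "age": 31, "city": "San Francisco",
            "relatives": [{"name": "Jane", "age": 34, "city": "Los Angeles"}]}}
""")

# record_path follows data["person"]["relatives"] and yields one row per element;
# meta copies fields from the enclosing objects onto every one of those rows
relatives = json_normalize(
    data,
    record_path=["person", "relatives"],
    meta=[["person", "name"]],
)
print(relatives)
```

This is handy when a record contains a list of sub-records, since the link back to the parent survives in the flattened table.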

Loading a JSON file into a Jupyter notebook, normalizing the data and saving it into a DataFrame can be done as follows:

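filepath = "path/to/jsonfile/file.json"
# A context manager makes sure the file is closed once it has been read
with open(filepath, encoding="utf-8") as f:
    dataJSON = json.load(f)
dataDF = json_normalize(dataJSON)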
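JSON files often hold a top-level array of objects rather than a single object. The self-contained sketch below assumes such a file: it writes a couple of hypothetical records to a temporary file (standing in for your own filepath), loads them back, and normalizes them. Each object becomes one row, and the nested address object is flattened into an address.city column.

```python
import json
import os
import tempfile
from pandas import json_normalize

# Hypothetical sample records, written to a temporary file so the example
# runs anywhere; in practice filepath would point at your own JSON file
records = [
    {"name": "John", "age": 31, "address": {"city": "San Francisco"}},
    {"name": "Jane", "age": 34, "address": {"city": "Los Angeles"}},
]
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False, encoding="utf-8") as tmp:
    json.dump(records, tmp)
    filepath = tmp.name

# Load the file, then flatten: one row per object, dotted names for nested keys
with open(filepath, encoding="utf-8") as f:
    dataJSON = json.load(f)
dataDF = json_normalize(dataJSON)
os.remove(filepath)  # clean up the temporary file
print(dataDF)
```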

Et voila!
