Sooner or later you will probably have to parse a JSON file rather than a well-organised CSV file. This post gathers a few Python snippets to help you get started making sense of the JSON format. Let’s have a quick look.
Let’s import some packages
# Import
import json
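# json_normalize is available at the top level since pandas 1.0;
# the older pandas.io.json import path is deprecated
from pandas import json_normalize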
We create a dataset to play with
# Let's manually create a JSON string
json_string = """
{
"person": {
"name": "John",
"age": 31,
"city": "San Francisco",
"relatives": [
{
"name": "Jane",
"age": 34,
"city": "Los Angeles"
}
]
}
}
"""
data = json.loads(json_string)
data
{'person': {'age': 31,
'city': 'San Francisco',
'name': 'John',
'relatives': [{'age': 34, 'city': 'Los Angeles', 'name': 'Jane'}]}}
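Since json.loads returns ordinary Python dictionaries and lists, nested values can be reached with plain indexing before pandas gets involved. Here is a minimal sketch that rebuilds the sample data; the variable names person and first_relative and the "email" key are only illustrative and not part of the original example.

```python
import json

json_string = """
{"person": {"name": "John", "age": 31, "city": "San Francisco",
            "relatives": [{"name": "Jane", "age": 34, "city": "Los Angeles"}]}}
"""
data = json.loads(json_string)

# JSON objects become dicts and JSON arrays become lists
person = data["person"]
first_relative = person["relatives"][0]

print(type(person).__name__, type(person["relatives"]).__name__)
print(person["name"], "is related to", first_relative["name"])

# .get() returns a default instead of raising KeyError when a key is missing
# ("email" is a made-up key used only to show the fallback)
print(person.get("email", "no email on record"))
```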
Let’s parse the data!
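We can easily transform JSON data into a more readable format, such as a DataFrame, using the json_normalize function. Nested objects are flattened into dotted column names (for example person.name), while lists such as relatives are kept as-is in a single cell. See below: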
data_parsed = json_normalize(data)
data_parsed
| | person.age | person.city | person.name | person.relatives |
|---|---|---|---|---|
| 0 | 31 | San Francisco | John | [{'name': 'Jane', 'age': 34, 'city': 'Los Ange... |
The person.relatives column still holds a list of records, so we can normalize that list in turn:
json_normalize(data_parsed["person.relatives"][0])
| | age | city | name |
|---|---|---|---|
| 0 | 34 | Los Angeles | Jane |
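The nested list can also be unpacked in a single call using the record_path and meta parameters of json_normalize. The sketch below assumes pandas 1.0 or later and rebuilds the sample data so it runs on its own. The choice of "person." as meta_prefix is just one way to stop the parent's name and city from clashing with the relatives' own columns.

```python
import json

from pandas import json_normalize

json_string = """
{"person": {"name": "John", "age": 31, "city": "San Francisco",
            "relatives": [{"name": "Jane", "age": 34, "city": "Los Angeles"}]}}
"""
data = json.loads(json_string)

relatives = json_normalize(
    data["person"],
    record_path="relatives",  # one output row per entry in the relatives list
    meta=["name", "city"],    # parent fields repeated on every row
    meta_prefix="person.",    # prefix so parent columns do not collide with name/city
)
print(relatives)
```

Each row then describes one relative, with the parent's fields carried along in the person.name and person.city columns.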
Loading a JSON file into a Jupyter notebook, normalizing the data and storing it in a DataFrame can be done as follows:
filepath = "pathtojsonfile/file.json"
# Use a context manager so the file is closed after reading
with open(filepath) as f:
    dataJSON = json.load(f)
dataDF = json_normalize(dataJSON)
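To try the file-based workflow without a JSON file at hand, you can first write a small sample to a temporary directory and read it back. This is only a sketch: the file name sample.json and the sample dictionary are made up for illustration.

```python
import json
import os
import tempfile

from pandas import json_normalize

# Made-up sample data, written to disk so there is a file to load
sample = {"person": {"name": "John", "age": 31, "city": "San Francisco"}}

with tempfile.TemporaryDirectory() as tmp_dir:
    filepath = os.path.join(tmp_dir, "sample.json")  # hypothetical file name

    # Write the sample out as JSON
    with open(filepath, "w", encoding="utf-8") as f:
        json.dump(sample, f)

    # Read it back; the with-block closes the file once loading is done
    with open(filepath, encoding="utf-8") as f:
        dataJSON = json.load(f)

# Flatten the nested dict into a one-row DataFrame
dataDF = json_normalize(dataJSON)
print(dataDF.columns.tolist())
```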