I want to create a DataFrame using json_normalize, but I get a KeyError.

Asked 2 years ago, Updated 2 years ago, 373 views

When I try to read the JSON file into a DataFrame as shown below, I get a KeyError.
Why?

The original JSON file is as follows:

[
    {
        "creat_at": "2020-04-26T 12:55:58 + 0900",
        "pay_id": "E86F0CD0B346",
        "pay": {
            "a": 1.32,
            "b": [
                0,
                0
            ],
            "c": "xxxx",
            "d": "0709",
            "e": "sssss",
            "f": 290,
            "g": -55,
            "h": 23.3
        },
        "timestamp": "2020-03-16T09:18:39.878Z",
        "updated_at": "2020-04-26T 12:55:58+0900"
    },
    {
        "creat_at": "2020-04-26T 12:55:58 + 0900",
        "pay_id": "E86F0CD0B346",
        "pay": {
            "a": 1.32,
            "b": [
                0,
                0
            ],
            "c": "xxxx",
            "d": "0809",
            "e": "sssss",
            "f": 290,
            "g": -55,
            "h": 23.3
        },
        "timestamp": "2020-03-16T09:18:39.878Z",
        "updated_at": "2020-04-26T 12:55:58+0900"
    },
・・・
・・・
import pandas as pd

json_file = "./export.json"
df0 = pd.read_json(json_file)
df0["pay"].iloc[1]

Results

{'a': 1.4,
 'b': [29, 0],
 'c': 'xxxx',
 'd': '00070',
 'e': 'sssss',
 'f': 236,
 'g': -95,
 'h': 21.7}

I am trying to turn the data above into a DataFrame with the following code.

from pandas.io.json import json_normalize
df_items=json_normalize(df0.to_dict("records"), "pay", "pay_id")
df_items.sort_values("item_id")

The result I would like to get is shown below, but a KeyError occurs instead. Why?

|item_id|a|b|c|d|e|f|g|h|
|---    |---|--- |----|--- |--- |---|---|--- |
| 1 | 1.4 | 29, 0 | xxxx | 0070 | ssss | 236 | -95 | 21.7 |

python python3 pandas

2022-09-30 21:49

1 Answer

Neither the code nor the data you posted defines a key named item_id, and nothing creates it later.

For example, the DataFrame produced by the json_normalize() call in the question has the column names ['0', 'pay_id']. item_id is not among them, so df_items.sort_values("item_id") fails with a KeyError.
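To see this failure in isolation, here is a minimal sketch. The small DataFrame is a hypothetical stand-in for df_items (it is not built from export.json); it only has the columns 0 and pay_id, as described above.

```python
import pandas as pd

# Hypothetical stand-in for the question's df_items:
# it has the columns 0 and "pay_id", and no "item_id".
df_items = pd.DataFrame(
    {0: ["a", "b"], "pay_id": ["E86F0CD0B346", "E86F0CD0B346"]}
)

try:
    # "item_id" is neither a column nor an index level name here
    df_items.sort_values("item_id")
    raised = False
except KeyError as exc:
    raised = True
    print("KeyError:", exc)
```

sort_values() only accepts labels that exist as columns (or as named index levels), so any label that was never created fails in the same way.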

Try adding print() after each step to see what data that step actually produces, for example print(df0.to_dict("records")) or print(df_items).
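A rough sketch of that kind of step-by-step check follows. The records list is an abbreviated, inline stand-in for export.json (only three of the pay keys are kept), so the names and values here are assumptions for illustration rather than the real file contents.

```python
import pandas as pd

# Abbreviated inline sample standing in for export.json (assumption)
records = [
    {"pay_id": "E86F0CD0B346", "pay": {"a": 1.32, "b": [0, 0], "c": "xxxx"}},
    {"pay_id": "E86F0CD0B346", "pay": {"a": 1.4, "b": [29, 0], "c": "xxxx"}},
]

df0 = pd.DataFrame(records)
print(df0.columns.tolist())      # which top-level columns exist?
print(type(df0["pay"].iloc[0]))  # what does one 'pay' cell hold?

# Flatten the list of 'pay' dictionaries and check the resulting columns
flat = pd.json_normalize(df0["pay"].tolist())
print(flat.columns.tolist())
print(flat)
```

Checking columns.tolist() after each step makes it obvious which names are available to pass to sort_values().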

To keep things as simple as possible, the code below uses the list's index numbers as item_id:

import pandas as pd
from pandas import json_normalize

json_file = "./export.json"
df0 = pd.read_json(json_file)
paylist = df0["pay"].tolist()  # list of the 'pay' dictionaries, one per record

df_items = json_normalize(paylist)  # flatten each 'pay' dictionary into one row
df_items.index.name = "item_id"  # name the DataFrame index 'item_id'
df_items = df_items.sort_values("item_id")  # sort by the named index and keep the result

print(df_items)
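The question passed "pay_id" to json_normalize(), so it may be useful to keep that column next to the flattened values. The following is only a sketch under that assumption; it again uses an abbreviated inline sample instead of export.json, and the join step is one possible approach, not the only one.

```python
import pandas as pd

# Abbreviated inline sample standing in for export.json (assumption)
records = [
    {"pay_id": "E86F0CD0B346", "pay": {"a": 1.32, "b": [0, 0], "c": "xxxx"}},
    {"pay_id": "E86F0CD0B346", "pay": {"a": 1.4, "b": [29, 0], "c": "xxxx"}},
]
df0 = pd.DataFrame(records)

# Flatten 'pay' into columns; rows keep the same 0..n-1 positions as df0
flat = pd.json_normalize(df0["pay"].tolist())

# Put pay_id beside the flattened columns, aligned on the shared index
df_items = df0[["pay_id"]].join(flat)
df_items.index.name = "item_id"
df_items = df_items.sort_index()

print(df_items)
```

Because both frames share the default integer index, join() lines the rows up without needing a separate key column.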


2022-09-30 21:49



© 2024 OneMinuteCode. All rights reserved.