How to store JSON strings in BigQuery and aggregate the values in an array

Asked 1 year ago, Updated 1 year ago, 123 views

After reading the following Qiita entry, I have been experimenting with this approach, because I think it would make our logs much easier to analyze.
http://qiita.com/hakobera/items/5e280ad33d72f82de39c

JSON_EXTRACT and JSON_EXTRACT_SCALAR can retrieve individual values, so I thought I could meet my requirements by building a table for aggregation from them. However, the saved JSON data contains an array.

For example, suppose the logdata column of a BigQuery table contains log data like the following:

{
  "date": "2015-08-31",
  "time": "00:00:00",
  "type": "RESPONSE",
  "userno": 12345,
  "friends": [{"userno": 1, "nickname": "hogehoge"}, {"userno": 1, "nickname": "fugafuga"}]
}

In the worst case, I could write

select
  JSON_EXTRACT_SCALAR(logdata, '$.friends[0].userno') AS friend0,
  JSON_EXTRACT_SCALAR(logdata, '$.friends[1].userno') AS friend1
from serverlog_20150831

and list one column for every array element, but I am looking for a good way, like FLATTEN, to expand each element of the array into its own record.

Please let me know if anyone has a good idea.

google-bigquery

2022-09-30 20:57

3 Answers

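I brute-forced it! The idea is to split the friends array string at the boundary between elements and then restore the braces on each piece. JSON_EXTRACT returns compact JSON, so the boundary is "},{".

SELECT
  JSON_EXTRACT(logdata, '$.userno') AS userno,
  "{" + REGEXP_REPLACE(SPLIT(JSON_EXTRACT(logdata, '$.friends'), "},{"), "^\\[\\{|\\}\\]$", "") + "}" AS friend
FROM
  serverlog_20150831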

The results of this are as follows:

Row  userno  friend
1    12345   {"userno":1,"nickname":"hogehoge"}
2    12345   {"userno":1,"nickname":"fugafuga"}

Next, let's pull the individual values out of this JSON.

SELECT
  JSON_EXTRACT(logdata, '$.userno') AS userno,
  JSON_EXTRACT("{" + REGEXP_REPLACE(SPLIT(JSON_EXTRACT(logdata, '$.friends'), "},{"), "^\\[\\{|\\}\\]$", "") + "}", '$.userno') AS friend_userno,
  SUBSTR(
    JSON_EXTRACT("{" + REGEXP_REPLACE(SPLIT(JSON_EXTRACT(logdata, '$.friends'), "},{"), "^\\[\\{|\\}\\]$", "") + "}", '$.nickname'),
    2,
    LENGTH(JSON_EXTRACT("{" + REGEXP_REPLACE(SPLIT(JSON_EXTRACT(logdata, '$.friends'), "},{"), "^\\[\\{|\\}\\]$", "") + "}", '$.nickname')) - 2
  ) AS friend_nickname
FROM
  serverlog_20150831

The result looks like this:

Row  userno  friend_userno  friend_nickname
1    12345   1              hogehoge
2    12345   1              fugafuga
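
The SUBSTR step in the query above appears to exist only to strip the double quotes that JSON_EXTRACT keeps around string values. As a minimal side-by-side sketch (written in Standard SQL against a literal chosen purely for illustration, not the actual log table), JSON_EXTRACT_SCALAR returns the bare value directly:

```sql
#standardSQL
# Illustrative literal only; in the real query the input would be the
# reconstructed friend JSON string.
SELECT
  JSON_EXTRACT('{"nickname":"hogehoge"}', '$.nickname') AS with_quotes,
  JSON_EXTRACT_SCALAR('{"nickname":"hogehoge"}', '$.nickname') AS without_quotes
```

Here with_quotes comes back as "hogehoge" including the quotes, while without_quotes is hogehoge, so the SUBSTR/LENGTH pair could probably be dropped.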


2022-09-30 20:57

I wrote a Standard SQL version.
It is quite verbose to make debugging easier, but it does basically the same thing.

#standardSQL
SELECT
  userno,
  JSON_EXTRACT_SCALAR(friend, "$.userno") AS friend_userno,
  JSON_EXTRACT_SCALAR(friend, "$.nickname") AS friend_nickname
FROM (
  SELECT
    userno,
    # Strip the leading "[{" or trailing "}]" and put the braces back.
    CONCAT("{", REGEXP_REPLACE(friend, "^\\[\\{|\\}\\]$", ""), "}") AS friend
  FROM (
    SELECT
      userno,
      friend
    FROM (
      SELECT
        userno,
        # JSON_EXTRACT returns compact JSON, so elements are separated by "},{".
        SPLIT(friends, "},{") AS friends
      FROM (
        SELECT
          JSON_EXTRACT('{"userno":12345, "friends":[{"userno":1, "nickname":"hogehoge"}, {"userno":1, "nickname":"fugafuga"}]}', "$.userno") AS userno,
          JSON_EXTRACT('{"userno":12345, "friends":[{"userno":1, "nickname":"hogehoge"}, {"userno":1, "nickname":"fugafuga"}]}', "$.friends") AS friends))
    CROSS JOIN
      UNNEST(friends) AS friend))
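
The split-and-repair steps above can likely be avoided altogether with JSON_EXTRACT_ARRAY, which returns each array element as its own JSON string. This is a minimal sketch rather than the answerer's method; the inline serverlog CTE is a hypothetical stand-in for serverlog_20150831:

```sql
#standardSQL
# Hypothetical inline stand-in for the real log table.
WITH serverlog AS (
  SELECT '{"userno":12345, "friends":[{"userno":1, "nickname":"hogehoge"}, {"userno":1, "nickname":"fugafuga"}]}' AS logdata
)
SELECT
  JSON_EXTRACT_SCALAR(logdata, '$.userno') AS userno,
  JSON_EXTRACT_SCALAR(friend, '$.userno') AS friend_userno,
  JSON_EXTRACT_SCALAR(friend, '$.nickname') AS friend_nickname
FROM
  serverlog
CROSS JOIN
  # Each element of the friends array becomes one row holding a JSON string.
  UNNEST(JSON_EXTRACT_ARRAY(logdata, '$.friends')) AS friend
```

This should yield one row per friend without depending on how the array elements happen to be delimited in the string.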


2022-09-30 20:57

How about using JSON.parse in a UDF?

# standardSQL
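# The STRUCT type mirrors the structure of the JSON.
CREATE TEMPORARY FUNCTION
  parse_logdata(logdata STRING)
  RETURNS STRUCT<date STRING, time STRING, type STRING, userno INT64, friends ARRAY<STRUCT<userno INT64, nickname STRING>>>
  LANGUAGE js AS """
  return JSON.parse(logdata);
""";
# Assume the logs are stored in LOGDATA_TABLE.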
WITH
  LOGDATA_TABLE AS(
  SELECT
    '{"date": "2015-08-31", "time": "00:00:00", "type": "RESPONSE", "userno": 12345, "friends": [{"userno": 1, "nickname": "hogehoge"}, {"userno": 1, "nickname": "fugafuga"}]}' AS logdata
  UNION ALL
  SELECT
    '{"date": "2015-08-31", "time": "00:00:00", "type": "RESPONSE", "userno": 12346, "friends": [{"userno": 1, "nickname": "foobar"}, {"userno": 1, "nickname": "piyopiyo"}]}')
SELECT
  logdata.userno AS userno,
  friends.userno AS friend_no,
  nickname
FROM(
  SELECT
    parse_logdata(logdata).*
  from
    LOGDATA_TABLE)AS logdata
INNER JOIN
  UNNEST (logdata.friends) AS friends

Note: Be aware of the limit on the number of concurrent queries that use UDFs.
https://cloud.google.com/bigquery/quotas?hl=en
https://cloud.google.com/bigquery/docs/reference/standard-sql/user-defined-functions?hl=en#limits
Also, this approach usually cannot be used in a VIEW or from BI tools.
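
Since the question is ultimately about aggregation: once each friend is its own row, aggregating is an ordinary GROUP BY. The following is a minimal sketch under stated assumptions. parse_friends is a hypothetical helper (not part of the answer above) that returns only the friends array, and the sample rows are made up for illustration:

```sql
#standardSQL
# Hypothetical helper: parse the log line and return only the friends array.
CREATE TEMPORARY FUNCTION parse_friends(logdata STRING)
RETURNS ARRAY<STRUCT<userno INT64, nickname STRING>>
LANGUAGE js AS """
  return JSON.parse(logdata).friends;
""";
# Made-up sample rows standing in for the real log table.
WITH LOGDATA_TABLE AS (
  SELECT '{"userno": 12345, "friends": [{"userno": 1, "nickname": "hogehoge"}, {"userno": 1, "nickname": "fugafuga"}]}' AS logdata
  UNION ALL
  SELECT '{"userno": 12346, "friends": [{"userno": 1, "nickname": "hogehoge"}]}'
)
SELECT
  friend.nickname AS nickname,
  COUNT(*) AS appearances  # number of friends entries with this nickname
FROM
  LOGDATA_TABLE
CROSS JOIN
  UNNEST(parse_friends(logdata)) AS friend
GROUP BY
  nickname
ORDER BY
  appearances DESC
```

Returning only the array keeps the declared return type short, at the cost of needing a second expression if other fields such as userno are also required.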


2022-09-30 20:57
