zhaopinxinle.com

Efficiently Flattening JSON Data in BigQuery: A Comprehensive Guide


Chapter 1: Introduction to JSON Flattening

Flattening JSON can be tedious. Most of the time goes into navigating the structure by hand to find every key, and especially every nested array. I found a way to let BigQuery list them for me, and in this guide I explain it from the ground up while keeping things short.

Basics of JSON Structure

When BigQuery loads JSON, nested objects become STRUCTs and lists become ARRAYs. Flattening a STRUCT is easy, because you can select its fields with dot notation (for example, `customer.name`). ARRAYs, however, have to be un-nested, which BigQuery does with the UNNEST operator. The critical step is therefore to identify every array in the JSON structure.
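To make the difference concrete, here is a minimal local Python sketch (plain Python, not BigQuery) of what flattening a single record involves. The function name `flatten`, the dotted column names, and keeping one `None` row for an empty array are my own illustrative choices: struct fields simply become columns, while every array element produces its own row.

```python
from itertools import product
from typing import Any


def flatten(record: dict[str, Any], prefix: str = "") -> list[dict[str, Any]]:
    """Flatten one JSON object into flat rows with dotted column names.

    Struct (object) fields are expanded in place; each array is "unnested" so
    every element yields its own row. Rows from sibling arrays are combined as
    a cross product, the same row multiplication chained UNNEST joins produce.
    """
    # One list of partial rows per key; the final rows are their cross product.
    parts: list[list[dict[str, Any]]] = []
    for key, value in record.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            parts.append(flatten(value, f"{path}."))
        elif isinstance(value, list):
            rows: list[dict[str, Any]] = []
            for item in value:
                if isinstance(item, dict):
                    rows.extend(flatten(item, f"{path}."))
                else:
                    rows.append({path: item})
            # Keep a single None row for an empty array (like LEFT JOIN UNNEST)
            # so the parent record is not dropped.
            parts.append(rows or [{path: None}])
        else:
            parts.append([{path: value}])

    flat_rows: list[dict[str, Any]] = []
    for combo in product(*parts):
        row: dict[str, Any] = {}
        for partial in combo:
            row.update(partial)
        flat_rows.append(row)
    return flat_rows
```

For a record such as `{"id": 1, "customer": {"name": "Ann"}, "items": [{"sku": "A"}, {"sku": "B"}]}`, the struct field turns into a `customer.name` column, while the two-element `items` array turns the one record into two rows. That row multiplication is why you need to know where every array is before writing the flattening query.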

BigQuery's INFORMATION_SCHEMA includes a view called COLUMN_FIELD_PATHS. For every table in a dataset, it lists the path (field_path) and data type (data_type) of the fields nested inside STRUCT and ARRAY columns, so it effectively holds every key of your JSON. Below is how I use it.

Step 1: Create a Sample JSON File

Take a representative sample of your JSON and save it as a file. BigQuery loads JSON as newline-delimited JSON, so the file should contain one complete object per line. You can find a sample file [here](#).
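A pretty-printed file, or one whose top level is a JSON array, is not in the newline-delimited layout BigQuery loads, so it has to be rewritten first. A minimal Python sketch, assuming the input text holds either a single object or a list of objects (the helper name `to_ndjson` is hypothetical):

```python
import json


def to_ndjson(json_text: str) -> str:
    """Convert JSON text holding one object or a list of objects to NDJSON."""
    data = json.loads(json_text)
    records = data if isinstance(data, list) else [data]
    # json.dumps without indent emits each object on a single line; newlines
    # inside string values are escaped as \n, so they stay on that line.
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)
```

To use it, read your sample file, pass its contents to `to_ndjson`, and write the result to a new file (for example `sample.ndjson`) that you then upload.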

Step 2: Upload JSON to BigQuery

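Next, upload the file to BigQuery manually (for example, from the console's Create table dialog with Upload as the source). Set the file format to newline-delimited JSON (JSONL) and enable schema auto-detection, so that BigQuery infers the nested structure for you.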

[Figures: uploading the JSON file to BigQuery; confirmation that the upload succeeded.]

Step 3: Access Your JSON in BigQuery

Once the upload finishes, your JSON is available as a BigQuery table that uses BigQuery's native nested types: objects are stored as STRUCT (RECORD) columns and lists as ARRAY (REPEATED) columns.

Step 4: Query Keys in COLUMN_FIELD_PATHS

Now you can list all of your keys by querying COLUMN_FIELD_PATHS, qualified with the name of the dataset that holds your table:

```sql
-- Replace your_dataset and your_table with your own names.
SELECT *
FROM your_dataset.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS
WHERE table_name = 'your_table';
```
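To get a feel for the rows this returns, here is a rough local Python approximation that walks a sample record and lists `(field_path, data_type)` pairs in a similar shape. It is an assumption-laden sketch: `bq_type` and `field_paths` are hypothetical helpers, the type mapping is simplified (strings and nulls become STRING, and each array is typed from its first element), and BigQuery's own schema auto-detection may infer different types.

```python
from typing import Any


def bq_type(value: Any) -> str:
    """Roughly map a JSON value to a BigQuery-style type string (simplified)."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "BOOL"
    if isinstance(value, int):
        return "INT64"
    if isinstance(value, float):
        return "FLOAT64"
    if isinstance(value, dict):
        fields = ", ".join(f"{key} {bq_type(val)}" for key, val in value.items())
        return f"STRUCT<{fields}>"
    if isinstance(value, list):
        # Simplification: type the array from its first element only.
        return f"ARRAY<{bq_type(value[0])}>" if value else "ARRAY<STRING>"
    return "STRING"  # strings, and (as a simplification) nulls


def field_paths(record: dict[str, Any], prefix: str = "") -> list[tuple[str, str]]:
    """List (field_path, data_type) pairs for every key, including nested ones."""
    rows: list[tuple[str, str]] = []
    for key, value in record.items():
        path = f"{prefix}{key}"
        rows.append((path, bq_type(value)))
        # Descend into a struct, or into the struct elements of an array.
        inner = value[0] if isinstance(value, list) and value else value
        if isinstance(inner, dict):
            rows.extend(field_paths(inner, f"{path}."))
    return rows
```

For the record `{"id": 1, "items": [{"sku": "A", "tags": ["x"]}]}` it returns `("id", "INT64")`, `("items", "ARRAY<STRUCT<sku STRING, tags ARRAY<STRING>>>")`, `("items.sku", "STRING")` and `("items.tags", "ARRAY<STRING>")`. Notice that the outer array's type repeats `ARRAY<` once for every array inside it; the array-hierarchy query in this guide relies on that repetition.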

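Usually what you need is the hierarchy of all arrays in the JSON, that is, which arrays sit at the top and which are nested inside others, since that tells you the order in which to UNNEST them. The query below counts how many times `ARRAY<` appears in each array's data type and ranks the arrays by that count, so the arrays with the most nesting inside them (normally the outermost ones) get level 0, the arrays inside them level 1, and so on: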

```sql
WITH cte AS (
  SELECT
    table_name,
    column_name,
    field_path,
    data_type,
    -- Number of 'ARRAY<' tokens in the type string.
    DIV(LENGTH(data_type) - LENGTH(REPLACE(data_type, 'ARRAY<', '')), LENGTH('ARRAY<')) AS level_inverse
  FROM your_dataset.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS
  WHERE table_name = 'your_table'
    AND data_type LIKE 'ARRAY<%'  -- keep only array fields
),
cte2 AS (
  SELECT
    * EXCEPT (level_inverse),
    -- More 'ARRAY<' tokens means more nesting inside, so that array ranks first (level 0).
    DENSE_RANK() OVER (PARTITION BY table_name ORDER BY level_inverse DESC) - 1 AS levels
  FROM cte
)
SELECT *
FROM cte2
ORDER BY levels, field_path;
```

[Figure: example output listing the JSON keys (array hierarchy) in BigQuery.]
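The level calculation is easier to follow outside SQL. This Python sketch reproduces the same steps for a single table on hypothetical `(field_path, data_type)` pairs: keep the ARRAY fields, count the occurrences of `ARRAY<` in each type, then dense-rank those counts in descending order starting from 0. A parent array's type always contains its child's type plus one more `ARRAY<`, so a parent always gets a lower level than its children. The name `array_levels` is my own.

```python
def array_levels(paths: list[tuple[str, str]]) -> dict[str, int]:
    """Assign a level to each ARRAY field the way the hierarchy query does.

    Keep only fields whose type starts with 'ARRAY<', count the 'ARRAY<'
    tokens in each type, then dense-rank the counts in descending order,
    starting at 0 (DENSE_RANK() ... - 1 in the SQL).
    """
    counts = {
        path: dtype.count("ARRAY<")
        for path, dtype in paths
        if dtype.startswith("ARRAY<")
    }
    # Distinct counts, largest first; equal counts share a level, as with DENSE_RANK.
    ordered = sorted(set(counts.values()), reverse=True)
    level_of = {count: level for level, count in enumerate(ordered)}
    return {path: level_of[count] for path, count in counts.items()}
```

With `orders` typed `ARRAY<STRUCT<id INT64, items ARRAY<STRUCT<sku STRING>>>>` and `orders.items` typed `ARRAY<STRUCT<sku STRING>>`, it returns `{"orders": 0, "orders.items": 1}`. Add a top-level `tags` column of type `ARRAY<STRING>`, though, and `tags` also lands on level 1, because its type contains only one `ARRAY<`. The count measures how much array nesting sits inside a field rather than how deep the field itself is, so check the levels against the field paths when a table has several independent arrays.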

The method is simple, but it has saved me a lot of the time I used to spend hunting for keys by hand, so I wanted to share it.
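Once the arrays are known, the remaining work is writing one UNNEST per array. The Python sketch below is a hypothetical helper (the name `unnest_from_clause`, the `t` table alias, and the `_elem` alias suffix are my own choices) that turns a list of array field paths into a BigQuery FROM clause. It uses LEFT JOIN UNNEST so that records with an empty array are kept, and it unnests a nested array from the alias of its closest enclosing array.

```python
def unnest_from_clause(table: str, array_paths: list[str]) -> str:
    """Build a FROM clause with one LEFT JOIN UNNEST per array field path.

    array_paths must list every parent array before its children; sorting the
    paths alphabetically does this, because a parent path is a prefix of its
    child's path.
    """
    aliases: dict[str, str] = {}
    lines = [f"FROM `{table}` AS t"]
    for path in array_paths:
        parts = path.split(".")
        source = f"t.{path}"  # default: the array is reached from the table row
        # Prefer the closest enclosing array that has already been unnested.
        for i in range(len(parts) - 1, 0, -1):
            prefix = ".".join(parts[:i])
            if prefix in aliases:
                source = ".".join([aliases[prefix], *parts[i:]])
                break
        alias = path.replace(".", "_") + "_elem"
        aliases[path] = alias
        # LEFT JOIN keeps parent rows whose array is empty (with NULLs).
        lines.append(f"LEFT JOIN UNNEST({source}) AS {alias}")
    return "\n".join(lines)
```

Called with the sorted paths `orders`, `orders.items` and `tags`, it unnests `t.orders` as `orders_elem`, then `orders_elem.items` as `orders_items_elem`, then `t.tags` as `tags_elem`; you then put your SELECT list (for example `orders_items_elem.sku`) in front of it. Joining two independent arrays such as `orders` and `tags` in one query multiplies their rows, so you may prefer to flatten them in separate queries.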

Cheers!
