BigQuery allows to define nested and repeated fields
in a table. Although this is very powerful, it makes it much more complex to retrieve the
data if one is not used to such structures. Especially beginners tend to use an
on the nested fields, followed by a huge
GROUP BY statement on the not-originally-repeated fields. Imho, using
is oftentimes the better approach here.
SELECT id, (SELECT value from t.repeated_fields LIMIT 1) FROM table t
Caution: When using expression subqueries, you need to make sure that the result is a single value (scalar or array), otherwise you will get the error message
Scalar subquery produced more than one element
In the example code above this is ensured by enforcing one result via
Run on BigQuery
The most prominent use case is probably the BigQuery export schema of Google Analytics. To be honest, I also feel that the schema is not very friendly for newcomers with its ~30 RECORD-type (nested) fields and 300+ columns.
In a nutshell, each row represents one session.
A session consists of multiple hits. Those hits are also available in the nested and repeated
hits field. But wait, there is more...
Each hit can have a number of so called
customDimensions (meta data that can be attached to each hit). So the resulting table structue looks something
- field_1 - field_2 - hits - field_1 - field_2 - customDimensions - index - value
The following example uses the public Google Analytics sample dataset for BigQuery and shows a couple of sample expression subqueries
SELECT fullVisitorId, visitStartTime, TIMESTAMP_SECONDS(visitStartTime) as started_at, TIMESTAMP_SECONDS(visitStartTime + CAST( (SELECT time from t.hits ORDER BY hitNumber DESC LIMIT 1) /1000 AS INT64)) as ended_at, (SELECT COUNT(*) from t.hits) as hit_count, (SELECT page.hostname || page.pagePath from t.hits WHERE isEntrance = TRUE) as landing_page, ( SELECT (SELECT COUNT(*) from h.customDimensions) FROM t.hits h WHERE hitNumber = 1 ) as customDimension_count_of_first_hit, FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170801` t ORDER BY visitStartTime asc
Wanna stay in touch?
Since you ended up on this blog, chances are pretty high that you're into Software Development (probably PHP, Laravel, Docker or Google Big Query) and I'm a big fan of feedback and networking.
So - if you'd like to stay in touch, feel free to shoot me an email with a couple of words about yourself and/or connect with me on LinkedIn or Twitter or simply subscribe to my RSS feed or go the crazy route and subscribe via mail and don't forget to leave a comment :)