How to monitor query costs in Google BigQuery

Posted by Pascal Landau on 2020-06-21 12:00:00

Cost monitoring in Google BigQuery can be a difficult task, especially within a growing organization and lots of (independent) stakeholders that have access to the data. If your organization is not using reserved slots (flat-rate pricing) but is billed by the number of bytes processed (on-demand pricing), costs can get quickly out of hand, and we need the means to investigate or "debug" the BigQuery usage in order to understand:

  • who ran queries with a high cost
  • what were the exact queries
  • when did those queries run (and are they maybe even running regularly)

Previously, we had to manually set up query logging via Stackdriver as explained in the article Taking a practical approach to BigQuery cost monitoring but in late 2019 BigQuery introduced INFORMATION_SCHEMA views as a beta feature that also contain data about BigQuery jobs via the INFORMATION_SCHEMA.JOBS_BY_* views and became generally available (GA) at 2020-06-16

Examples

SELECT 
  creation_time,
  job_id,
  project_id,
  user_email,
  total_bytes_processed,
  query
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER

SELECT * FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT

SELECT * FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION

Working Example

  • this query will select the most interesting fields in terms of cost monitoring from the INFORMATION_SCHEMA.JOBS_BY_USER view for all jobs that have been run in region US in the currently selected project
  • the cost_in_dollar is estimate by calculating the total_bytes_processed in Terabyte and multiplying the result with $5.00 (which corresponds to the the cost as of today 2020-06-21). Also, we only take those costs into account if the query was not answered from the cache (see the cache_hit != true condition)
  • the creation_time is converted to our local timezone
  • the results are restricted to the past 30 days by using the WHERE clause to filter on the partition column creation_time
  • feel free to replace JOBS_BY_PROJECT with JOBS_BY_USER or JOBS_BY_ORGANIZATION

Run on BigQuery

Open in BigQuery UI

BigQuery UI: Monitor query costs in BigQuery example

Notes

While playing around with the INFORMATION_SCHEMA views I've hit a couple of gotchas:

  • the different views require different permissions
  • the views are regionalized, i.e. we must prefix the region (see region-us in the view specification) and must run the job in that region (e.g. from the BigQuery UI via More > Query Settings > Processing location)
  • it is not possible to mix multiple regions in the query, because a query with processing location US can only access resources in location US. Though it would be very helpful for organizations that actively use different locations, something like this is not possible:
    SELECT * FROM 
    (SELECT * `region-us`.INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION)
    UNION ALL
    (SELECT * `region-eu`.INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION)
  • data is currently only kept for the past 180 days
  • the JOBS_BY_USER view seems to "match" the user based on the email address. My user email adress is a @googlemail.com address; in the user column it is stored as @gmail.com. Thus, I get no results when using JOBS_BY_USER
  • JOBS_BY_USER and JOBS_BY_PROJECT will use the currently selected project by default. A different project (e.g. other-project) can be specified via
    SELECT * FROM `other-project.region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
  • the full query is not available for JOBS_BY_ORGANIZATION

Use Cases

I use this approach in our organization to set up a view based on INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION that is then used as a data source for Google DataStudio. This allows me to get a quick high level overview over all query costs and further enables me to drill down deeper if I need to. I can even find the exact queries via their job_id.

BigQuery cost monitoring dashboard


Wanna stay in touch?

Since you ended up on this blog, chances are pretty high that you're into Software Development (probably PHP, Laravel, Docker or Google Big Query) and I'm a big fan of feedback and networking.

So - if you'd like to stay in touch, feel free to shoot me an email with a couple of words about yourself and/or connect with me on LinkedIn or Twitter or simply subscribe to my RSS feed or go the crazy route and subscribe via mail and don't forget to leave a comment :)

Subscribe to posts via mail

We use Mailchimp as our newsletter provider. By clicking subscribe, you acknowledge that your information will be transferred to Mailchimp for processing. Learn more about Mailchimp's privacy practices here.
Waving bear

Comments