Querying Data

The document describes how to run synchronous and asynchronous queries from the BigQuery API.

Overview
Additional limits
Synchronous queries
Asynchronous queries
Interactive and batch queries
Query caching
Returning large query results
Table decorators

Overview

Queries are written in BigQuery's SQL dialect. BigQuery supports both synchronous and asynchronous query methods. Both methods are handled by a job, but the "synchronous" method exposes a timeout value that waits until the job has finished before returning.

All query results are saved to a table, which can be either persistent or temporary:

A persistent table can be a new or existing table in any dataset in which you have WRITE privileges
A temporary table is a randomly named table saved in a special dataset; The table has a lifetime of approximately 24 hours. Temporary tables are not available for sharing, and are not visible using any of the standard list or other table manipulation methods.

The following table summarizes the possible combinations of query type and results table lifetime:

Results Table Lifetime	Query Type
	Synchronous	Asynchronous
Permanent Table	Not available in a single call. You can call jobs.insert and specify a results table, then call jobs.getQueryResults on that job, which will wait for the query to finish, then return results. details...	Create a query job and specify the table into which store the results. This can be a new table, or an existing table that you can either overwrite or append results to. details... To run the query - Call jobs.insert(query_type_job) with a results table specified. Method returns a job ID. To view the results - Poll the job for DONE status; when done, call tabledata.list to get table data, or run additional queries against the new table.
Temporary Table	Data stored in a temporary table available only to the current user. Table lifetime is about 24 hours. details... To run the query -Call jobs.query. To view the results - If the query finishes within the specified timeout, the first page of results is returned by job.query. For additional pages, or if the query has expired, or to examine the results again, call jobs.getQueryResults.	Create a query job but do not specify an output table. Results are stored in a temporary table with a lifetime of about 24 hours. details... To run the query - Call jobs.insert(query_type_job) without specifying a results table, and store the returned job ID. To view the results - Call jobs.getQueryResults with the job ID.

Results Table Lifetime

Query Type

Synchronous

Asynchronous

Permanent Table

Not available in a single call. You can call jobs.insert and specify a results table, then call jobs.getQueryResults on that job, which will wait for the query to finish, then return results. details...

Create a query job and specify the table into which store the results. This can be a new table, or an existing table that you can either overwrite or append results to. details...

To run the query - Call jobs.insert(query_type_job) with a results table specified. Method returns a job ID.
To view the results - Poll the job for DONE status; when done, call tabledata.list to get table data, or run additional queries against the new table.

Temporary Table

Data stored in a temporary table available only to the current user. Table lifetime is about 24 hours. details...

To run the query -Call jobs.query.
To view the results - If the query finishes within the specified timeout, the first page of results is returned by job.query. For additional pages, or if the query has expired, or to examine the results again, call jobs.getQueryResults.

Create a query job but do not specify an output table. Results are stored in a temporary table with a lifetime of about 24 hours. details...

To run the query - Call jobs.insert(query_type_job) without specifying a results table, and store the returned job ID.
To view the results - Call jobs.getQueryResults with the job ID.

Additional limits

The following additional limits apply for querying data.

Maximum tables per query: 1,000
Maximum query length: 256 KB

Synchronous queries

All synchronous query results are saved to an anonymous table with a lifetime of about 24 hours. If you want to save the data for longer than that, re-run the query asynchronously and specify a permanent destination table. Synchronous queries are handled by a job, as all queries are, and will appear in your jobs history.

Call jobs.query with your query string. This method takes an optional timeout period; if set to 0 or not specified, the method will default to 10 seconds.
- If the query returns within the specified timeout period, the method will return the first page of results. For additional results, call jobs.getQueryResults as described next.
- If the query timeout expires before the query has finished, the method will return jobComplete=false,, and you must call jobs.getQueryResults as described next. The query job will continue to run even after the timeout period until it finishes, either successfully or because an error has occurred.
[If necessary] Call jobs.getQueryResults to page through additional results, to get results from a query that exceeded its timeout, or to examine any query results until the temporary table is deleted. This method lets you specify a start row, and also takes a timeout that behaves the same as the jobs.query timeout to allow waiting if the job is not yet complete.

Synchronous Query Examples

Java

This sample uses the Google APIs Client Library for Java.

  /**
   * Runs a synchronous BigQuery query and displays the result.
   *
   * @param bigquery An authorized BigQuery client
   * @param projectId The current project id
   * @param query A String containing a BigQuery SQL statement
   * @param out A PrintStream for output, normally System.out
   */
  static void runQueryRpcAndPrint(
      Bigquery bigquery, String projectId, String query, PrintStream out) throws IOException {
    QueryRequest queryRequest = new QueryRequest().setQuery(query);
    QueryResponse queryResponse = bigquery.jobs().query(projectId, queryRequest).execute();
    if (queryResponse.getJobComplete()) {
      printRows(queryResponse.getRows(), out);
      if (null == queryResponse.getPageToken()) {
        return;
      }
    }
    // This loop polls until results are present, then loops over result pages.
    String pageToken = null;
    while (true) {
      GetQueryResultsResponse queryResults = bigquery.jobs()
          .getQueryResults(projectId, queryResponse.getJobReference().getJobId())
          .setPageToken(pageToken).execute();
      if (queryResults.getJobComplete()) {
        printRows(queryResults.getRows(), out);
        pageToken = queryResults.getPageToken();
        if (null == pageToken) {
          return;
        }
      }
    }
  }

  private static void printRows(List<TableRow> rows, PrintStream out) {
    if (rows != null) {
      for (TableRow row : rows) {
        for (TableCell cell : row.getF()) {
          // Data.isNull() is the recommended way to check for the 'null object' in TableCell.
          out.printf("%s, ", Data.isNull(cell.getV()) ? "null" : cell.getV());
        }
        out.println();
      }
    }
  }

Python

This sample uses the Google APIs Client Library for Python.

# Run a synchronous query, save the results to a table, overwriting the
# existing data, and print the first page of results.
# Default timeout is to wait until query finishes.
def runSyncQuery (service, projectId, datasetId, timeout=0):
  try:
    print 'timeout:%d' % timeout
    jobCollection = service.jobs()
    queryData = {'query':'SELECT word,count(word) AS count FROM publicdata:samples.shakespeare GROUP BY word;',
                 'timeoutMs':timeout}

    queryReply = jobCollection.query(projectId=projectId,
                                     body=queryData).execute()

    jobReference=queryReply['jobReference']

    # Timeout exceeded: keep polling until the job is complete.
    while(not queryReply['jobComplete']):
      print 'Job not yet complete...'
      queryReply = jobCollection.getQueryResults(
                          projectId=jobReference['projectId'],
                          jobId=jobReference['jobId'],
                          timeoutMs=timeout).execute()

    # If the result has rows, print the rows in the reply.
    if('rows' in queryReply):
      print 'has a rows attribute'
      printTableData(queryReply, 0)
      currentRow = len(queryReply['rows'])

      # Loop through each page of data
      while('rows' in queryReply and currentRow < queryReply['totalRows']):
        queryReply = jobCollection.getQueryResults(
                          projectId=jobReference['projectId'],
                          jobId=jobReference['jobId'],
                          startIndex=currentRow).execute()
        if('rows' in queryReply):
          printTableData(queryReply, currentRow)
          currentRow += len(queryReply['rows'])

  except AccessTokenRefreshError:
    print ("The credentials have been revoked or expired, please re-run"
    "the application to re-authorize")

  except HttpError as err:
    print 'Error in runSyncQuery:', pprint.pprint(err.content)

  except Exception as err:
    print 'Undefined error' % err

Asynchronous queries

Asynchronous queries are run by calling bigquery.jobs.insert and including the configuration.query property in the request. The method returns immediately with a job ID, and you must either request the job periodically and check status, or call jobs.getQueryResults, which will return when the job is complete.

Asynchronous queries are always saved to a table: this can be a new table, an existing table, or a temporary table. You can choose whether to append or overwrite data in an existing table, and whether to create a new table if none exists by that name.

Create a query job by calling bigquery.jobs.insert. If saving results to a permanent table, populate the destinationTable property with target table information; if saving the results to a temporary table, omit that property. The method returns immediately. Save the jobId returned in the response
Check job status by calling bigquery.jobs.get with the jobId by the insert request. Check the status.state value to see if the job has completed successfully; if not, examine status.errorResult to learn why the job failed.
Once the job reports a successful result, browse through the table data of the results table that you specified. If you got an error message, examine the error and rerun as appropriate.

Tip: An alternative to steps 2 and 3, polling job status and calling tabledata.list, is to call jobs.getQueryResults(jobId) with no timeout, which will wait until the job is done and then return the query results at any row index that you like.

Asynchronous Query Examples

Java

This sample uses the Google APIs Client Library for Java.

  static void runQueryJobAndPrint(
      Bigquery bigquery, String projectId, String query, PrintStream out)
      throws IOException, InterruptedException {
    Job queuedJob = startQuery(bigquery, projectId, query, createJobId(query));
    pollUntilQueryDone(bigquery, queuedJob.getJobReference(), 1);
    printAllRows(bigquery, queuedJob, out);
  }

  private static void printAllRows(Bigquery bigquery, Job queuedJob, PrintStream out)
      throws IOException {
    String pageToken = null;
    while (true) {
      // queuedJob.getConfiguration().getQuery().getDestinationTable() points to a temporary table
      // automatically created to hold results.
      TableDataList tableDataList = bigquery.tabledata().list(
          queuedJob.getConfiguration().getQuery().getDestinationTable().getProjectId(),
          queuedJob.getConfiguration().getQuery().getDestinationTable().getDatasetId(),
          queuedJob.getConfiguration().getQuery().getDestinationTable().getTableId())
          .setPageToken(pageToken).execute();
      printRows(tableDataList.getRows(), out);
      pageToken = tableDataList.getPageToken();
      if (null == pageToken) {
        break;
      }
    }
  }

  /**
   * Creates and starts a job that queries a table.
   *
   * @param bigquery an authorized BigQuery client
   * @param projectId a String containing the current Project ID
   * @param query the actual query string
   * @param jobId unique identifier for the job being inserted
   */
  private static Job startQuery(Bigquery bigquery, String projectId, String query, String jobId)
      throws IOException {
    System.out.format("\nInserting Query Job: %s\n", query);
    JobConfigurationQuery queryConfig = new JobConfigurationQuery().setQuery(query);
    JobConfiguration config = new JobConfiguration().setQuery(queryConfig);
    Job job = new Job().setId(jobId).setConfiguration(config);
    Job queuedJob = bigquery.jobs().insert(projectId, job).execute();
    System.out.format("\nJob ID of Query Job is: %s\n", jobId);
    return queuedJob;
  }

  /**
   * Polls BigQuery until the job's status is "DONE".
   *
   * @param bigquery an authorized BigQuery client
   * @param jobReference a JobReference, containing the job and project id
   * @param pauseSeconds seconds to wait between poll attempts
   */
  private static void pollUntilQueryDone(
      Bigquery bigquery, JobReference jobReference, int pauseSeconds)
      throws IOException, InterruptedException {
    // Variables that track the total query time
    long startTime = System.currentTimeMillis();
    long elapsedTime;

    while (true) {
      Job pollJob =
          bigquery.jobs().get(jobReference.getProjectId(), jobReference.getJobId()).execute();
      elapsedTime = System.currentTimeMillis() - startTime;
      System.out.format("Job status (%dms) %s: %s\n", elapsedTime, jobReference.getJobId(),
          pollJob.getStatus().getState());
      if (pollJob.getStatus().getState().equals("DONE")) {
        System.out.format(
            "Job done, processed %s bytes.\n", pollJob.getStatistics().getTotalBytesProcessed());
        return;
      }
      // Pause execution for pauseSeconds before polling job status again, to
      // reduce unnecessary calls to the BigQuery API and lower overall
      // application bandwidth.
      Thread.sleep(pauseSeconds * 1000);
    }
  }

Python

This sample uses the Google APIs Client Library for Python.


def runAsyncQuery (service, projectId):
  try:
    jobCollection = service.jobs()
    queryString = 'SELECT corpus FROM publicdata:samples.shakespeare GROUP BY corpus;'
    jobData = {
      'configuration': {
        'query': {
          'query': queryString,
        }
      }
    }

    insertResponse = jobCollection.insert(projectId=projectId,
                                         body=jobData).execute()

    # Get query results. Results will be available for about 24 hours.
    currentRow = 0
    queryReply = jobCollection.getQueryResults(
                      projectId=projectId,
                      jobId=insertResponse['jobReference']['jobId'],
                      startIndex=currentRow).execute()

    while(('rows' in queryReply) and currentRow < queryReply['totalRows']):
      printTableData(queryReply, currentRow)
      currentRow += len(queryReply['rows'])
      queryReply = jobCollection.getQueryResults(
                        projectId=projectId,
                        jobId=queryReply['jobReference']['jobId'],
                        startIndex=currentRow).execute()

  except HttpError as err:
    print 'Error in runAsyncTempTable:', pprint.pprint(err.resp)

  except Exception as err:
    print 'Undefined error' % err

Interactive and batch queries

By default, BigQuery runs interactive queries, meaning that the query is executed as soon as possible. Interactive queries count towards your concurrent rate limit and your daily rate limit but, alternatively, you can also choose to run your queries as batch queries. Batch queries are executed at a lower priority than interactive queries, taking from 30 minutes to 3 hours after the query has been issued before it is executed. However, batch queries do not count towards your concurrent rate limit and are charged at a lower rate than standard queries. If you have queries that aren't time sensitive, e.g. if you regularly run a set of standard queries to calculate daily aggregates, you can use batch querying to save money and quota. This can also be ideal if you tend to run queries overnight, so you do not mind the possible 3 hour delay in running these queries.

To use batch querying in your applications, include the priority field in your code. For example, use the priority field for asynchronous queries to batch them like so:

Java

This sample uses the Google APIs Client Library for Java.

  static void runBatchedQueryAndPrint(
      Bigquery bigquery, String projectId, String query, PrintStream out)
      throws IOException, InterruptedException {
    Job queuedJob = startBatchQuery(bigquery, projectId, query, createJobId(query));
    // Thread.sleep(30 * 60 * 1000); // Wait 30 minutes before pollling for a batched job.
    pollUntilQueryDone(bigquery, queuedJob.getJobReference(), 300);
    printAllRows(bigquery, queuedJob, out);
  }

  /**
   * A job id is an unique identifier for jobs submitted to BigQuery. It can be automatically
   * created, but it is a recommended practice to set them manually for posterior tracking, and
   * preventing duplicate jobs.
   */
  private static String createJobId(String query) {
    return String.format("job_%d_%d", System.currentTimeMillis(), query.hashCode());
  }

  /**
   * Creates and starts a job that queries a table, with BATCH priority.
   *
   * @param bigquery an authorized BigQuery client
   * @param projectId a String containing the current Project ID
   * @param query the actual query string
   * @param jobId unique identifier for the job being inserted
   */
  private static Job startBatchQuery(
      Bigquery bigquery, String projectId, String query, String jobId) throws IOException {
    System.out.format("\nInserting Batched Query Job: %s\n", query);
    JobConfigurationQuery queryConfig = new JobConfigurationQuery().setQuery(query);
    queryConfig.setPriority("BATCH"); // Set query priority to batch.
    JobConfiguration config = new JobConfiguration().setQuery(queryConfig);
    Job job = new Job().setId(jobId).setConfiguration(config);
    Job queuedJob = bigquery.jobs().insert(projectId, job).execute();
    System.out.format("\nJob ID of Query Job is: %s\n", jobId);
    return queuedJob;
  }

Python

This sample uses the Google APIs Client Library for Python.

def runAsyncQueryBatch(service, projectId):
  try:
    jobCollection = service.jobs()
    queryString = 'SELECT corpus FROM publicdata:samples.shakespeare GROUP BY corpus;'
    jobData = {
      'configuration': {
        'query': {
          'query': queryString,
          'priority': 'BATCH' # Set priority to BATCH
        }
      }
    }

    insertResponse = jobCollection.insert(projectId=projectId,
                                         body=jobData).execute()

    import time
    while True:
      status = jobCollection.get(projectId=projectId, jobId=insertResponse['jobReference']['jobId']).execute()
      currentStatus = status['status']['state']

      if 'DONE' == currentStatus:
        currentRow = 0
        queryReply = jobCollection.getQueryResults(
                       projectId=projectId,
                       jobId=insertResponse['jobReference']['jobId'],
                       startIndex=currentRow).execute()

        while(('rows' in queryReply) and currentRow < queryReply['totalRows']):
          printTableData(queryReply, currentRow)
          currentRow += len(queryReply['rows'])
          queryReply = jobCollection.getQueryResults(
                         projectId=projectId,
                         jobId=queryReply['jobReference']['jobId'],
                         startIndex=currentRow).execute()
      else:
        print 'Waiting for the query to complete...'
        print 'Current status: ' + currentStatus
        print time.ctime()
        time.sleep(10)

  except HttpError as err:
    print 'Error in runAsyncTempTable:', pprint.pprint(err.resp)

  except Exception as err:
    print 'Undefined error: %s' % err

When you run these samples, you will notice that your job status remains PENDING for at least 30 minutes. To see how an interactive query might work, change the priority setting to INTERACTIVE or remove it altogether.

Query caching

BigQuery caches query results on a best-effort basis for increased performance. You aren't charged for cached queries, but cached queries are subject to the same quota policies as non-cached queries. BigQuery caches all queries that don't specify a destination table, including both interactive and batch queries.

Opting out of query caching

If you don't want your query to pull from the cache, you can take one of the following steps:

If using the BigQuery web UI, click the Enable Options button and ensure that Use Cached Results is unchecked.
If using the BigQuery API, set the useQueryCache property to false.

Ensuring cached query results

If you use the jobs.insert() function to run a query, you can ensure that the job returns a query result from the cache, if it exists, by setting the createDisposition property of the job configuration to CREATE_NEVER.

If the query result doesn't exist in the cache, a NOT_FOUND error returns.

Checking to see if BigQuery returned a cached result

There are two ways to determine if BigQuery returned a cached result:

If using the BigQuery web UI, the result string does not contain information about the number of processed bytes, and displays the word "cached".
If using the BigQuery API, the cacheHit property in the query result is set to true.

Returning large query results

Normally, queries have a maximum response size. If you plan to run a query that might return larger results, you can set allowLargeResults to true in your job configuration.

Queries that return large results will take longer to execute, even if the result set is small, and are subject to additional limitations:

You must specify a destination table.
You can't specify a top-level ORDER BY clause.
Window functions can't return large query results.

Table decorators

Normally, BigQuery performs a full column scan when running a query. You can use table decorators to perform a more cost-effective query of a subset of your data.

BigQuery supports the following table decorators:

Table decorator	Description
`@<time>`	References a snapshot of the table at `<time>`, in milliseconds since the epoch. `<time>` must be within the last 24 hours.
`@-<time>`	References table data added in the last `<time>`, where `<time>` is less than 24 hours in milliseconds.
`@<time1>-<time2>`	References table data added between `<time1>` and `<time2>`, in milliseconds since the epoch. `<time1>` and `<time2>` must be within the last 24 hours.
`@<time>-`	References data added to the table after `<time>`, in milliseconds since the epoch. `<time>` must be within the last 24 hours.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 3.0 License, and code samples are licensed under the Apache 2.0 License. For details, see our Site Policies.

Google BigQuery

Feedback on this document

Querying Data

Contents

Overview

Additional limits

Synchronous queries

Java

Python

Asynchronous queries

Java

Python

Interactive and batch queries

Java

Python

Query caching

Opting out of query caching

Ensuring cached query results

Checking to see if BigQuery returned a cached result

Returning large query results

Table decorators

Authentication required

Signing you in...

Google BigQuery Feedback on this document

Querying Data

Contents

Overview

Additional limits

Synchronous queries

Java

Python

Asynchronous queries

Java

Python

Interactive and batch queries

Java

Python

Query caching

Opting out of query caching

Ensuring cached query results

Checking to see if BigQuery returned a cached result

Returning large query results

Table decorators

Authentication required

Signing you in...

Google BigQuery

Feedback on this document