Instead of using a job to load data into BigQuery, you can choose to stream your data into BigQuery in real time by using the tabledata().insertAll() method. There are trade-offs when working with streaming inserts; for more information, see streaming insert use cases.
Contents
- Access control
- Quota policy
- Data availability
- Streaming insert use cases
- Troubleshooting streaming inserts
- Streaming insert examples
Access control
Streaming data into BigQuery requires the following access levels.
Product | Access |
---|---|
BigQuery | WRITE access for the dataset that contains the destination table |
Quota policy
The following limits apply for streaming data into BigQuery (a client-side batching sketch follows the list).
- Maximum row size: 100 KB
- Maximum data size of all rows, per insert: 1 MB
- Maximum rows per second: 100 rows per second, per table, with occasional bursts of up to 1,000 rows per second allowed. If you exceed 100 rows per second for an extended period of time, throttling might occur.
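These limits are enforced per insertAll() request, so a client that buffers many rows must split them into conforming batches before sending. Below is a minimal sketch of one way to do this in Python; the batch_rows helper and its 500-row cap are illustrative choices, not part of the API.

    import json

    MAX_ROWS_PER_REQUEST = 500       # illustrative cap, below the 1,000-row burst limit
    MAX_BYTES_PER_REQUEST = 1000000  # 1 MB cap on all rows per insert

    def batch_rows(rows):
        """Yield lists of row dicts ({"json": {...}}) that respect the
        per-request limits, estimating sizes from the JSON encoding."""
        batch, batch_bytes = [], 0
        for row in rows:
            row_bytes = len(json.dumps(row))
            if batch and (len(batch) >= MAX_ROWS_PER_REQUEST or
                          batch_bytes + row_bytes > MAX_BYTES_PER_REQUEST):
                yield batch
                batch, batch_bytes = [], 0
            batch.append(row)
            batch_bytes += row_bytes
        if batch:
            yield batch

Each yielded batch can then be sent as the rows property of its own insertAll() request body.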
Data availability
The first time a streaming insert occurs, the streamed data is inaccessible for a warm-up period of up to two minutes. After the warm-up period, all streamed data added during and after the warm-up period is immediately queryable. After several hours of inactivity, the warm-up period will occur again during the next insert.
Streaming insert use cases
There are several trade-offs when using streaming inserts:
- Faster access to your data
- Real-time insertion; no need to batch your data
- Possible data duplication
- Possible dropped records when a connection fails
Given these trade-offs, we recommend using streaming inserts for any of the following use cases.
High-volume event logging
If you have an app that collects a large amount of data in real time, streaming inserts are a good choice.
Real-time dashboards and queries
You can take a few precautions to ensure that you can perform analysis over transactional data while still keeping an up-to-the-second view of your data (a code sketch follows these steps):
- Create two tables with an identical schema. The first table is for the reconciled data, and the second table is for the real-time, unreconciled data.
- On the client side, maintain a transactional data store for records.
- Send fire-and-forget insertAll() requests for these records, specifying the real-time, unreconciled table as the destination table.
- At some interval, append the reconciled data from the transactional data store to the reconciled table, and truncate the unreconciled data table.
- For real-time dashboards and queries, you can select data from both tables. The unreconciled data table might include duplicates or dropped records.
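As a sketch of the fire-and-forget insert and the dashboard query described above, the snippet below assumes the bigquery service object, PROJECT_ID, and DATASET_ID from the examples at the end of this page; the table names events_reconciled and events_unreconciled and the record fields are hypothetical.

    # Fire-and-forget insert into the real-time, unreconciled table.
    body = {"rows": [{"json": {"user_id": "u123", "event": "click"}}]}
    bigquery.tabledata().insertAll(
        projectId=PROJECT_ID,
        datasetId=DATASET_ID,
        tableId="events_unreconciled",
        body=body).execute()

    # A dashboard query over both tables; in legacy BigQuery SQL,
    # a comma between table references acts as UNION ALL.
    query = {"query": "SELECT user_id, event FROM [%s.events_reconciled],"
                      " [%s.events_unreconciled]" % (DATASET_ID, DATASET_ID)}
    result = bigquery.jobs().query(projectId=PROJECT_ID, body=query).execute()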
We recommend using other ways to load data into BigQuery for any data that can't tolerate loss or duplication, and doesn't require immediate availability for queries.
Troubleshooting streaming inserts
Failure HTTP response codes
If you receive a failure HTTP response code such as a network error, there's no way to tell whether the streaming insert succeeded. If you simply re-send the request, you might end up with duplicated rows in your table. To help protect your table against duplication, set the insertId property when sending your request. BigQuery attempts to reduce duplication on a best-effort basis.
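The snippet below is a minimal sketch of a resend loop in the style of the Python example at the end of this page; insert_with_retry is a hypothetical helper, and any identifier that is stable across retries of the same logical record works as the insertId.

    import time
    import uuid

    def insert_with_retry(bigquery, project_id, dataset_id, table_id,
                          record, max_attempts=3):
        # Generate the insertId once and reuse it on every retry so that
        # BigQuery can deduplicate resent rows on a best-effort basis.
        body = {"rows": [{"insertId": str(uuid.uuid4()), "json": record}]}
        for attempt in range(max_attempts):
            try:
                return bigquery.tabledata().insertAll(
                    projectId=project_id,
                    datasetId=dataset_id,
                    tableId=table_id,
                    body=body).execute()
            except Exception:
                # A network failure leaves the insert's fate unknown;
                # resending with the same insertId is safe.
                if attempt == max_attempts - 1:
                    raise
                time.sleep(2 ** attempt)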
If you receive a permission error, an invalid table name error, or an exceeded quota error, no rows are inserted and the entire request fails.
Success HTTP response codes
Even if you receive a success HTTP response code, you'll need to check the insertErrors property of the response to determine whether the row insertions were successful.
If BigQuery encountered errors processing individual rows of the request, such as a schema mismatch, none of the rows are inserted and an insertErrors entry is returned for each row. If the errors property of an insertErrors entry is empty, there was no problem with that row. Otherwise, the errors property contains detailed information about why the row caused an error.
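As a sketch, the check below walks the insertErrors property of a response returned by tabledata().insertAll(...).execute(), as in the Python example below.

    errors = response.get("insertErrors", [])
    if not errors:
        print("all rows inserted")
    else:
        for entry in errors:
            # "index" identifies the offending row in the request's rows
            # list; an empty "errors" list means that row was fine.
            if entry.get("errors"):
                print("row %d failed: %s" % (entry["index"], entry["errors"]))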
Streaming insert examples
Java
This sample uses the Google APIs Client Library for Java.
    TableRow row = new TableRow();
    row.set("column_name", 7.7);
    TableDataInsertAllRequest.Rows rows = new TableDataInsertAllRequest.Rows();
    rows.setInsertId(timestamp);
    rows.setJson(row);
    List<TableDataInsertAllRequest.Rows> rowList =
        new ArrayList<TableDataInsertAllRequest.Rows>();
    rowList.add(rows);
    TableDataInsertAllRequest content =
        new TableDataInsertAllRequest().setRows(rowList);
    TableDataInsertAllResponse response = bigquery.tabledata().insertAll(
        projectId, datasetId, tableId, content).execute();
Python
This sample uses the Google APIs Client Library for Python.
body = {"rows":[ {"json": {"column_name":7.7,}} ]} response = bigquery.tabledata().insertAll( projectId=PROJECT_ID, datasetId=DATASET_ID, tableId=TABLE_ID, body=body).execute()