
Google BigQuery

Google BigQuery is a fully managed, serverless data warehouse that enables scalable analysis over petabytes of data. It is a Platform as a Service (PaaS) that supports querying using ANSI SQL. It was designed for analyzing data on the order of billions of rows, using a SQL-like syntax. It runs on the Google Cloud infrastructure and can be accessed with a REST-oriented application program interface (API).

To import tables from BigQuery, you first need to configure Google Cloud. Here’s how to do this:

Enable BigQuery API

  1. Log in to Google Cloud Platform and select the project to use in Mammoth;
  2. Go to APIs & Services from the sidebar;
  3. Click on + Enable APIs and Services and search for BigQuery API;
  4. Open BigQuery API and click on Enable;

Enabling BigQuery API

Create a Service Account

  1. From the sidebar, select IAM & Admin and go to Service accounts;
  2. Click on Create service account;
  3. In Service Account Details, enter the service account name and description. Google Cloud will automatically generate an email address for the service account. Click on Create;

Creating a service account

  4. In Grant this service account access to project, add the following roles to give the service account permission to perform the required actions on the resources in your project, then click Continue:
  • BigQuery Data Editor
  • BigQuery Job User
  • Storage Admin

note

Data pulls from BigQuery require BigQuery to export the data to Cloud Storage buckets. Mammoth needs the Storage Admin permission to create temporary buckets for the exported data, which is then imported into Mammoth. These buckets are deleted once the operation completes successfully.

  5. In Grant users access to this service account, optionally add the email addresses of the end users; by default, you will be the sole owner of the service account. Click Done;

Adding roles to your service account

  6. For the newly created service account, go to the action menu and select Manage keys;
  7. Select Add key and click on Create new key;
  8. Select JSON as the key type (Mammoth accepts JSON keys). Click on Create and a JSON key file will be downloaded to your system. This JSON key will be required when connecting Mammoth to BigQuery.

Creating a JSON key
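
The downloaded key file is a plain JSON document. Before uploading it to Mammoth, you can sanity-check that it really is a service-account key with the fields Google always includes. A minimal sketch in Python (the field names are standard for Google service-account keys; the filename is an assumption):

```python
import json

# Fields present in every Google service-account JSON key.
REQUIRED_FIELDS = {"type", "project_id", "private_key", "client_email"}

def looks_like_service_account_key(path):
    """Return True if the file parses as JSON and has the expected fields."""
    with open(path) as f:
        key = json.load(f)
    return key.get("type") == "service_account" and REQUIRED_FIELDS <= key.keys()

# Usage (assumes the key was saved as "mammoth-key.json"):
# looks_like_service_account_key("mammoth-key.json")
```

If this returns False, the file is likely an OAuth user credential rather than a service-account key, and Mammoth will not accept it.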

note

A service account can be used for multiple projects.

  1. Switch to the project which needs this service account;
  2. From the sidebar, go to IAM & Admin and click on +ADD;
  3. Enter the email of the service account, assign the roles and save.

Adding a service account to a project

Connecting to Mammoth

  1. Select API & Databases from the 'Add Data' menu and click on Google BigQuery.
  2. Click on New Connection and upload the JSON key which was downloaded in the previous step. Click on Connect.

Google BigQuery login

  3. Select the desired ProjectID - Dataset and click on Next.

Google BigQuery Profile

  4. Once your Google account is connected to Mammoth, you will be presented with a list of tables and views in that database.
  • Select the desired table to get a preview.
  • Write a SQL query or run a test query and preview the result.
  • Click on Next.

After you have selected the table you want to work on, you get options to schedule data imports as discussed in the next section.

Scheduling your Data Pulls

You can start retrieving the data now or at a specific time. You can also schedule imports so that the latest data is pulled from your database at a regular interval: just once, daily, weekly or monthly.
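
The interval options above amount to computing the next pull time from the previous one. The sketch below is purely illustrative (it is not Mammoth's scheduler), and it ignores day-of-month clamping, e.g. a monthly schedule anchored on the 31st:

```python
from datetime import datetime, timedelta

def next_run(last_run, interval):
    """Next pull time for a simple schedule (illustrative only)."""
    if interval == "once":
        return None  # a one-off import has no further runs
    if interval == "daily":
        return last_run + timedelta(days=1)
    if interval == "weekly":
        return last_run + timedelta(weeks=1)
    if interval == "monthly":
        # Roll the month forward, wrapping December into January.
        years, month = divmod(last_run.month, 12)
        return last_run.replace(year=last_run.year + years, month=month + 1)
    raise ValueError(f"unknown interval: {interval}")
```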

On every data pull from your database, you also have the option to either replace the older data or combine with the older data.

On choosing the Combine with older data option, you will be asked to choose a unique sequence column. On each refresh, Mammoth will pick up only those rows whose value in this column is greater than the maximum value from the previous data pull.
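
The incremental behaviour can be sketched as follows, assuming rows are dicts and the sequence column is something monotonically increasing, such as an auto-incrementing id (the function and column names are illustrative, not Mammoth's internals):

```python
def combine_with_older_data(existing_rows, pulled_rows, seq_column):
    """Append only the rows whose sequence value exceeds the previous maximum."""
    last_seen = max((r[seq_column] for r in existing_rows), default=None)
    if last_seen is None:
        # First pull: keep everything.
        new_rows = list(pulled_rows)
    else:
        new_rows = [r for r in pulled_rows if r[seq_column] > last_seen]
    return existing_rows + new_rows

# Example: rows with id <= 2 were imported earlier, so only id 3 is appended.
older = [{"id": 1}, {"id": 2}]
latest_pull = [{"id": 2}, {"id": 3}]
combined = combine_with_older_data(older, latest_pull, "id")
```

This is why the column must be unique and strictly increasing: rows whose sequence value is at or below the previous maximum are assumed to have been imported already and are skipped.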