
Copy BigQuery Dataset To Another Dataset – Copy All Tables

BigQuery: copying multiple dataset tables at once

BigQuery is one of the greatest products in Google Cloud Platform. It is built to analyze massive amounts of data: you can run queries and get insights and analytical reports from terabytes or even petabytes of data in a very short time, far beyond what a traditional database can handle. That’s why it’s a popular choice among data scientists. A single GCP project can contain multiple BigQuery datasets, and a single dataset can contain multiple tables, so sometimes it becomes essential to copy a BigQuery dataset from one project to another. To achieve that, you need to learn about the BigQuery copy command (bq cp).
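In its simplest form, bq cp copies a single table. A minimal example (the project, dataset, and table names here are placeholders):

bq cp SOURCEPROJECT:SOURCEDATASET.mytable TARGETPROJECT:TARGETDATASET.mytable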

Moving a BigQuery dataset across GCP projects

In Google Cloud Platform (GCP), most businesses maintain multiple projects. Assume you are storing a large amount of application logs in BigQuery inside Project A, but now you need that BigQuery dataset in another project. You can grant access across projects with IAM, but that doesn’t help when you actually need to move the dataset from one project to another. Google’s documentation doesn’t cover this migration scenario directly, so I pieced together several resources into one working approach.

You can move or copy a BigQuery dataset to another project only if both projects’ datasets are stored in the same region.
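You can verify a dataset’s storage location with bq show before you start (a quick sketch; the dataset identifier is a placeholder):

bq show --format=prettyjson BQPROJECTID:BQSOURCEDATASET | grep location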

BigQuery copy dataset to another project

Here is the main thing. Remember that both projects’ BigQuery datasets must live in the same region: if the dataset in Project A is stored in the US, you can only copy it into a dataset in the other project that is also stored in the US. You can copy or clone the dataset in two different ways. One is the “bq cp” command driven by a small shell script; the other is doing it table by table manually from the BigQuery console UI in GCP. I am going to show you both.

Copy all tables at once with shell script

If your BigQuery dataset has lots of tables, cloning or copying them one by one is very time consuming. If you copy tables from one dataset to another manually, the time you spend grows with the total number of tables in that dataset.

But with the help of a shell script, we can combine a few “bq” commands and make life easier for this kind of bulk table copy between datasets, whether within the same project or across projects. I googled for similar solutions and combined a few pieces into a single shell script.

Here is the script to copy or clone an entire BigQuery dataset into another dataset using the bq cp command.

Shell script to copy a BigQuery dataset

#!/bin/sh
# Source and destination in PROJECT:DATASET form; DEST_PREFIX is prepended
# to every copied table's name (the prefix part after the dot is optional).
SOURCE_DATASET="BQPROJECTID:BQSOURCEDATASET"
DEST_PREFIX="TARGETBQPROJECTID:TARGETBQDATASET._YOUR_PREFIX"

# List up to TOTAL_NUMBER_OF_TABLES entries, keep only rows of type TABLE,
# take the table ID from the first column, and copy each table asynchronously.
for f in $(bq ls -n TOTAL_NUMBER_OF_TABLES "$SOURCE_DATASET" | grep TABLE | awk '{print $1}')
do
  CLONE_CMD="bq --nosync cp $SOURCE_DATASET.$f $DEST_PREFIX$f"
  echo "$CLONE_CMD"
  $CLONE_CMD
done

In the script above, --nosync submits each bq copy job asynchronously, so you don’t have to wait for the current job to finish before the next one starts. Also, replace TOTAL_NUMBER_OF_TABLES with the number of tables you want to copy from the existing dataset, e.g. 200.
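Because the jobs run asynchronously, you may want to check on them afterwards. A small sketch (JOB_ID is a placeholder for an ID printed by “bq --nosync cp”):

# List the most recent BigQuery jobs for the current project:
bq ls -j -n 20

# Block until one specific job finishes:
bq wait JOB_ID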

Explaining the shell script for “bq cp”

The shell script above simply fetches all existing tables from the source dataset, then loops over them, running “bq cp” to create a BigQuery copy job for each table. It’s pretty straightforward.
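The grep and awk pair works because “bq ls” prints a table whose first column is the table ID and whose second column is the type, shaped roughly like this (an illustration with made-up table names, not captured output):

   tableId    Type
  ---------- -------
   events_1   TABLE
   events_2   TABLE
   my_view    VIEW

grep TABLE keeps only real tables (skipping views), and awk '{print $1}' extracts the table ID from each matching row.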

Now run the shell script and let it kick off the bq copy jobs. Keep watching the terminal: it will print each command as it submits the jobs one by one. That’s the power of the bq cp command.
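Once the jobs finish, a quick sanity check is to compare the table counts on both sides (a sketch; the dataset identifiers are the placeholders from the script above):

bq ls -n 10000 BQPROJECTID:BQSOURCEDATASET | grep -c TABLE
bq ls -n 10000 TARGETBQPROJECTID:TARGETBQDATASET | grep -c TABLE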

Copy Dataset from GCP console UI

Before moving ahead, make sure you have editor permission on both projects and both datasets. Then create a dataset in your destination project and make sure its storage location is the same as the source dataset’s; otherwise you can’t copy between the projects. I wasted two hours before realizing that.
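You can also create the destination dataset with a matching location from the command line (a sketch; US and the dataset identifier are placeholders, so match whatever location your source dataset actually uses):

bq --location=US mk --dataset TARGETBQPROJECTID:TARGETBQDATASET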

If both datasets’ storage locations match, open the dataset in the BigQuery console UI and you will find a “Copy Table” option.

BigQuery GCP console interface

Now, follow the screenshot below. Define your source and destination and it will start the BigQuery job.

But you can’t do this for all tables at once; you have to repeat it for every single table. So it’s better to use the shell script approach described above. I once had to copy 2,500 tables, and that experience is why I wrote this article about copying tables with bq. The task cost me more than two days and a genuinely bad weekend. For more detail on the bq cp command, see the Google BigQuery command reference, though it doesn’t cover this exact dataset-to-dataset copy scenario.

Write a comment if everything works perfectly; it will help others try this out. If you run into trouble or it doesn’t work completely for you, write that in the comments too, so I can help and update the article. Click here for more BigQuery articles.