BigQuery Tips & Tricks [part – 1]

BigQuery Tips & Tricks

BigQuery Tips & Tricks – Part 1

I am becoming a fan of BigQuery recent days. It’s saving me a lots of time while of our business platforms are running on Google Cloud Platform. Every application, every operation we are keeping logs from every corner of our infrastructure as much as possible that gives us a greater good insight about everything. All logs are being stored in BigQuery. It’s awesome, completely automated and scaled.

With Google’s Data Studio, our project managers, business analysts and every other department (with or without any technical knowledge) can build their own reports and get more insight about every piece of components that’s making our business great.

Google Data Studio & BigQuery

So sometimes, I need to be able to practice some tips and tricks to get more complex insight, data transformation and other small stuffs to make the whole ecosystem more useful. So this articles I am keeping here for my own practice purpose so whenever I forgot any little things, I can find that from here. This BigQuery tips and tricks will be also helpful for you.

Convert timezone in query

All the BigQuery timestamps are in UTC timezone. If you want to change the timestamp field in another timezone. Just follow this

SELECT DATETIME(timestamp, "Asia/Dhaka") from table

Parse and Access JSON string with JSON Path

Assume, you have some logs or some data in JSON format and you want to access properties from inside the JSON string. You should use JSON_EXTRACT() or JSON_EXTRACT_SCALAR() function.

SELECT JSON_EXTRACT('{"job_slugs":"I am a test Job slug"}', '$.job_slugs') as job_slugs from table

It can work based on JSON path. You can evaluate your JSON path from http://jsonpath.com. It’s awesome

Remove double quotes from beginning and end of the string

Let’s say, you extracted some data and it’s looks like ” String inside double quotes”. Now it’s making you trouble to work with that string. Simply follow this

SELECT
 REGEXP_REPLACE('"String inside double quotes"', r'^"|"$', '') AS unquoted_string from table

And you will get String inside double quotes without the double quotes. It’s a simple tricks with REGEX_REPLACE() bigquery function.

Also read my other BigQuery related posts from here

Copy BigQuery dataset to another dataset – copy all tables at once

Sometime it’s necessary to copy or clone the entire Big Query dataset to another dataset. If you Big Query dataset has lots of table, then it’s very time consuming to clone or copy all tables one by one.

But for the help of Shell script, we can combine few `bq`  command and can make our life easier for this types of bulk tasks. I just googled to get similar solutions and combined few things in a single shell script.

Here is the script to copy or clone the entire Big Query dataset in another dataset.

Shell Script to copy Big Query Dataset

#!/bin/sh
export SOURCE_DATASET="BQPROJECTID:BQSOURCEDATASET"
export DEST_PREFIX="TARGETBQPROJECTID:TARGETBQDATASET._YOUR_PREFIX"
for f in `bq ls -n TOTAL_NUMBER_OF_TABLES $SOURCE_DATASET |grep TABLE | awk '{print $1}'`
do
  export CLONE_CMD="bq --nosync cp $SOURCE_DATASET.$f $DEST_PREFIX$f"
  echo $CLONE_CMD
  echo `$CLONE_CMD`
done

In the above command –nosync will help you to load bq job asynchronously. So you don’t need to wait for current job to finish. Also TOTAL_NUMBER_OF_TABLES you need to replace with existing dataset’s total number of tables you want to copy. i.e: 200

Now, run the above shell script and start kicking the bq job payload.

GeoIP alternative to get IP location – Google App Engine PHP Recipes

Today’s recipes is all about to get very basic information about your visitor’s IP address location. Unfortunately, in Google App Engine, PHP GeoIP extension is not enabled in their standard environment. So sometimes, it’s essential for application developer to get the visitor’s IP address location.

Here, you can easily get user’s IP address location with 3 information easily and by default from Google App Engine. In GAE, if you will discover that Google app engine itself added few extra _X_headers in $_SERVER super-global variable. I am dumping the $_SERVER variable here.

<?php
//Output of $_SERVER

/*[
    'APPLICATION_ID' => 'YOUR GAE APPLICATION ID',
    'AUTH_DOMAIN' => 'gmail.com',
    'CURRENT_MODULE_ID' => 'default',
    'CURRENT_VERSION_ID' => '20170319t185047.399949465273621812',
    'DEFAULT_VERSION_HOSTNAME' => 'xxxxxxxxxx.appspot.com',
    'DOCUMENT_ROOT' => '/base/data/home/apps/xxxxxxxxxxxxxx/20170319t185047.399949465273621812',
    'HTTPS' => 'on',
    'HTTP_ACCEPT' => 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*\/*;q=0.8',
    'HTTP_ACCEPT_LANGUAGE' => 'en-US,en;q=0.8,bn;q=0.6,ms;q=0.4,pt;q=0.2',
    'HTTP_HOST' => 'xxxxxxxxxx.appspot.com',
    'HTTP_UPGRADE_INSECURE_REQUESTS' => '1',
    'HTTP_USER_AGENT' => 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.110 Safari/537.36',
    'HTTP_X_APPENGINE_CITY' => 'dhaka',
    'HTTP_X_APPENGINE_CITYLATLONG' => '23.709921,90.407143',
    'HTTP_X_APPENGINE_COUNTRY' => 'BD',
    'HTTP_X_APPENGINE_REGION' => '?',
    'HTTP_X_CLOUD_TRACE_CONTEXT' => 'xxxxxxxxxxxxxxxxxxxxxxxxxx/xxxxxxxxxx;o=x',
    'INSTANCE_ID' => 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
    'PATH_TRANSLATED' => '/base/data/home/apps/xxxxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxxxxx/index.php',
    'PHP_INI_SCAN_DIR' => '/base/data/home/apps/xxxxxxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxxxxx/',
    'PHP_SELF' => '/index.php',
    'REMOTE_ADDR' => 'xxx.xxx.xxx.xxx',
    'REQUEST_ID_HASH' => 'XXXXXXXX',
    'REQUEST_LOG_ID' => 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
    'REQUEST_METHOD' => 'GET',
    'REQUEST_URI' => '/',
    'SCRIPT_FILENAME' => '/base/data/home/apps/xxxxxxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/index.php',
    'SCRIPT_NAME' => '/index.php',
    'SERVER_NAME' => 'xxxxxxxxxxxxxxxxxxx.appspot.com',
    'SERVER_PROTOCOL' => 'HTTP/1.1',
    'SERVER_SOFTWARE' => 'Google App Engine/1.9.48',
    'USER_IS_ADMIN' => '0',
    'REQUEST_TIME_FLOAT' => 148992.7,
    'REQUEST_TIME' => 14899,
    'argv' =>
        array(),
    'argc' => 0,
];*/


//Get the visitor's IP details
$visitorIp = [
    'ip' => $_SERVER['REMOTE_ADDR'],
    'HTTP_X_APPENGINE_CITY' => $_SERVER['HTTP_X_APPENGINE_CITY'] ?: null,
    'HTTP_X_APPENGINE_CITYLATLONG' => $_SERVER['HTTP_X_APPENGINE_CITYLATLONG'] ?: null,
    'HTTP_X_APPENGINE_COUNTRY' => $_SERVER['HTTP_X_APPENGINE_COUNTRY'] ?: null,
    'HTTP_X_APPENGINE_REGION' => $_SERVER['HTTP_X_APPENGINE_REGION'] ?: null
];

 

And from that, we can easily get the visitor’s City, City-level lat long, country and region that primary will help you to get visitor’s location.

Connect Google Cloud SQL from Local Machine using Unix Socket

Firstly,
You need to install Google Cloud SDK and check if you have gcloud command available in terminal. After installing gcloud you need to configure gcloud sdk by running gcloud init command.

Secondly,
I am assuming, you have configured your Google Cloud SQL instance for your project. If yes, then continue, otherwise you need to read “Google Cloud SQL Quickstarter Guide“.

Now Download cloud_sql_proxy.linux.amd64 package in your local machine with

"sudo wget https://dl.google.com/cloudsql/cloud_sql_proxy.linux.amd64"

this command.

Thirdly,
Rename by mv cloud_sql_proxy.linux.amd64 cloud_sql_proxy
and give execution permission by chmod +x cloud_sql_proxy

Fourth,
Create application default credentials using “gcloud auth application-default login” command

Fifth,
Create a directory sudo mkdir -p /cloudsql for Google Cloud SQL proxy unix socket and make that directory writable sudo chmod 777 /cloudsql

Sixth,
start sql socket proxy by ./cloud_sql_proxy -dir=/cloudsql &

At last,
Now you can have working mysql DSN “mysql:unix_socket=/cloudsql/INSTANCE_CONNECTION_NAME” which you can use in your application and your application will work with Google Cloud SQL instance directly from local machine.

Please ask me anything about this post in comment below. I will be glad to assist you happily.

 

Stop and Remove all docker containers and images

Some important commands for Docker containers and images. It’s useful for delete all docker containers and images altogether at once.


# Stop all docker containers
docker stop $(docker ps -a -q)

#Remove all docker containers
docker rm $(docker ps -a -q)

# Delete all images
docker rmi $(docker images -q)