Community Quickstart Guide

Local Developer Edition V2.0

BlazingDB is a SQL database on graphics processing units (GPUs). Typically, we've been helping our users get BlazingDB up and running on large clusters of GPU servers on the cloud, but we want to make our engine available to more people.

On that note, check it out, have fun, and let us know what would make this experience better in the Support Channels!

Prerequisites

To install the BlazingDB Server you need:

  • A CUDA-enabled GPU with a CUDA compute capability of 3.0 or higher (see NVIDIA's list of CUDA GPUs).
  • A successfully installed CUDA driver. This can be challenging, so we recommend following an installation guide.
  • Ubuntu 16.04 (more platforms to come soon).
  • Java 6 or higher for the BlazingDB Workbench Web.

Install BlazingDB Community Edition

BlazingDB

BlazingDB is a distributed SQL engine on GPUs. It offers a distributed compute engine that can run off enterprise data lakes (distributed file systems). We currently support HDFS and AWS Simple Storage Service (S3). With Apache Parquet files on data lakes, customers can deploy one or more BlazingDB nodes, connect them through a few simple configuration steps, and then run multi-terabyte SQL workloads directly off of Apache Parquet. BlazingDB will distribute the query across all the nodes and GPUs available in a configured cluster.

For more information and related downloads for BlazingDB on any architecture, please contact us at https://blazingdb.com

General

The BlazingDB Docker images are based on the NVIDIA CUDA Docker images: https://hub.docker.com/r/nvidia/cuda/

To use BlazingDB with Docker, you therefore need:

- GNU/Linux x86_64 with kernel version > 3.10
- NVIDIA GPU with Architecture > Fermi (2.1)
- NVIDIA drivers ~= 361.93 (untested on older versions)
- Docker >= 1.12
- NVIDIA Docker >= 2.0
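The minimum versions above can be checked with a small version comparison. The following is a minimal sketch, not part of the BlazingDB tooling: the helper names version_ge and check_min_version are hypothetical, and the comparison assumes GNU sort with the -V (version sort) option is available. Note that it tests "greater than or equal", while the kernel and GPU architecture requirements above are strict inequalities.

```bash
#!/bin/bash
# Return success (0) when version $1 is greater than or equal to version $2.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: report whether an installed version meets a minimum.
check_min_version() {
    local name=$1 installed=$2 minimum=$3
    if version_ge "$installed" "$minimum"; then
        echo "OK: $name $installed >= $minimum"
    else
        echo "FAIL: $name $installed < $minimum"
        return 1
    fi
}

# Example with literal values. In practice the installed version could come
# from a command such as: docker version --format '{{.Server.Version}}'
check_min_version "Docker" "17.03.2" "1.12"
```

A plain string comparison would rank "1.9" above "1.12", which is why the sketch uses version sort instead.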

Remember, you need to use the NVIDIA Docker runtime in order to use the BlazingDB Docker images.

The NVIDIA Docker runtime will load the driver inside the container:

NVIDIA Docker runtime

  docker run --runtime=nvidia ...
  nvidia-docker ...

nvidia-docker is a wrapper that adds --runtime=nvidia to docker run.

Install Docker

Install Docker for your Linux distribution by following the official Docker installation instructions.

Docker >= 1.12 is required.

Install NVIDIA Docker

Follow these steps to install NVIDIA Docker for your Linux distribution:

https://github.com/nvidia/nvidia-docker/wiki/Installation-(version-2.0)

NVIDIA Docker version

NVIDIA Docker >= 2.0 is required.

BlazingDB Docker Images

Each Docker image represents an environment, and all environments have the prefix "blazingdb/":

| Environment | Parent | Description |
|---|---|---|
| blazingdb | blazingdb/runtime | Public image hosted in DockerHub (can be used for Community or generic deploy) |

The supported operating systems and architectures are tied to nvidia/cuda (currently Ubuntu and CentOS on x86_64 and powerpc).

Control GPU Visibility with NVIDIA Docker

The container environment variable NVIDIA_VISIBLE_DEVICES controls GPU visibility. The possible values are:

  • 0,1,2, GPU-fef8089b …: a comma-separated list of GPU UUID(s) or index(es).
  • all: all GPUs will be accessible, this is the default value in our container images.
  • none: no GPU will be accessible, but driver capabilities will be enabled.
  • void or empty or unset: nvidia-container-runtime will have the same behavior as runc.

You need to pass the environment variable when you run the container:

NVIDIA Docker runtime

  docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 ...
  nvidia-docker run -e NVIDIA_VISIBLE_DEVICES=0 ...
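When the device list comes from user input or a deployment script, it can help to validate it before handing it to Docker. The sketch below is an illustration only: gpu_env_flag is a hypothetical helper, and the accepted pattern (indexes or GPU-prefixed hexadecimal UUIDs, comma-separated) is an assumption based on the values listed above. It rejects an empty value, although the runtime itself treats an empty or unset variable like runc.

```bash
#!/bin/bash
# Hypothetical helper: validate a NVIDIA_VISIBLE_DEVICES value and print the
# matching -e argument for docker run.
gpu_env_flag() {
    local value=$1
    # One entry is a GPU index or a GPU-<uuid> string (pattern is an assumption).
    local entry='([0-9]+|GPU-[0-9a-fA-F-]+)'
    local pattern="^${entry}(,${entry})*\$"
    case "$value" in
        all|none|void)
            ;;
        *)
            if ! [[ $value =~ $pattern ]]; then
                echo "Invalid NVIDIA_VISIBLE_DEVICES value: '$value'" >&2
                return 1
            fi
            ;;
    esac
    echo "-e NVIDIA_VISIBLE_DEVICES=$value"
}

gpu_env_flag "0,1"
gpu_env_flag "GPU-fef8089b"
```

The printed argument can then be placed in a docker run --runtime=nvidia or nvidia-docker run command line.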

For more information about GPU visibility, see:

https://github.com/NVIDIA/nvidia-docker/wiki/Usage
https://github.com/nvidia/nvidia-container-runtime#nvidia_visible_devices

How to use blazingdb image

In memory container

This mode does not persist the BlazingDB data (databases, tables, files, etc.). All the BlazingDB data will be lost once the container is removed.

Example for Ubuntu 16.04 and CUDA 8.0

  nvidia-docker run -p 9001:9001 -p 8080:8080 blazingdb/blazingdb:ubuntu16.04-cuda8.0

Description

This container will start the Simplicity and Webapp services using Supervisord.
When you run a container based on this image you will have a BlazingDB instance up and ready for any use or testing related tasks.

Custom script to run blazingdb docker image

To run the Docker image you can use nvidia-docker directly, or use a custom script that mounts the volumes and sets the ports for you.

Copy and paste the code below into a bash script called blazingdb-docker.sh:

#!/bin/bash

# Mandatory arguments
blazingdb_directory=$1
blazingdb_image=$2
blazingdb_container=$3

# Optional arguments
blazingdb_http_port=$4
supervisord_http_port=$5

if [ -z "$blazingdb_directory" ]; then
    echo "Error: You need to provide the directory where BlazingDB container will save the data."
    exit 1
fi

if [ -z "$blazingdb_image" ]; then
    echo "Error: You need to provide the blazingdb docker image."
    exit 1
fi

if [ -z "$blazingdb_container" ]; then
    echo "Error: You need to provide the container name for the current instance."
    exit 1
fi

if [ -z "$blazingdb_http_port" ]; then
    echo "Note: By default will use the HTTP port 8080 for BlazingDB."
    blazingdb_http_port=8080
fi

if [ -z "$supervisord_http_port" ]; then
    echo "Note: By default will use the HTTP port 9001 for Supervisord."
    supervisord_http_port=9001
fi

# Config folders
blazing_bin=$blazingdb_directory/opt/blazing/bin/
blazing_conf=$blazingdb_directory/opt/blazing/conf/
blazing_perf=$blazingdb_directory/opt/blazing/perf/

# Data folders
blazing_uploads=$blazingdb_directory/opt/blazing/uploads/
blazing_workbench=$blazingdb_directory/opt/blazing/workbench/
blazing_metadata=$blazingdb_directory/mnt/blazing-metadata/
blazing_disk_cache=$blazingdb_directory/mnt/blazing-disk-cache/
blazing_drives=$blazingdb_directory/mnt/blazing-drives/
blazing_tmp=$blazingdb_directory/mnt/blazing-tmp/

# Create the config folder structure if it does not exist
mkdir -p "$blazing_bin"
mkdir -p "$blazing_conf"
mkdir -p "$blazing_perf"

# Create the data folder structure if it does not exist
mkdir -p "$blazing_uploads/admin/"
mkdir -p "$blazing_uploads/default/"
mkdir -p "$blazing_workbench/tmp/"
mkdir -p "$blazing_metadata/default/"
mkdir -p "$blazing_disk_cache/default/"
mkdir -p "$blazing_drives/drive1/default/"
mkdir -p "$blazing_drives/drive2/default/"
mkdir -p "$blazing_tmp"

# Set up the Workbench if we are running for the first time.
if [ ! -f "$blazing_workbench/data.mv.db" ]; then
    echo "Downloading basic workbench database ... User 'admin' and password '123456'"
    wget https://s3-us-west-2.amazonaws.com/blazing-public-downloads/installer/2.0/data.mv.db -O "${blazing_workbench}/data.mv.db"
    wget https://s3-us-west-2.amazonaws.com/blazing-public-downloads/installer/2.0/data.trace.db -O "${blazing_workbench}/data.trace.db"
fi

# Every expansion is quoted so that host paths containing spaces are passed intact.
nvidia-docker run -d \
                  --name "$blazingdb_container" \
                  -p "$blazingdb_http_port:8080" \
                  -p "$supervisord_http_port:9001" \
                  -v "$blazing_uploads:/opt/blazing/uploads/:z" \
                  -v "$blazing_workbench:/opt/blazing/workbench/:z" \
                  -v "$blazing_metadata:/mnt/blazing-metadata/:z" \
                  -v "$blazing_disk_cache:/mnt/blazing-disk-cache/:z" \
                  -v "$blazing_drives:/mnt/blazing-drives/:z" \
                  -v "$blazing_tmp:/mnt/blazing-tmp/:z" \
                  "$blazingdb_image"

Then make the script executable:

Assign execute permission to blazingdb-docker.sh

chmod +x blazingdb-docker.sh

Volumes

| Volume | Type | Description |
|---|---|---|
| /opt/blazing/bin/ | Binaries | Contains Simplicity and blazing-workbench.jar |
| /opt/blazing/conf/ | Configuration | Contains blazing.conf, nodes.config and driver.properties |
| /opt/blazing/perf/ | Logging | If the Simplicity binary is for performance then this directory will have the log (See PERF_LOGGING_FOLDER in blazing.conf) |
| /opt/blazing/uploads/ | Webapp Data | Where the user can upload files (CSV, Parquet, etc.) |
| /opt/blazing/workbench/ | Webapp Data | Contains the Webapp embedded database (data.mv.db and data.trace.db) and the tmp directory (where jar will be decompressed) |
| /mnt/blazing-metadata/ | Simplicity Data | This is the BASE_FOLDER in the blazing.conf |
| /mnt/blazing-disk-cache/ | Simplicity Data | This is the DISK_CACHE_LOCATION in blazing.conf |
| /mnt/blazing-drives/ | Simplicity Data | This is the SEQUENTIAL_DRIVES in blazing.conf |
| /mnt/blazing-tmp/ | Simplicity Data | This is the BLAZING_TEMP_FOLDER in blazing.conf |

Data volume

If you need to persist the BlazingDB data when using Docker containers, you must mount all the data volumes/directories.

Ports

| Port | Type | Description |
|---|---|---|
| 8080 | HTTP | HTTP port for the Webapp |
| 9001 | HTTP | HTTP port for Supervisord |

Persistent container (with script)

This mode will persist the BlazingDB data (databases, tables, files, etc) in a selected host directory. To run this mode you can use the blazingdb-docker.sh script.

blazingdb-docker.sh

Before mounting the volumes, this script creates the required directory structure if it does not already exist. It also downloads a clean Webapp embedded database (data.mv.db and data.trace.db) from S3 if none is present:

https://s3-us-west-2.amazonaws.com/blazing-public-downloads/installer/2.0/data.mv.db
https://s3-us-west-2.amazonaws.com/blazing-public-downloads/installer/2.0/data.trace.db

Arguments

./blazingdb-docker.sh $host_directory \
                      $blazingdb_image \
                      $container_name \
                      $blazingdb_http_port \
                      $supervisord_http_port

Where:

- $host_directory, $blazingdb_image and $container_name are mandatory arguments.
- $blazingdb_http_port and $supervisord_http_port are optional arguments.
- $blazingdb_http_port is set to 8080 by default.
- $supervisord_http_port is set to 9001 by default.
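The same argument handling can be written more compactly with shell parameter expansion. This is a sketch of an alternative, not the code used by blazingdb-docker.sh: parse_args is a hypothetical function, and only the variable names and default ports are taken from the description above.

```bash
#!/bin/bash
# Hypothetical helper mirroring the argument rules described above:
# three mandatory arguments, then two optional ports with defaults.
parse_args() {
    if [ "$#" -lt 3 ]; then
        echo "Usage: parse_args host_directory image container [http_port] [supervisord_port]" >&2
        return 1
    fi
    host_directory=$1
    blazingdb_image=$2
    container_name=$3
    # ${N:-default} falls back to the default when the argument is unset or empty.
    blazingdb_http_port=${4:-8080}
    supervisord_http_port=${5:-9001}
}

parse_args /path/to/host/directory/ blazingdb/blazingdb:ubuntu16.04-cuda8.0 blazingdb-container-1
echo "Webapp port: $blazingdb_http_port, Supervisord port: $supervisord_http_port"
# prints "Webapp port: 8080, Supervisord port: 9001"
```

Because the ports are positional, a custom Supervisord port can only be given together with a Webapp port.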

Start your blazingdb container with the custom script:

Example for Ubuntu 16.04 and CUDA 8.0 (with default HTTP ports)

./blazingdb-docker.sh /path/to/host/directory/ \
                      blazingdb/blazingdb:ubuntu16.04-cuda8.0 \
                      blazingdb-container-1

The same script with custom ports:

Example for Ubuntu 16.04 and CUDA 8.0 (with custom HTTP ports)

./blazingdb-docker.sh /path/to/host/directory/ \
                      blazingdb/blazingdb:ubuntu16.04-cuda8.0 \
                      blazingdb-container-1 \
                      8081 \
                      9003

where:

  • /path/to/host/directory: blazingdb directory, where the volumes will be created
  • blazingdb/blazingdb:ubuntu16.04-cuda8.0: blazingdb image name
  • blazingdb-container-1: blazingdb container name

Persistent container (without the script)

This section shows how to mount all the data volumes/directories and publish all the ports directly with Docker:

nvidia-docker run --name $blazingdb_container \
                  -p $blazingdb_http_port:8080 \
                  -p $supervisord_http_port:9001 \
                  -v $blazing_uploads:/opt/blazing/uploads/:z \
                  -v $blazing_workbench:/opt/blazing/workbench/:z \
                  -v $blazing_metadata:/mnt/blazing-metadata/:z \
                  -v $blazing_disk_cache:/mnt/blazing-disk-cache/:z \
                  -v $blazing_drives:/mnt/blazing-drives/:z \
                  -v $blazing_tmp:/mnt/blazing-tmp/:z \
                  $blazingdb_image

Structure for local directories

Before mounting, the volumes need to have the following directory structure:

  • /opt/blazing/bin/
  • /opt/blazing/conf/
  • /opt/blazing/perf/
  • /opt/blazing/uploads/admin/
  • /opt/blazing/uploads/default/
  • /opt/blazing/workbench/tmp/
  • /mnt/blazing-metadata/default/
  • /mnt/blazing-disk-cache/default/
  • /mnt/blazing-drives/drive1/default/
  • /mnt/blazing-drives/drive2/default/
  • /mnt/blazing-tmp/

And the volume /opt/blazing/workbench/ needs to have the Webapp database: data.mv.db and data.trace.db.

You can download a clean database from S3:

https://s3-us-west-2.amazonaws.com/blazing-public-downloads/installer/2.0/data.mv.db
https://s3-us-west-2.amazonaws.com/blazing-public-downloads/installer/2.0/data.trace.db
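The directory structure listed above can be prepared on the host before running the container. The following is a minimal sketch under the assumption that all directories live under a single host root, as blazingdb-docker.sh does; prepare_blazingdb_dirs is a hypothetical helper name, and the example uses a temporary directory as the root.

```bash
#!/bin/bash
# Hypothetical helper: create the directory layout listed above under a
# chosen host root directory.
prepare_blazingdb_dirs() {
    local root=$1
    local dirs=(
        opt/blazing/bin
        opt/blazing/conf
        opt/blazing/perf
        opt/blazing/uploads/admin
        opt/blazing/uploads/default
        opt/blazing/workbench/tmp
        mnt/blazing-metadata/default
        mnt/blazing-disk-cache/default
        mnt/blazing-drives/drive1/default
        mnt/blazing-drives/drive2/default
        mnt/blazing-tmp
    )
    local dir
    for dir in "${dirs[@]}"; do
        mkdir -p "$root/$dir" || return 1
    done
    echo "Created ${#dirs[@]} directories under $root"
}

# Example: use a temporary directory as the host root.
root=$(mktemp -d)
prepare_blazingdb_dirs "$root"
```

The Webapp database files still have to be placed in the workbench directory afterwards.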

Example for Ubuntu 16.04 and CUDA 8.0

nvidia-docker run --name blazingdb-container-1 \
                  -p 8081:8080 \
                  -p 9003:9001 \
                  -v /my/blazing_uploads:/opt/blazing/uploads/:z \
                  -v /my/blazing_workbench:/opt/blazing/workbench/:z \
                  -v /my/blazing_metadata:/mnt/blazing-metadata/:z \
                  -v /my/blazing_disk_cache:/mnt/blazing-disk-cache/:z \
                  -v /my/blazing_drives:/mnt/blazing-drives/:z \
                  -v /my/blazing_tmp:/mnt/blazing-tmp/:z \
                  blazingdb/blazingdb:ubuntu16.04-cuda8.0

Advanced mode (custom BlazingDB binaries and configuration)

You can mount volumes/directories that contain a custom version of the BlazingDB binaries and custom settings files:

Generic syntax for custom binaries and conf (in memory container)

nvidia-docker run --name $blazingdb_container \
                  -p $blazingdb_http_port:8080 \
                  -p $supervisord_http_port:9001 \
                  -v $blazing_bin:/opt/blazing/bin/:z \
                  -v $blazing_conf:/opt/blazing/conf/:z \
                  -v $blazing_perf:/opt/blazing/perf/:z \
                  $blazingdb_image

Generic syntax for custom binaries and conf (persistent container)

nvidia-docker run --name $blazingdb_container \
                  -p $blazingdb_http_port:8080 \
                  -p $supervisord_http_port:9001 \
                  -v $blazing_bin:/opt/blazing/bin/:z \
                  -v $blazing_conf:/opt/blazing/conf/:z \
                  -v $blazing_perf:/opt/blazing/perf/:z \
                  -v $blazing_uploads:/opt/blazing/uploads/:z \
                  -v $blazing_workbench:/opt/blazing/workbench/:z \
                  -v $blazing_metadata:/mnt/blazing-metadata/:z \
                  -v $blazing_disk_cache:/mnt/blazing-disk-cache/:z \
                  -v $blazing_drives:/mnt/blazing-drives/:z \
                  -v $blazing_tmp:/mnt/blazing-tmp/:z \
                  $blazingdb_image

Exec the blazingdb/blazingdb Docker Image

Once the container is running you can use the nvidia-docker exec command in order to get into the container shell (this is mostly for debugging purposes):

nvidia-docker exec -ti blazingdb-container-1 /bin/bash

Docker Tips

Stop and remove all the containers

docker stop $(docker ps -aq)
docker rm $(docker ps -aq)

Supervisord

Supervisor is a client/server system that allows its users to control a number of processes on UNIX-like operating systems. Supervisord manages the following two services inside the container. While supervisord is running, you can see the logs of each service.

| Service | Service name | Description |
|---|---|---|
| Simplicity | blazing | The Simplicity process. |
| Webapp | blazing-webapp | The Webview process (using the blazing-workbench.jar) |

Dashboard (HTTP frontend)

If you didn't change the $supervisord_http_port number when you started the container, go to http://localhost:9001/ to manage the services. If you are using a custom $supervisord_http_port number, substitute it for 9001 in that URL.

Command Line Interface (with nvidia-docker exec)

Example for Ubuntu 16.04 and CUDA 8.0

  nvidia-docker exec -ti blazingdb-container-1 supervisorctl status all
  nvidia-docker exec -ti blazingdb-container-1 supervisorctl pid blazing
  nvidia-docker exec -ti blazingdb-container-1 supervisorctl stop blazing-webapp
  nvidia-docker exec -ti blazingdb-container-1 supervisorctl start blazing-webapp
  nvidia-docker exec -ti blazingdb-container-1 supervisorctl restart blazing
  nvidia-docker exec -ti blazingdb-container-1 supervisorctl restart all

More information: http://supervisord.org/running.html#running-supervisorctl

Command Line Interface

Description

The command line interface (CLI) is the terminal client for a BlazingDB instance. This tool lets you execute queries without needing the Webview.

The CLI tool is blazingdb.sh. This script is only a wrapper for the Python module ServerManagement.BlazingDatabaseManager (in the project BlazingEngineIntegrationTests).

blazingdb.sh

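#!/bin/bash
# Directory that contains the ServerManagement Python package
BlazingEngineIntegrationTests_directory=/opt/blazingdb/PackageBlazingEngineIntegrationTests
# Stop here if the directory is missing, instead of running Python from the wrong place
cd "$BlazingEngineIntegrationTests_directory" || exit 1
# Forward all CLI arguments unchanged to the Python module
python -m ServerManagement.BlazingDatabaseManager "$@"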

Note

The CLI client is very useful when you cannot use the web client.

Using the BlazingDB CLI

BlazingDB connection

The JSON file blazingConnection.json has the connection arguments that the CLI will use in order to connect with a BlazingDB instance.

The connection arguments are:

| Argument | Description | Mandatory |
|---|---|---|
| endpoint | URL of the Webapp | Yes |
| username | BlazingDB username. | Yes |
| password | BlazingDB password. | Yes |
| database | A database which the CLI will use to run the queries. | No |

Examples

Create a blazingConnection.json file with the following content:

  {
    "endpoint": "http://localhost:8080",
    "username": "admin",
    "password": "123456",
    "database": "testdb"
  }
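The connection file can also be generated from a script, for example when the endpoint or port differs between containers. This is a sketch only: write_connection_file is a hypothetical helper, it inserts the values verbatim (so they must not contain double quotes or backslashes), and it always writes the optional database field.

```bash
#!/bin/bash
# Hypothetical helper: write a blazingConnection.json file with the given
# connection arguments. Values are inserted verbatim, without JSON escaping.
write_connection_file() {
    local file=$1 endpoint=$2 username=$3 password=$4 database=$5
    cat > "$file" <<EOF
{
  "endpoint": "$endpoint",
  "username": "$username",
  "password": "$password",
  "database": "$database"
}
EOF
}

# Example: write the file into a temporary working directory and show it.
workdir=$(mktemp -d)
write_connection_file "$workdir/blazingConnection.json" "http://localhost:8080" admin 123456 testdb
cat "$workdir/blazingConnection.json"
```

If you started the container with a custom Webapp port, use that port in the endpoint URL.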

You can run these commands inside the container to test the BlazingDB CLI.

Help

blazingdb.sh --help
blazingdb.sh -c $PWD/blazingConnection.json database --help
blazingdb.sh -c $PWD/blazingConnection.json query --help

From the directory where your blazingConnection.json file is located, run the following:

CLI usage examples

  # Create a database
    blazingdb.sh -c $PWD/blazingConnection.json database "mydatabase"

  # List tables
    blazingdb.sh -c $PWD/blazingConnection.json query -d "mydatabase" "list tables"

  # Create table
    blazingdb.sh -c $PWD/blazingConnection.json query -d "mydatabase" "create table myregion (r_regionkey short, r_name string(32))"

  # Drop database
    blazingdb.sh -c $PWD/blazingConnection.json query -d "mydatabase" "drop database mydatabase"

Run validation test

The purpose of the validation tests is to validate the BlazingDB installation.

Run the tests inside the container

cd /opt/blazing/PackageBlazingEngineIntegrationTests/
python -m DeploymentValidation.DeploymentValidationTest JsonFiles/configTestRelease.json

The result should be:

**********************************************************************
SUMMARY for test suite Create Release Test: 40/40 PASSED, 0/40 FAILED
Tests elapsed time for the test suite: 13.372891 seconds
Execution time for Blazing queries for the test suite: 7.283900 seconds
Execution time for Postgres queries for the test suite: 2.232965 seconds
Delta execution time between Postgres and Blazing for the test suite: -5.050936 seconds
Tests total elapsed time: 20.025028 seconds
Total execution time for Blazing queries: 7.283900 seconds
Total execution time for Postgres queries: 2.232965 seconds
Delta execution time between Postgres and Blazing: -5.050936 seconds
**********************************************************************
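When the validation runs unattended, the summary can be checked automatically. The sketch below assumes the output format shown above, in particular that each suite prints a line starting with "SUMMARY for test suite" and ending in a failed count; all_tests_passed is a hypothetical helper that reads the test output on standard input.

```bash
#!/bin/bash
# Hypothetical helper: read validation output on stdin and succeed only when
# at least one SUMMARY line exists and every SUMMARY line reports 0 failures.
all_tests_passed() {
    local summary
    summary=$(grep '^SUMMARY for test suite') || return 1
    # Fail if any SUMMARY line lacks a " 0/<total> FAILED" count.
    ! grep -qv ' 0/[0-9]* FAILED' <<< "$summary"
}

sample='SUMMARY for test suite Create Release Test: 40/40 PASSED, 0/40 FAILED'
if all_tests_passed <<< "$sample"; then
    echo "Validation passed"
else
    echo "Validation failed"
fi
# prints "Validation passed"
```

Inside the container, the output of the DeploymentValidationTest command could be piped into this function in the same way.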

For more information please check https://hub.docker.com/u/blazingdb/
