A Fast Track to Machine Learning with GPU, on Oracle Cloud

How to easily provision an environment with GPU, H2O4GPU, TensorFlow, PyTorch, on Oracle Cloud.

Luigi Saetta
Feb 7, 2020

Introduction

If you want to develop a Deep Learning model, or use a Machine Learning framework like H2O, you will almost certainly benefit from an environment with a GPU.

However, setting up such an environment is not always easy. You need to find the right versions of the NVIDIA drivers, TensorFlow, and all the other open source packages and tools, and you risk spending a lot of time simply trying to figure out which versions are compatible.

In a previous article, I have described all the steps needed to set up a VM in Oracle Cloud, using Ubuntu 18 LTS.

Now there is an even easier way, using an HPC VM image available in the Oracle Cloud Marketplace.

AI Datascience VM with NVidia GPU

In this article, I will show you how to set up such an environment and how to install a powerful GPU-enabled framework like H2O4GPU. I will also illustrate some tests I have run, exploring the reduction in training time provided by the GPU.

The new GPU image in Oracle Cloud.

You can create a Virtual Machine directly from the Cloud Marketplace; the icon is the one shown above. You only need a Virtual Cloud Network (VCN) in which to host the VM.

VM creation and startup take around 10 minutes. I tested it using a VM.GPU2.1 shape, which provides one NVIDIA Tesla P100 GPU and 12 OCPUs. If you need more power, you can also use the BM.GPU2.2 shape, with two GPUs.

After that, you can connect directly, through SSH, to a VM fully equipped with:

  • Oracle Linux 7
  • CUDA 10.1
  • NVIDIA drivers
  • Python Anaconda distribution
  • an Anaconda environment, named “sandbox”, where all the needed packages have been installed
  • TensorFlow 2.1, with GPU support
  • Pandas
  • Scikit-learn
  • Jupyter Notebook
  • many more useful packages for Data Science

And there is more: PyTorch 1.3 is also installed, in case you prefer it to TensorFlow.
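
If you want a quick sanity check of the pre-installed packages, you can activate the "sandbox" environment and run a few lines of Python. This is just a minimal sketch; the exact versions reported may differ slightly from the list above:

import tensorflow as tf
import torch

print("TensorFlow:", tf.__version__)            # expected: 2.1.x
print("PyTorch:", torch.__version__)            # expected: 1.3.x
print("CUDA available to PyTorch:", torch.cuda.is_available())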

Start a Notebook with TLS support.

Jupyter Notebook is already installed, equipped with a self-signed certificate to enable access using TLS.

You only need to set a password:

[opc@ml05gpu ~]$ jupyter notebook password
Enter password:

and then start Jupyter:

[opc@ml05gpu ~]$ jupyter notebook --certfile=jupyter-cert.pem --keyfile=jupyter-key.key

The default port (8888) is already open in the VM firewall. You only need to define an ingress security rule for network access in the VCN, as specified in the provided documentation.

After that, you can start developing your Deep Learning and Machine Learning models in the Jupyter Notebook environment.

Some quick initial tests.

You can check that the GPU is correctly set up at the OS level with this command:

[opc@ml05gpu ~]$ nvidia-smi

The output of the command shows the number and type of GPUs, the available memory (16 GB), and whether any processes are running on them. It is useful for seeing what is happening while your Python code is running.
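
If you prefer to monitor the GPU from Python (for example, from a second Notebook while a training cell is running), a small sketch based on the standard nvidia-smi query options could look like this (it assumes Python 3.7+, for capture_output):

import subprocess
import time

# poll GPU utilization and memory usage every 2 seconds, 5 times
for _ in range(5):
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu,memory.used", "--format=csv,noheader"],
        capture_output=True, text=True)
    print(result.stdout.strip())
    time.sleep(2)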

After that, you can easily check that TensorFlow can use the GPU. Simply run this block of code in a Notebook cell:

import tensorflow as tf

if tf.test.gpu_device_name():
    print('GPU Device: {}'.format(tf.test.gpu_device_name()))
else:
    print("Any troubles with GPU?")

Now, let’s go with H2O4GPU.

First of all, H2O is a popular open source framework for Machine Learning (ML). It provides many different ML algorithms. But the most important thing is that it provides a performant and scalable implementation, capable of running efficiently on many cores, on a cluster (distributed ML), and on Hadoop and Spark (Sparkling Water). The H2O implementation for GPU is called H2O4GPU.

Even though Scikit-learn is the most popular ML framework, H2O supports the most innovative algorithms (Gradient Boosting, Distributed Random Forest, XGBoost, Stacked Ensembles, ...) and can run efficiently on very large datasets.

I decided to install H2O4GPU on my VM, to verify how easy the installation is and to compare its performance with an environment where the ML code runs on CPU only.

The installation steps are not difficult. First of all, I decided to create a separate conda environment, to avoid any conflicts with the existing (sandbox) environment. It is a best practice, especially if you are trying something new and you don't know how smooth the journey will be.

conda create -n h2o4gpu -c h2oai -c conda-forge h2o4gpu-cuda10

The next part is crucial to set up an environment compatible with the CUDA version (10.1).

First, activate the environment:

conda activate h2o4gpu

The next steps are needed to solve an incompatibility between H2O4GPU and tornado (used by Jupyter, see issue #680), as described in the H2O documentation:

conda install tornado==4.5.3
conda upgrade jupyter_client

At this point, you can start Jupyter:

jupyter notebook --certfile=jupyter-cert.pem --keyfile=jupyter-key.key

As a first test of the installation, as recommended by the H2O4GPU documentation, run the following code in a Notebook cell:

import h2o4gpu
import numpy as np
X = np.array([[1.,1.], [1.,4.], [1.,0.]])
model = h2o4gpu.KMeans(n_clusters=2,random_state=1234).fit(X)
model.cluster_centers_
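
Since H2O4GPU follows the Scikit-learn API, you can optionally cross-check the result against the CPU implementation. This is just a quick sanity check, not part of the original demo:

from sklearn.cluster import KMeans

sk_model = KMeans(n_clusters=2, random_state=1234).fit(X)
print(sk_model.cluster_centers_)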

Test the performance using XGBoost.

In the Machine Learning field, one area of active research is that of "ensemble methods". With ensemble methods, you train a (large) set of models to get better performance.

With Gradient Boosting, you sequentially train Decision Trees, each one fitted on the residual errors of the preceding ones. See, for example, https://en.wikipedia.org/wiki/Gradient_boosting
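
To make the idea concrete, here is a toy sketch of the boosting loop for a regression problem, written with plain Scikit-learn trees. It is for intuition only and is not how XGBoost is implemented internally:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(42)
X_toy = rng.uniform(0, 10, size=(200, 1))
y_toy = np.sin(X_toy).ravel() + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
prediction = np.zeros_like(y_toy)
trees = []
for _ in range(50):
    residuals = y_toy - prediction                       # errors of the current ensemble
    tree = DecisionTreeRegressor(max_depth=3).fit(X_toy, residuals)
    prediction += learning_rate * tree.predict(X_toy)    # add the new tree's contribution
    trees.append(tree)

print("Training MSE after boosting:", np.mean((y_toy - prediction) ** 2))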

XGBoost (eXtreme Gradient Boosting) is a variant of the Gradient Boosting algorithm, designed to be extremely scalable and performant.

Since these algorithms work by training a large set of models (even thousands), they can require a lot of computational power, especially if the training set is very large (say, one million samples).

For these reasons, XGBoost and GBM are ideal candidates for GPU implementation.

For the test, I have decided to follow one of the examples provided on the H2O4GPU Github site:

https://github.com/h2oai/h2o4gpu/blob/master/examples/py/xgboost_simple_demo.ipynb

The test is using one of the Scikit-learn datasets: “Forest Covertypes”.

Each sample in the dataset corresponds to a 30m x 30m patch of US forest, and the task is to predict the patch's dominant cover type out of seven possible types; therefore, it is a multi-class classification task (the code below uses num_class = 8 because the labels range from 1 to 7). The dataset contains 581,012 samples, and each sample has 54 features.

I have reproduced here the first part of the code, taken from the H2O4GPU GitHub repository, with very slight modifications.

import xgboost as xgb
import numpy as np
from sklearn.datasets import fetch_covtype
from sklearn.model_selection import train_test_split
import time
%%time
# Fetch dataset using sklearn
cov = fetch_covtype()
X = cov.data
y = cov.target
%%time
# Create 0.75/0.25 train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, train_size=0.75, random_state=42)
%%time
# Convert input data from numpy to XGBoost format
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
num_round = 10
maxdepth = 6
# base parameters
param = {'tree_method': 'gpu_hist',
         'grow_policy': 'depthwise',
         'max_depth': maxdepth,
         'random_state': 1234,
         'objective': 'multi:softmax',  # Specify multiclass classification
         'num_class': 8,                # Number of possible output classes
         'base_score': 0.5,
         'booster': 'gbtree',
         'colsample_bylevel': 1,
         'colsample_bytree': 1,
         'gamma': 0,
         'learning_rate': 0.1,
         'max_delta_step': 0,
         'min_child_weight': 1,
         'missing': None,
         'n_estimators': 3,
         'scale_pos_weight': 1,
         'silent': True,
         'subsample': 1,
         'verbose': True,
         'n_jobs': -1
         }
%%time
# First setup: GPU HIST DEPTHWISE
param['tree_method'] = 'gpu_hist'
param['grow_policy'] = 'depthwise'
param['max_depth'] = maxdepth
param['max_leaves'] = 0
gpu_res = {} # Store accuracy result
tmp = time.time()
# Train model
xgb.train(param, dtrain, num_round, evals=[(dtest, 'test')], evals_result=gpu_res)
print("GPU Training Time: %s seconds" % (str(time.time() - tmp)))
%%time
# Second setup: GPU HIST LOSSGUIDE
param['tree_method'] = 'gpu_hist'
param['grow_policy'] = 'lossguide'
param['max_depth'] = 0
param['max_leaves'] = np.power(2,maxdepth)
gpu_res = {} # Store accuracy result
tmp = time.time()
# Train model
xgb.train(param, dtrain, num_round, evals=[(dtest, 'test')], evals_result=gpu_res)
print("GPU Training Time: %s seconds" % (str(time.time() - tmp)))
%%time
# Third setup: CPU HIST DEPTHWISE
param['tree_method'] = 'hist'
param['grow_policy'] = 'depthwise'
param['max_depth'] = maxdepth
param['max_leaves'] = 0
cpu_res = {} # Store accuracy result
tmp = time.time()
# Train model
xgb.train(param, dtrain, num_round, evals=[(dtest, 'test')], evals_result=cpu_res)
print("CPU Training Time: %s seconds" % (str(time.time() - tmp)))
%%time
# Fourth setup: CPU HIST LOSSGUIDE
param['tree_method'] = 'hist'
param['grow_policy'] = 'lossguide'
param['max_depth'] = 0
param['max_leaves'] = np.power(2,maxdepth)
cpu_res = {} # Store accuracy result
tmp = time.time()
# Train model
xgb.train(param, dtrain, num_round, evals=[(dtest, 'test')], evals_result=cpu_res)
print("CPU Training Time: %s seconds" % (str(time.time() - tmp)))

To clarify:

  • in the above example, we train XGBoost with four different setups, two on GPU (the first two) and two on CPU (the last two), to compare the training times
  • the crucial setting for using the GPU is param['tree_method'], which needs to be set to 'gpu_hist'
  • suggestion: run each setup in a different Jupyter cell, so that %%time reports the execution time of each one; the evals_result dictionaries can also be inspected afterwards, as shown below
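
Besides the timings, the evals_result dictionaries collected by xgb.train hold the per-round evaluation metric on the test set, so you can verify that the GPU and CPU runs reach a comparable error. A small sketch (with the multi:softmax objective the reported metric should be the multiclass error rate, merror):

# compare the evaluation history of the GPU and CPU runs
print("GPU:", gpu_res['test'])
print("CPU:", cpu_res['test'])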

Results from my runs are in the following table:

Comparison of execution times, with and without GPU

It is easy to see that, with the GPU, training the model is 2 to 4 times faster.

One final question: how much is the GPU actually being used? Here is the answer (see the Volatile GPU-Util column on the right):

GPU utilization in the test

Conclusion.

It is nice to discover that you can set up an environment equipped with a GPU, on Oracle Cloud, in no more than ten minutes.

With it, you can quickly start developing and testing your models, using frameworks like H2O4GPU, TensorFlow/Keras, and PyTorch. You can concentrate on analyzing data and developing models, instead of wasting time setting up the environment.

In the article, I have shown how to set up such an environment and how to install one powerful ML framework that provides support for GPU: H2O4GPU.

Then, I compared the performance, with and without the GPU, using one of the most advanced algorithms for structured (tabular) data: XGBoost.

Additionally, I have shown that a GPU can be a powerful enabler not only for Deep Learning (for example, Image Recognition) but also when working with structured data.

The conclusion: if you have a lot of data, use a GPU and don’t waste your time. Enjoy.


Luigi Saetta

Born in the wonderful city of Naples, but living in Rome. Always curious about new technologies and new things, especially in the AI field.