Using Google Cloud Platform at the Boulder GPU Hackathon

Getting Started with GCP

Creating an account

To gain access to the Google Cloud cluster at the Boulder GPU Hackathon, the simplest and most secure route is to create a Google Cloud account at https://cloud.google.com/freetrial and fill out the registration form. When you create an account, you receive $300 in credit to experiment with creating virtual machines. Although GCP asks for a credit card, it is used for verification purposes only; you will not be charged for your usage of GCP at the Boulder Hackathon.

If you have any questions or concerns, feel free to reach out to Joe at joe@fluidnumerics.com.

Once you’ve signed up for GCP and registered, your username will be added to the boulder-gpu-hackathon project and you will be able to ssh in to the Cloud Cluster’s login node.

Logging in

From your web browser (Cloud Shell, recommended)

The Cloud Shell is a Unix-like terminal environment embedded in your web browser that comes equipped with the gcloud command line tools. A good place to start familiarizing yourself with the Cloud Console is Google’s online documentation.

To log in to the cluster, go to https://cloud.google.com and log in using your GCP credentials. Then, in the upper-left portion of the GCP Console’s top panel, switch projects to “Boulder GPU Hackathon”.

[Screenshot: Google Cloud Console, upper-left panel]

Once you’ve switched to the Boulder GPU Hackathon Project, open the Cloud Shell, which can be found in the upper-right portion of the GCP Console.

[Screenshot: Google Cloud Console, upper-right panel]

This will open a terminal window (Cloud Shell) in your web browser. To log in to the Cloud Cluster, use the following command:

gcloud compute --project=boulder-gpu-hackathon ssh --zone=us-central1-f login1

Once logged in to the login node, you can install your software in your home directory and submit jobs to the compute nodes in the cluster.
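As a rough sketch of that workflow (the repository URL, directory name, and job script below are placeholders for your own project):

cd $HOME
git clone https://github.com/<your-team>/<your-app>.git    # fetch your own source code
cd <your-app>
module load pgi/18.4 openmpi/2.1.2/pgi/18.4                # pick a compiler/MPI pair (see "Setting up your environment")
make                                                       # build on the login node
sbatch my_job.slurm                                        # run on the compute nodes (see "Submitting Jobs (Slurm)")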

From your own terminal

To access the Cloud Cluster from your own terminal, you need to install the Google Cloud SDK (gcloud) command line tools and complete the initial authentication between your GCP account and your system. Complete documentation on installation and setup can be found in GCP’s documentation on Installing Google Cloud SDK.

If you are on a Debian-based Linux system (e.g. Ubuntu, Mint), after adding the Cloud SDK apt repository described in the documentation linked above:

sudo apt-get install google-cloud-sdk

gcloud init

On Red Hat and CentOS systems, after adding the Cloud SDK yum repository described in the documentation linked above:

sudo yum install google-cloud-sdk

gcloud init
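Optionally, after running gcloud init, you can store the hackathon project and zone as defaults so later commands don’t need the --project and --zone flags (a convenience sketch using standard gcloud config commands):

gcloud config set project boulder-gpu-hackathon
gcloud config set compute/zone us-central1-f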

Once gcloud has been initialized and your system has been authenticated, you can ssh into the login node using

gcloud compute --project=boulder-gpu-hackathon ssh --zone=us-central1-f <user-name>@login1

Your user-name is the G Suite/Gmail/GCP username you submitted in the Boulder GPU Hackathon registration. For example, since my GCP account is joe@fluidnumerics.com, I would use

gcloud compute --project=boulder-gpu-hackathon ssh --zone=us-central1-f joe@login1

Moving files onto the Cloud

This article from Google Cloud Platform provides good documentation on moving files from your workstation onto the cloud. To move files onto the cloud via scp, you will need to install Google Cloud SDK on your workstation.

To push a file up to your home directory on the login node

gcloud compute --project=boulder-gpu-hackathon scp <file-name> <user-name>@login1:

To pull a file from the cloud to your system

gcloud compute --project=boulder-gpu-hackathon scp <user-name>@login1:path/to/file path/to/destination/
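If you need to move an entire directory rather than a single file, gcloud compute scp also accepts a --recurse flag; for example (my_project is a placeholder for your own directory name):

gcloud compute --project=boulder-gpu-hackathon scp --recurse my_project/ <user-name>@login1: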

Setting up your environment

Packages on the GCP cluster are managed with Spack and modules. The base installation on the GCP cluster consists of compilers and associated MPI builds.

The table below lists the base installation packages available on the GCP Cluster in addition to the commands needed to bring these packages into your environment.

Compiler / MPI                      | OpenMPI 2.1.2                       | OpenMPI 3.0.1                        | MPICH 3.2.1
GCC 7.3.0 (spack load gcc@7.3.0)    | N/A                                 | spack load openmpi@3.0.1 %gcc@7.3.0  | spack load mpich@3.2.1 %gcc@7.3.0
PGI 17.10 (module load pgi/17.10)   | module load openmpi/2.1.2/pgi/17.10 | N/A                                  | spack load mpich@3.2.1 %pgi@17.10
PGI 18.4 (module load pgi/18.4)     | module load openmpi/2.1.2/pgi/18.4  | N/A                                  | spack load mpich@3.2.1 %pgi@18.4

Environment Modules and Spack allow you to load and unload libraries and binaries into your search paths. This is useful given that we have many users, all with different compiler and library needs.
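For example, to build a simple MPI program with GCC 7.3.0 and OpenMPI 3.0.1 (a sketch only; hello.c stands in for your own source file):

spack load gcc@7.3.0
spack load openmpi@3.0.1 %gcc@7.3.0
mpicc hello.c -o hello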

To see a list of available packages on our system, use

spack find

To load packages, use

spack load <package-name>

This command adds the appropriate directories to your PATH and LD_LIBRARY_PATH environment variables.

To clear your PATH and LD_LIBRARY_PATH, use

spack unload
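A quick way to confirm that a load or unload has taken effect is to check where your shell resolves the associated binaries, for example:

spack load mpich@3.2.1 %gcc@7.3.0
which mpicc          # should resolve to the Spack-installed MPICH
spack unload mpich@3.2.1 %gcc@7.3.0
which mpicc          # should no longer resolve to the Spack installation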

CUDA Toolkits

When using the PGI compilers, the CUDA toolkit installed with PGI is brought into your environment. Running `echo $CUDAPATH` after loading either of the PGI compilers will reveal where the CUDA toolkit can be found.

Installs of cuda-toolkit versions 8.0 and 9.0 are provided for using the CUDA toolkit without PGI compilers.

To load version 8.0

module load cuda/8.0

To load version 9.0

module load cuda/9.0
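After loading one of these modules, you can verify which toolkit is on your path with nvcc (a quick sanity check, not required):

which nvcc
nvcc --version       # should report the CUDA release you loaded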

Submitting Jobs (Slurm)

The GCP cluster we’ve set up uses the Slurm job scheduler to manage workload requests from all of our attendees. To run applications on this system, you must write a job submission script and use the sbatch job scheduling command.

Sample Job Submission Script

#!/bin/bash
#
# Gain exclusive access to a whole node
#SBATCH --exclusive
#
# How many nodes are needed
#SBATCH --nodes=2
#
# How many MPI tasks in total
#SBATCH --ntasks=2
#
# How many MPI tasks per node
#SBATCH --ntasks-per-node=1
#
# How many physical CPUs per task
#SBATCH --cpus-per-task=1
#
# How long the job is anticipated to run
#SBATCH --time=00:20:00
#
# The name of the job
#SBATCH --job-name=sample_job_name

pwd; hostname; date

# Load PGI compilers with MPI
module load pgi/18.4 openmpi/2.1.2/pgi/18.4

mpirun -np 2 ./my_exe

date

Additional useful Slurm submission scripts can be found on the University of Florida Research Computing website.

To submit a job with a Slurm submission script, use

sbatch <slurm-file>

Use squeue to check the status of jobs running on the cluster.
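A few other standard Slurm commands may be handy while your jobs are queued or running (shown here as a sketch; <job-id> is the ID reported by sbatch):

squeue -u $USER            # list only your own jobs
scontrol show job <job-id> # detailed information about a single job
scancel <job-id>           # cancel a job you no longer need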

System Specifications

The login node, controller node, and compute nodes all use machine types from the n1 series on GCP.

For the n1 series of machine types, a virtual CPU is implemented as a single hardware hyper-thread on a 2.6 GHz Intel Xeon E5 (Sandy Bridge), 2.5 GHz Intel Xeon E5 v2 (Ivy Bridge), 2.3 GHz Intel Xeon E5 v3 (Haswell), 2.2 GHz Intel Xeon E5 v4 (Broadwell), or 2.0 GHz Intel Xeon (Skylake) platform. For more information, see GCP’s documentation on Machine Types.

Login Node

The login node uses the n1-standard-64 machine type, providing 64 vCPUs and 240 GB of RAM. There are no GPUs attached to the login node. The login node is meant for users to edit source code, compile executables, and submit jobs to run on the compute nodes.

Users should not run compute-intensive jobs on the login node.

Compute Nodes

The compute nodes use the n1-highmem-8 machine type, providing 8 vCPUs and 52 GB of RAM, with one NVIDIA V100 GPU attached per node. At this time, there are limitations on the number of virtual CPUs allowable for use with each V100; this article provides more details on the allowable GPU-CPU configurations.
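Assuming interactive srun jobs are permitted on this cluster, a quick way to confirm that the V100 is visible from a compute node is:

srun --nodes=1 --ntasks=1 nvidia-smi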

Controller Node

The controller node uses the n1-standard-64 machine type, providing 64 vCPUs and 240 GB of RAM. There are no GPUs attached to the controller node. The controller node hosts the /apps/ and /home/ directory space on the compute cluster; these paths are mounted on the login node and all of the compute nodes.

Network

The login, controller, and compute nodes are networked together using Google’s Virtual Private Cloud (VPC) infrastructure. The physical hardware underlying the network consists of proprietary high-speed “ethernet-like” hardware achieving a maximum bandwidth of 16 Gb/s.