
2018 Boulder GPU Hackathon group photo

Looking Back at the 2018 Boulder GPU Hackathon

Graphics Processing Units (GPUs) can enable researchers to accelerate scientific applications. However, properly leveraging this hardware and exposing parallelism in an application can be challenging. Further, changes in GPU hardware and software on 6-month to 2-year timescales make it difficult for domain scientists to keep up with this rapidly changing environment. Hackathons provide short bursts of intense programming, where developers work alongside GPU programming experts to port and tune their software applications for the latest GPU hardware.

What happened

The 2018 Boulder GPU Hackathon was made possible through sponsorships from Nvidia, OpenACC, and Google Cloud, as well as organizational support from Fluid Numerics, CIRES/CU Boulder, and the Oak Ridge Leadership Computing Facility (OLCF). Developer teams came from NASA, Los Alamos National Laboratory, Pacific Northwest National Laboratory, Lawrence Berkeley National Laboratory, National Renewable Energy Laboratory, National Center for Atmospheric Research, NOAA National Severe Storms Laboratory, University of Chicago, University of Maryland, University of Colorado (Boulder), University of Washington, SilcsBio, and General Atomics. Mentors came from Nvidia, PGI, Google Cloud, ARM, LBNL, and University of Tennessee.

Each day, teams worked with their mentors to accelerate their own software applications by porting to and optimizing for GPU hardware. In the early afternoon, scrum sessions were held for all teams to report on their progress and their struggles. This catalyzed collaborative conversations that enabled teams to overcome their struggles quickly. Developers often found that another group had already encountered a similar problem and could help them along. In an anonymous survey of the hackathon attendees, one noted that “[it was] awesome to discuss problems with other attendees and hear similar situations and get ideas I would not have thought of.”

In addition to gaining experience in GPU programming, attendees were also given the opportunity to work on Google Cloud Platform. This was a unique experience given that the Boulder GPU Hackathon was the first to make use of cloud computing resources. With sponsorship from Google Cloud and help from Wyatt Gorman (HPC Specialist at Google), an HPC cluster was set up consisting of 20 virtual machine compute nodes outfitted with Nvidia Tesla V100 GPUs.

This platform allowed us to customize and control the software and hardware environment to meet the attendees’ needs for the hackathon. The software environment included the CentOS operating system with SchedMD’s Slurm job scheduler and the latest Nvidia drivers (v396). Additionally, we were able to provide the latest CUDA toolkit (v9.2), along with the most recently released community edition of the PGI compilers (v18.5). With this as the base install, the Spack package management tool allowed for rapid installation of many typical HPC packages, like OpenMPI, MPICH, NetCDF, LAPACK, and BLAS. The ability to configure this environment through a startup script allowed us to set up the HPC cluster within a day. Federico Halpern, from the GA Fusion team, noted that “GCP allowed for easy installation of packages. The hardware was top-notch.”


Outcome

Estimated speedups from the Boulder GPU Hackathon relative to the start of the event

By the end of the week, each team had made significant progress towards accelerating their applications. The chart shows the speedup in four teams’ applications relative to the run time measured at the beginning of the week. These four developer teams, AMReX, EF5, GA Fusion, and NASA-LAVA, achieved between 3x and 5x speedup in just five days. Of the six teams not shown in this chart, most saw speedups on individual kernels within their code but not in the overall run time. This reflects a typical struggle in GPU programming: minimizing data movement between the CPU and GPU. Nonetheless, every team left the hackathon with a path forward and a plan to continue working on their application.

At the end of the event, we conducted an anonymous survey to understand how attendees think about the hackathons. Of the 66 total attendees and mentors, we were able to obtain 29 responses.

The 2018 Boulder GPU Hackathon was the first hackathon for the majority ( 75% ) of our attendees and the majority accomplished what they thought they would for the one week event. Everyone surveyed indicated that they would recommend that other developers attend a hackathon.

 

Summarized survey responses: Was this your first hackathon? Did you accomplish what you expected to? Would you recommend a hackathon?

 

When asked what they liked most about the hackathon, attendees responded with comments like

“The community and the structure.”

“The support and mentoring. Mentors dig deep into the details of each specific application with the groups.”

“There were many GPU/compiler experts in the same room, so we could ask questions and solve problems quickly.”

“It was a great event. Interacting with other developers helped a lot.”

“Getting lots of talented people with different backgrounds in the same room. It was the inter-project collaboration that was the biggest benefit.”

Overall, the success of the hackathon is due to the motivated people who came together to be a part of this growing community. Everyone came with the idea that they were going to learn something new and challenging while sharing their expertise with others.


What’s Next ?

As this community continues to grow and the number of hackathons increases, we feel it’s important to share this experience with others. As a part of this, we have developed guidelines for mentors, gleaned from conversations throughout the week with mentors and attendees. In the coming months, we’ll be coming out with documentation on “Preparing for a hackathon as an attendee” and “What it takes to organize a successful hackathon”.

Organization of hackathons in new locations to share and grow this development methodology is another part of sustaining this experience. Currently, applications are open for our next hackathon in Santa Fe, NM taking place at the end of October this year.


Sign up for the gpuhackathon e-mail list to hear more about GPU computing and our upcoming hackathons !

If you are interested in bringing a hackathon to your organization, reach out and we can organize a hackathon !

The Boulder GPU Hackathon could not have happened without the support of our sponsors at Nvidia and Google Cloud.

Sponsors are critical to help cover mentor travel costs and other workshop expenses. Sponsor a hackathon today !

Mentor Guidelines

Guiding developers to success through a one-week coding sprint can be challenging. To address this, we have compiled some notes on what mentors can do to help their teams be productive and successful at a week-long hackathon. These are not strict rules to follow. Rather, they are suggestions for current and prospective mentors, meant to start a conversation about how we teach others at hackathons.

Photos: Attendee looking over mentor's shoulder; Mentor teaching at a GPU Hackathon; Mentor helping out at a GPU Hackathon

 

Pre-Hackathon Preparation

Identify the compute platform you will use at the hackathon

At the hackathons, there is often access to a variety of compute resources (e.g. cloud platform, organizer/partner systems, or your team’s in-house system). Teams that have been successful have made sure that everyone on the team, including the mentors, has access to the same compute resources. It is important that you and your team access the compute platform before the hackathon. This gives you and your team the opportunity to understand the environment setup and job scheduler on that system. Learning how to move around on the chosen compute system before the hackathon helps maximize the time your team will spend on coding.

Your team’s application should compile, run, and give correct answers

During the application process, the organizers try to find applications that are ready for a GPU Hackathon. However, it is possible that the application may not be completely ready for the API you have in mind. It is important to make sure that your team’s code compiles with the compiler required to implement the API. For example, if you and your team are planning to use OpenACC, it’s necessary to make sure your team’s source code can be compiled with the PGI compilers.

Additionally, it is possible that your team will be running their application on a computing platform that is new to them. Make sure that the code compiles with the appropriate compilers on the planned compute platforms. The generated executable should be verified before coming to the hackathon. Help guide your team in the right direction to make sure their code is ready for a coding sprint.

Your team’s application should be self contained

Ideally, the focus of the 5-day hackathon should be limited to a few thousand lines of code. System dependencies (e.g. Cray computing environment, hard-coded paths in the build system, etc.) should be removed. If external dependencies (e.g. NetCDF, HDF5, BLAS/LAPACK) are required, it is imperative that these dependencies, along with the required compilers, are communicated to the organizers and system administrator well before the event. At some events, this communication happens during the registration process.

Work with your team to identify external dependencies and communicate these requirements to the organizers and system administrator.

Your team’s code should be configured to run in a short amount of time

During the 5-day coding sprint, you and your team will be modifying the code base and running the code to ensure correctness. Successful teams have found that making many small changes, each followed by running the code and verifying its output, is a productive workflow. To fit this workflow, the test case used to verify and validate the code must run in a short amount of time; this permits rapid feedback that guides the changes you make to the code.
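As a rough illustration of keeping the test case within a time budget, the sketch below wraps a run in a timer. The helper name `timed_run`, the `./my_exe` command, and the 60-second budget are assumptions made for illustration; they are not part of any hackathon tooling.

```python
import subprocess
import time


def timed_run(command, budget_seconds):
    """Run `command` and return (exit_code, elapsed_seconds, within_budget)."""
    start = time.perf_counter()
    completed = subprocess.run(command, check=False)
    elapsed = time.perf_counter() - start
    return completed.returncode, elapsed, elapsed <= budget_seconds


# Example (hypothetical executable and budget):
# code, seconds, ok = timed_run(["./my_exe"], budget_seconds=60.0)
# if not ok:
#     print(f"test case took {seconds:.1f} s; consider a smaller problem size")
```

If the run regularly exceeds the budget, shrinking the problem size or the number of time steps is usually easier than waiting on every edit.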

Your team’s code should run on one process

If your team’s application is being ported to the GPU for the first time, or if you are working on kernel optimization (and the problem size is small enough to fit on the target GPU), it’s recommended that your team’s code and example test case run in serial or on a single GPU only. Working with MPI in addition to a GPU programming API can result in more hurdles to overcome than a single-GPU application alone.

If the end goal is to run on multiple GPUs and there is no GPU offloading in the code yet, it is highly recommended that the initial focus be on a single-GPU application. This is more than enough work for a single hackathon. Additionally, setting up GPU-to-GPU communication and tuning for the target architecture can easily take up a hackathon by itself.

There should be a method for automatically verifying the code’s output

Although the primary goal for many teams is to have their code run faster on GPUs, the ported or tuned application should be producing the correct results. Because of this, your team should have a fairly automatic way of robustly verifying the correctness. Integrated metrics, such as RMS values, can hide changes in the produced results. Ideally, binary files that store the program’s state ( e.g. the velocity, temperature, and pressure in a fluids code ) should be compared against a reference set using “diff” or some hash function comparison.
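A minimal sketch of the hash-based comparison described above, assuming the reference and modified runs each write their binary state files into a separate directory under the same file names. The function names and directory layout are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_files(reference_dir, candidate_dir):
    """List reference file names that differ from, or are missing in, candidate_dir."""
    mismatches = []
    for ref_path in sorted(Path(reference_dir).iterdir()):
        if not ref_path.is_file():
            continue
        new_path = Path(candidate_dir) / ref_path.name
        if not new_path.is_file() or file_digest(ref_path) != file_digest(new_path):
            mismatches.append(ref_path.name)
    return mismatches
```

An empty list means every reference file was reproduced bit-for-bit. Any other result tells the team exactly which state file to inspect first.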

When actively developing/debugging, it’s a good idea to generate the reference data and the “modified code” data using floating-point safe compiler flags. Only use less safe optimizations when benchmarking. In the event bit-for-bit reproducibility cannot be achieved relative to reference output, it’s good to have an application (written by your team) to compare two sets of application output.
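When bit-for-bit agreement is out of reach, a comparison tool along these lines may be enough. This sketch assumes the output is a raw binary file of native-endian 64-bit floats. The file format, function names, and default tolerance are illustrative assumptions that a real team would replace with their own.

```python
import math
from array import array


def read_doubles(path):
    """Read a raw binary file of native-endian 64-bit floats."""
    values = array("d")
    with open(path, "rb") as handle:
        values.frombytes(handle.read())
    return values


def compare_fields(reference, candidate, rel_tol=1e-12, abs_tol=0.0):
    """Return (index, reference_value, candidate_value) for each element outside tolerance."""
    if len(reference) != len(candidate):
        raise ValueError("outputs differ in length")
    return [
        (i, r, c)
        for i, (r, c) in enumerate(zip(reference, candidate))
        if not math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
    ]
```

Reporting each offending element, rather than a single integrated metric, avoids the masking problem that RMS-style checks can have. A NaN in either output never compares as close, so it is always reported.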

Generate a call graph of the code before the hackathon

As someone who did not develop your team’s code, you will find it useful to have a call graph on hand that highlights hot-spots in the code and depicts the relationship between subroutines/functions. The least expensive way to do this is to use valgrind’s callgrind tool, with kcachegrind to visualize the call graph.

Below are the steps for doing this on a Linux system. Note that valgrind and kcachegrind need to be installed.

  1. Compile the code, from scratch, with the -g compiler flag enabled.
  2. Run the code with valgrind --tool=callgrind. For example
valgrind --tool=callgrind ./my_exe


Running your application with callgrind generates a file, callgrind.out.XXXX, where XXXX is replaced with the process ID (assigned by the OS) of the application. This file can be visualized by calling kcachegrind from the command line.

kcachegrind callgrind.out.XXXX

Explore the GUI window and find the tab labeled ‘call graph’. Once you have the call graph in view, you can right click and export the call graph to a png.

Generate a profile of the code before the hackathon

To accompany the call graph, your team should generate a profile of their application before coming to the hackathon. This can be done with any profiler of your choice. Be aware that some teams have not used a profiler before, so you may need to suggest one and provide some documentation to get them started. In addition to speeding up code, the teams are there to learn from the experts (you) the processes for porting to new hardware and performance tuning.

The goal is to identify which subroutines/functions you will be targeting first. Tackling the more expensive routines early on will likely give the largest speedup. This introduces high morale on your team early in the week and can be the encouragement needed for your team to want to continue this work on their own.
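Amdahl's law gives a quick way to see why the expensive routines come first. The sketch below uses a made-up profile; the routine names and time fractions are hypothetical, and the 10x kernel speedup is only an example figure.

```python
def amdahl_speedup(fraction, kernel_speedup):
    """Overall speedup when `fraction` of the run time is sped up by `kernel_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)


# Hypothetical profile: routine name -> fraction of total run time.
profile = {"advect": 0.60, "pressure_solve": 0.25, "io": 0.10, "diagnostics": 0.05}

# Rank routines from most to least expensive and estimate the payoff of each.
for name, fraction in sorted(profile.items(), key=lambda item: item[1], reverse=True):
    overall = amdahl_speedup(fraction, 10.0)
    print(f"{name:15s} {fraction:5.0%}  10x kernel -> {overall:.2f}x overall")
```

With these numbers, a 10x speedup of the routine that takes 60% of the run time gives roughly 2.2x overall, while the same 10x on the 5% routine gives less than 1.05x.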

During the Hackathon

Initial Presentations

On the first day of the hackathon, in the morning, a member of your team will present an overview of the application and the goals for the week. The presentation should have 3-5 slides and take no more than 5 minutes to present. Your team should communicate

  1. Who they are
  2. What the application does
  3. The algorithm(s) that occupy most of the runtime currently
  4. The target for the week

The goal of the initial presentations is to present your plan clearly to other attendees, organizers, and mentors to get feedback on the approach. This helps make sure your team is on the path to success!

The presentation should be designed to communicate to other attendees at the hackathon. Typically, there is a wide variety of scientific domains in addition to computer scientists at the hackathon. Showing equations and explaining the impacts of the science does not usually help achieve the goal of these initial presentations and is therefore not recommended.

As a mentor, make sure the goals of the initial presentation are communicated to your team. Help them develop a short and succinct presentation to get feedback from other attendees.

Daily Scrum Sessions (Stand-ups)

Photos: Showing yesterday's profile at a scrum session; Showing performance chart at a scrum session; Showing progress at a scrum session

Every morning, each team will give a short (2-3 minute) update expressing

  1. Where they are now
  2. Where they are going
  3. Where they are struggling

The goal of the daily scrum sessions is to check in with everyone and obtain additional feedback from other groups. We have found that, despite domain science differences, teams often come across the same hurdles and are willing to share their solutions with others. Typically, we have asked mentors to deliver the updates at the first scrum session.

The presentations for the scrum sessions should include profiles and code snippets. Ideally, each team would show routine speedups/slowdowns for the routines they are actively working on. At these presentations, no equations should be shown and there is no need to reiterate the scientific background of the application. The goal is to let everyone know where your team is and to seek input to help get past hurdles.
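One simple way to prepare the routine-level numbers for a scrum slide is sketched below. The routine names and timings are invented for illustration; in practice they would come from your profiler's output.

```python
def routine_speedups(before, after):
    """Map routine name -> before/after time ratio (>1 is faster, <1 is slower)."""
    return {
        name: before[name] / after[name]
        for name in before
        if name in after and after[name] > 0.0
    }


# Hypothetical timings in seconds for the routines being worked on.
yesterday = {"advect": 12.0, "pressure_solve": 5.0, "halo_exchange": 1.0}
today = {"advect": 3.0, "pressure_solve": 5.5, "halo_exchange": 1.0}

for name, ratio in sorted(routine_speedups(yesterday, today).items()):
    label = "faster" if ratio > 1.0 else "slower" if ratio < 1.0 else "unchanged"
    print(f"{name:15s} {ratio:5.2f}x  {label}")
```

Showing slowdowns alongside speedups makes it easier for other teams to spot a hurdle they have already solved.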

Publications and Recognition

We recognize that you may be volunteering your time to help out at a hackathon. We encourage you to push your team towards publishing their work. Becoming a mentor is a good way to make new connections and become a contributing author on scientific publications. Additionally, you can ask your team, at the very least, to recognize your contributions and assistance with the code development in any future publications involving the code you worked on. Asking them to include a statement like

“We’d like to thank <Mentor Name> for their contributions towards porting and accelerating <Software Name> for use on <target compute system>. This work was accomplished thanks to their expertise and patience at the <Hackathon name>.”

in the acknowledgements section of a publication is usually well received and helps recognize your contributions to the community.


Boulder GPU Hackathon 2018

After months of planning between Nvidia, Google Cloud Platform (GCP), ARM, Fluid Numerics LLC, CU Boulder/CIRES, and the Oak Ridge Leadership Computing Facility (OLCF), the Boulder GPU Hackathon is coming up this week! Compute resources for this hackathon include OLCF’s Summitdev and a “bare-metal” style cluster on Google Cloud Platform consisting of 20 nodes equipped with Nvidia Tesla V100 GPUs.

Developer teams from NASA, Los Alamos National Laboratory, Pacific Northwest National Laboratory, National Renewable Energy Laboratory, National Center for Atmospheric Research, NOAA National Severe Storms Laboratory, University of Chicago, University of Maryland, University of Colorado (Boulder), University of Washington, SilcsBio, and General Atomics will be teaming with mentors from Nvidia, PGI, Google Cloud Platform, ARM, national laboratories, and universities for this intense one-week coding sprint. All of these teams are looking to accelerate scientific applications with GPUs, and many will be porting to GPUs for the first time.

More information can be found at the Boulder GPU Hackathon event page.