Achieving your coding and software performance goals at a GPU hackathon can be challenging. To help, we have compiled some notes on what attendees can do to prepare for a hackathon. These are not strict rules to follow. Rather, they are suggestions based on what we have seen successful teams do at previous hackathons.
Identify the compute platform you will use at the hackathon
At the hackathons, there is usually access to a set of compute resources (a cloud platform, OLCF, other organizer/partner systems, or your team’s in-house system). Successful teams have made sure that everyone on the team, including the mentors, has access to the same compute resources. It is important that you and your team access the compute platform before the hackathon. This gives you and your team the opportunity to understand the environment modules setup and job scheduler on that system.
Your team’s application should compile, run, and give correct answers
During the application process, the organizers try to find applications that are ready for a GPU Hackathon. However, some additional work may be needed to make sure your software is ready for the hackathon. It is important to make sure that your team’s code compiles with the appropriate compilers. For example, if you and your team are planning to use OpenACC, make sure your team’s source code can be compiled with an OpenACC-capable compiler such as the PGI compilers (now part of the NVIDIA HPC SDK).
Additionally, it is possible that you will be running your application on a computing platform that is new to you. Make sure that the code compiles with the appropriate compilers on the planned compute platforms. The generated executable should be verified before coming to the hackathon.
Reach out to your mentors to ask which API and which platform you will be working with.
Your team’s application should be self-contained
Ideally, the focus of the 5-day hackathon should be limited to a few thousand lines of code. System dependencies (e.g. Cray computing environment, hard-coded paths in the build system, etc.) should be removed. If external dependencies (e.g. NetCDF, HDF5, BLAS/LAPACK) are required, it is imperative that these dependencies, along with the required compilers, are communicated to the organizers and system administrator well before the event. At some events, this communication happens during the registration process.
Work with your team and your mentors to identify external dependencies and communicate these requirements to the organizers and system administrator.
Your team’s code should be configured to run in a short amount of time
During the 5-day coding sprint, you and your team will be making modifications to the code base and running the code to ensure correctness. Successful teams have found that many small changes, accompanied by frequent code execution and output verification, make for a productive workflow. To fit this workflow, the test case used to verify/validate the code needs to run in a short amount of time (< 5 minutes); this permits rapid feedback that guides the changes you will be making in the code.
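One way to keep an eye on this budget is to time the test case from a small wrapper. The sketch below is only an illustration: the `run_timed` helper and the 300-second default are assumptions that mirror the 5-minute guideline above, and the demo command simply launches the Python interpreter as a stand-in for your own executable.

```python
import subprocess
import sys
import time


def run_timed(cmd, budget_seconds=300.0):
    """Run `cmd` (a list of strings) and report whether it met the time budget.

    Returns (return_code, elapsed_seconds, within_budget).
    """
    start = time.perf_counter()
    completed = subprocess.run(cmd, check=False)
    elapsed = time.perf_counter() - start
    return completed.returncode, elapsed, elapsed <= budget_seconds


# Stand-in command; replace it with your own test case,
# e.g. ["./my_exe", "small_case.in"] (hypothetical names).
code, elapsed, within_budget = run_timed([sys.executable, "-c", "pass"])
print(f"exit code {code}, {elapsed:.1f} s, within budget: {within_budget}")
```

If the test case exceeds the budget, shrinking the problem size or the number of time steps is usually a simpler fix than waiting on long runs during the event.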
Your team’s code should run on one process
If your team’s application is being ported to the GPU for the first time, or if you are working on kernel optimization (and the problem size is small enough to fit on the target GPU), it’s recommended that your team’s code and example test case run in serial or on a single GPU only. Working with MPI in addition to a GPU programming API can result in more hurdles to overcome than a single-GPU application alone.
If the end goal is to run on multiple GPUs and there is no GPU offloading in the code yet, it is highly recommended that the initial focus be on a single-GPU application. This is more than enough work for a single hackathon. Additionally, setting up GPU-to-GPU communication and tuning for the target architecture can easily take up a hackathon by itself.
There should be a method for automatically verifying the code’s output
Although the primary goal for many teams is to have their code run faster on GPUs, the ported or tuned application should still produce the correct results. Because of this, your team should have a fairly automatic way of robustly verifying correctness. Integrated metrics, such as RMS values, are useful for quick glances but can hide changes in the produced results. Ideally, binary files that store the program’s state (e.g. the velocity, temperature, and pressure in a fluids code) should be compared against a reference set using “diff”, some hash function comparison, or an additional code that compares absolute and relative differences in output.
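For the hash-based comparison mentioned above, a minimal sketch using Python's standard `hashlib` module could look like the following. The function names are hypothetical. Note that a hash match only tells you the two files are byte-identical; it says nothing about how far apart two differing files are.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def outputs_match(reference_path, candidate_path):
    """True only if the two files are byte-for-byte identical."""
    return file_sha256(reference_path) == file_sha256(candidate_path)
```

A call such as `outputs_match("reference/state.bin", "run/state.bin")` (hypothetical paths) can then be dropped into a test script that runs after every code change.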
When actively developing/debugging, it’s a good idea to generate the reference data and the “modified code” data using floating-point safe compiler flags. Only use less safe optimizations when benchmarking. In the event bit-for-bit reproducibility cannot be achieved relative to reference output, it’s good to have an application (written by your team) to compare two sets of application output.
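As a rough illustration of such a comparison tool, the sketch below reports the largest absolute and relative differences between two outputs. It rests on several assumptions that may not match your code: the output is taken to be a raw binary file of native-endian 64-bit floats, the names `load_doubles` and `compare` are hypothetical, and the tolerances are placeholders that your team would need to choose.

```python
import array
import math


def load_doubles(path):
    """Read a raw binary file of native-endian float64 values (assumed layout)."""
    values = array.array("d")
    with open(path, "rb") as f:
        data = f.read()
    if len(data) % values.itemsize != 0:
        raise ValueError(f"{path}: size is not a multiple of {values.itemsize} bytes")
    values.frombytes(data)
    return values


def compare(reference, candidate, abs_tol=1e-12, rel_tol=1e-9):
    """Return (max_abs_diff, max_rel_diff, n_failures).

    A value fails only if it exceeds BOTH the absolute and the relative tolerance.
    Non-finite values fail unless both entries are the same infinity.
    """
    if len(reference) != len(candidate):
        raise ValueError("outputs have different lengths")
    max_abs = max_rel = 0.0
    failures = 0
    for ref, new in zip(reference, candidate):
        if not (math.isfinite(ref) and math.isfinite(new)):
            if ref != new:  # NaN never equals anything, so NaNs always fail
                failures += 1
            continue
        abs_diff = abs(ref - new)
        if ref != 0.0:
            rel_diff = abs_diff / abs(ref)
        else:
            rel_diff = 0.0 if abs_diff == 0.0 else math.inf
        max_abs = max(max_abs, abs_diff)
        max_rel = max(max_rel, rel_diff)
        if abs_diff > abs_tol and rel_diff > rel_tol:
            failures += 1
    return max_abs, max_rel, failures


# In-memory demo; with real output you would call load_doubles() on both files.
reference = array.array("d", [1.0, 2.0, 0.0])
candidate = array.array("d", [1.0, 2.0 + 1e-6, 0.0])
max_abs, max_rel, failures = compare(reference, candidate)
print(f"max abs diff {max_abs:.3e}, max rel diff {max_rel:.3e}, failures {failures}")
```

Requiring both tolerances to be exceeded keeps values near zero from failing on relative error alone. Whether that is the right policy depends on your application.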
Generate a call graph of the code before the hackathon
Because your mentors did not develop your team’s code, it is useful for them to have a call graph on hand that highlights hot-spots in the code and depicts the relationship between subroutines/functions. One inexpensive way to produce this is to use valgrind’s callgrind tool, with kcachegrind to visualize the call graph.
Below are the steps for doing this on a Linux system. Note that valgrind and kcachegrind need to be installed.
- Compile the code, from scratch, with the -g compiler flag enabled.
- Run the code with valgrind --tool=callgrind. For example:
|valgrind --tool=callgrind ./my_exe|
- Running your application with callgrind generates a file, callgrind.out.XXXX, where XXXX is replaced with the process id (assigned by the OS) of the application. This file can be visualized by calling kcachegrind on it from the command line.
- Explore the GUI window and find the tab labeled ‘call graph’. Once you have the call graph in view, you can right click and export the call graph to a png.
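The steps above can also be scripted. The sketch below is a convenience wrapper built on assumptions, not part of valgrind itself: `callgrind_command` and `run_callgrind` are hypothetical helper names, `./my_exe` stands for your executable, and callgrind's `--callgrind-out-file` option is used to give the output a predictable name instead of the process-id suffix.

```python
import shutil
import subprocess


def callgrind_command(exe, args=(), out_file="callgrind.out.my_exe"):
    """Build the valgrind command line for a callgrind run."""
    return [
        "valgrind",
        "--tool=callgrind",
        f"--callgrind-out-file={out_file}",
        exe,
        *args,
    ]


def run_callgrind(exe, args=(), out_file="callgrind.out.my_exe"):
    """Run the executable under callgrind and return the output file name."""
    if shutil.which("valgrind") is None:
        raise RuntimeError("valgrind was not found on PATH")
    subprocess.run(callgrind_command(exe, args, out_file), check=True)
    return out_file


# Show the command that would be executed, without running valgrind.
print(" ".join(callgrind_command("./my_exe")))
```

The resulting file can then be opened with kcachegrind as described above. Expect the run under callgrind to be much slower than a normal run, so a small test case helps here too.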
Generate a profile of the code before the hackathon
To accompany the call graph, your team should generate a profile of their application before coming to the hackathon. This can be done with any profiler of your choice. If you are unsure of what profiler to use and how to make good use of it, reach out to your mentors for some suggestions. They will likely know what is best to use based on your application and what you have available.
During the Hackathon
On the first day of the hackathon, in the morning, a member of your team will present an overview of the application and the goals for the week. The presentation should have 3-5 slides and take no more than 5 minutes to present. You should communicate:
- Who you and your team are
- What your application does
- The algorithm(s)/routine(s) that occupy most of the runtime currently
- Your goal for the week
The goal of the initial presentations is to present your plan clearly to other attendees, organizers, and mentors to get feedback on the approach. This helps make sure your team is on the path to success!
The presentation should be designed to communicate to other attendees at the hackathon. Typically, there is a wide variety of scientific domains in addition to computer scientists at the hackathon. Showing equations and explaining the impacts of the science does not usually help achieve the goal of these initial presentations and is therefore not recommended; however, we understand you are passionate about your work and have particular methods for communicating. Just use them sparingly in this initial presentation.
Daily Scrum Sessions (Stand-ups)
Every morning, each team will give a short (2-3 minute) update covering:
- Where they are now
- Where they are going
- Where they are struggling
The goal of the daily scrum sessions is to check in with everyone and obtain additional feedback from other groups. We have found that, despite domain science differences, teams often come across the same hurdles and are willing to share their solutions with others. Typically, we have asked mentors to deliver the updates at the first scrum session.
The presentations for the scrum sessions should include profiles and code snippets. Ideally, each team would show routine speedups/slowdowns for the routines they are actively working on. At these presentations, no equations should be shown and there is no need to reiterate the scientific background of the application. The goal is to let everyone know where your team is and to seek input to help get past hurdles.
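For the per-routine speedups and slowdowns mentioned above, a small helper can turn two sets of profiler timings into a table for a slide. This is a minimal sketch: the `speedups` function is hypothetical, and the routine names and timings are made up purely for illustration.

```python
def speedups(baseline_seconds, current_seconds):
    """Return {routine: baseline / current} for routines timed in both runs."""
    return {
        name: baseline_seconds[name] / current_seconds[name]
        for name in baseline_seconds
        if name in current_seconds and current_seconds[name] > 0.0
    }


# Made-up timings in seconds, for illustration only.
baseline = {"advect": 12.0, "pressure_solve": 30.0, "io": 2.0}
current = {"advect": 3.0, "pressure_solve": 40.0, "io": 2.0}

for routine, factor in sorted(speedups(baseline, current).items()):
    label = "speedup" if factor >= 1.0 else "slowdown"
    print(f"{routine:16s} {factor:5.2f}x ({label})")
```

A factor above 1 means the routine got faster relative to the baseline; a factor below 1 flags a slowdown worth raising at the stand-up.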
Publications and Recognition
Recognize that your mentors are likely volunteering their time to be at the hackathon to help you out. It’s good to recognize their contributions to your software development activities through co-authorship or through acknowledgements in papers you write with the software you brought to the hackathon. For example, you could add something like
“We’d like to thank <Mentor Name> for their contributions towards porting and accelerating <Software Name> for use on <target compute system>. This work was accomplished thanks to their expertise and patience at the <Hackathon name>.”
in the acknowledgements section of a publication. This is usually well received and helps recognize their contributions to science and the rest of the GPU hackathon community.