Telerun Flow for Final Projects

Execution Flow

As stated in the Final Project instructions, we have set up more flexible access to GPUs through Telerun. The main change is that you can now ship more complex inputs rather than just simple single-file .cu compilation units.

In particular, the new flow allows shipping arbitrary tarballs (*.tar) with pre-compiled and pre-built binaries, input data, scripts, etc. for direct execution on the Telerun GPUs. The minimal tarball must contain the entry point script run.sh (with execute permissions), which will be invoked on the GPU server. Inside run.sh, you can put all actions that should be performed on the server to execute your project and produce the output. For example, you can run your pre-compiled CUDA binary (which should also be placed in the same tarball), execute a Python script to generate test data, export all necessary environment variables, upload large output files to an external server, and anything else you might need.
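For reference, here is a minimal sketch of what such a run.sh might look like, assuming the tarball ships a pre-compiled binary and a data-generation script (the names matmul_lab5, gen_test_data.py, and MY_ENV_VAR are placeholders for whatever your project actually contains):

    #!/bin/bash
    # Entry point invoked by Telerun on the GPU server.
    # Paths are relative to the directory where the tarball is unpacked.
    set -e

    # Export any environment variables your binary expects (placeholder name).
    export MY_ENV_VAR=1

    # Generate input data on the server (placeholder script shipped in the tarball).
    python3 gen_test_data.py

    # Run the pre-compiled CUDA binary shipped in the tarball (placeholder name).
    ./matmul_lab5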

Limitations

The current execution flow has the following limitations:

NOTE: If your setup requires more packages/libraries to be installed on the execution server (which is likely), feel free to contact the course staff, and we will include them in the default image.

Development Flow

The Telerun execution flow assumes the submission tarball contains the pre-built project with compiled binaries ready to be executed on the GPU server. It is entirely up to you how you build your project and pack the tarballs; however, we strongly encourage you to follow the development flow explained here to make your builds simple, portable, and reproducible. The flow is based on Docker Development Containers.

Why? Because development environments sometimes get really complex: lots of dependencies (that constantly get updated), packages with different versions backed by different package managers, and potential conflicts with other packages in your environment that break things and take hours to fix. At the end of the day, you want your development environment to be portable and reproducible, so other people can easily build what you designed on their own machines, independently of the underlying setup.

To get started, we prepared a very simple version of this flow in a couple of demos available here. The demos are in different branches, namely:

There are no real limits on what you can include in the development environment or which build system you use for your project. These simple examples are built with hand-written compile commands, but of course you can use anything (e.g. Meson, CMake, Bazel, Ninja, etc.). The only caveat is that Docker is still just a container, and it runs on whatever CPU architecture your host uses. Since our execution servers are all amd64, if you develop on, say, Apple ARM silicon, you might want to consider cross compilation, or (better) virtualization with emulation, such as OrbStack, which can run amd64 containers nearly as easily as native ones.
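For example, on an ARM host you can ask Docker to build the image as amd64; the exact setup depends on your Docker installation (BuildKit with emulation, Docker Desktop, OrbStack, etc.), and the devtool script may already take care of this, so treat the command below only as an illustration of the underlying flag:

    # Build the development container image for amd64 even on an ARM host
    # (requires emulation support, e.g. QEMU or Rosetta; the image tag is a placeholder).
    docker build --platform linux/amd64 -t my-devctr devctr/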

Details and Examples

Here, we will go through a simple example from the example_lab_5_cutlass branch, which lets you locally build our lab5 matmul code with integrated CUTLASS and input test cases.

Step 1: Creating Your Development Environment

The entire environment is specified in devctr/Dockerfile. In this example, it is based on the cuda:11.8.0-devel-ubuntu22.04 image, which gives us the CUDA 11.8 SDK. We also install git, python3, and CUTLASS; the latter is simply done via git clone. If some dependency requires a more complex installation, that should also be provided in this Dockerfile.
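As a rough sketch, such a Dockerfile could look like the following; the actual devctr/Dockerfile in the demo branch may differ in details such as the CUTLASS clone location:

    # Base image with the CUDA 11.8 SDK
    FROM nvidia/cuda:11.8.0-devel-ubuntu22.04

    # Basic tooling
    RUN apt-get update && apt-get install -y git python3 && rm -rf /var/lib/apt/lists/*

    # CUTLASS is header-only, so a plain clone is enough (install path is an assumption)
    RUN git clone --depth 1 https://github.com/NVIDIA/cutlass.git /cutlass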

The development container has its own file system, and by default, the Dockerfile’s “home directory” is the root (/) of the container.

After you have specified your environment, you can build the development container image using our helper script devtool: ./devtool build_devctr. The image is stored locally on your host, and you can optionally push it to Docker Hub if you want to “commit” it or share it with anyone working on the project with you.

Step 2: Building in Your Development Environment

To develop inside the container, all we need to do is run the image and invoke the project’s build instructions. Here, it is important to understand how Docker accesses the host file system where the project sources are located. A standard way of doing this is with Docker Volumes. In a nutshell, Docker Volumes allow us to mount any host directory into some path inside the running container for read/write access.

For simplicity, we have provided you with ready-to-use configs in the same devtool script. It defines two folders in the same directory: src and build.

This means that anything you put into the src folder will be available inside the container under /final_project, and everything your build script writes into /build will end up in the host’s build folder and in the shippable build.tar.
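Conceptually, the mounts that devtool sets up correspond to a docker run invocation roughly like the one below; the image tag and exact flags are placeholders, and the real invocation lives inside the devtool script:

    # Mount host ./src at /final_project and host ./build at /build (read/write),
    # then run the project's build script inside the container.
    docker run --rm \
      -v "$(pwd)/src:/final_project" \
      -v "$(pwd)/build:/build" \
      my-devctr bash /final_project/build.sh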

Now, what the development container does is execute the src/build.sh script inside its environment. You can put arbitrarily complex build instructions inside this script. In the provided example, we execute an nvcc -O3 ... command to build the matmul_lab5.cu CUDA file against the CUTLASS headers. In your project, you can have your own build system set up (e.g. with Ninja) and just call it from there (e.g. cd build; ninja).

If you need to add other files to your output (e.g. gen_test_data.py in our example, which will generate test data on the GPU server), just copy them from /final_project into /build, and they will end up in the same shippable tarball as your compiled project.
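Putting these pieces together, the src/build.sh for the lab5 example might look roughly like this; the exact nvcc flags, include paths, and file names in the demo branch may differ:

    #!/bin/bash
    # Runs inside the development container: sources are mounted at /final_project,
    # and anything written to /build ends up in the host's build folder and in build.tar.
    set -e

    # Compile the CUDA source against the CUTLASS headers (include path is an assumption).
    nvcc -O3 -I/cutlass/include /final_project/matmul_lab5.cu -o /build/matmul_lab5

    # Ship auxiliary files alongside the compiled binary.
    cp /final_project/gen_test_data.py /build/
    cp /final_project/run.sh /build/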

To build, just execute ./devtool build_project.

Step 3: Running!

As stated earlier, Telerun unpacks the content of the shipped tarball into its internal workdir and invokes the run.sh script. The run.sh lives at src/run.sh and gets copied into the tarball during the build step.

Note that run.sh runs on the GPU server, inside Telerun’s workdir for your submission. You probably don’t need to know where exactly on the execution server the script is located, but if needed, you can always call pwd from run.sh to see it. You can also call ls, then submit the tarball, to see the contents of the Telerun workdir for your submission and check that it contains all your binaries.
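For instance, a couple of throwaway lines at the top of run.sh are enough for that kind of sanity check:

    # Temporary debugging: print the submission workdir and its contents.
    pwd
    ls -la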

The tarball is submitted via Telerun with python3 telerun.py submit build.tar.

Upon completion, Telerun will return the output in the same way as for the labs. You can inspect the lab source code in more detail to see where the output should be placed so that it can be delivered back to you.

Summary

In short, this is the sequence of actions needed to build, run, and get results for your project:

  1. (one time action) prepare your development environment by installing the necessary packages in devctr/Dockerfile
  2. (one time action) build your development environment: ./devtool build_devctr
  3. (one time action) place your project inside the src/ folder
  4. (one time action) write the building instructions in src/build.sh
  5. build it: ./devtool build_project
  6. submit it: python3 telerun.py submit build.tar

When iterating on your project, you only need to repeat steps 5 and 6.
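As a compact reference, the whole cycle looks roughly like this on the command line (the copy command in step 3 is just a placeholder for however you get your sources into src/):

    # one-time setup
    ./devtool build_devctr                 # step 2: build the dev container image
    cp -r path/to/your/project/* src/      # step 3: place your sources in src/
    $EDITOR src/build.sh                   # step 4: write the build instructions

    # iterate
    ./devtool build_project                # step 5: build inside the container -> build.tar
    python3 telerun.py submit build.tar    # step 6: run on the Telerun GPU server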