Final Project

Due Dates

For the final project, you’ll be completing the following deliverables:

The document below provides more details on what you’ll be turning in.

Overview

Now that we’ve wrapped up lab assignments for the semester, we’ve set aside these last several weeks of 6.S894 for you to work on an open-ended project on a topic of your choice. This final project is meant to give you an opportunity to exercise what you’ve learned, to learn more about topics we didn’t have a chance to cover deeply in the labs, and to get a better sense of what software engineering for GPUs and other accelerators looks like in the real world. We hope you end up having fun with it!

Here’s an overview of final project logistics:

Part 1: Topic Selection

You’re welcome to propose any final project topic you like related to accelerated computing, and the course staff will work with you during the approval process to make sure the scope of your project is appropriate.

Final project topics should be significantly more challenging than a lab assignment, and we’ll expect larger teams to take on somewhat more ambitious projects than smaller teams.

Example

As an example of how you might choose your final project topic, you could start with an idea like:

Core idea (good start; not sufficient on its own):
“We could implement the forward pass of FlashAttention.”

You could then come up with possible extensions to this core idea which would make it more interesting:

Possible extensions:

  1. “We could implement both the forward and backward passes of FlashAttention.”

  2. “We could aim to achieve 90% of the performance of the best FlashAttention implementation we can find online.”

  3. “We could aim to efficiently support irregular and dynamic attention masks.”

  4. “We could implement a version of FlashAttention using a technique like PagedAttention to more efficiently handle batches with irregular sequence lengths.”

  5. “We could add support for low-precision (e.g. 4-bit-quantized) inputs with special techniques to preserve accuracy, as in this paper.”

  6. “We could integrate our FlashAttention implementation with PyTorch and measure how it affects the end-to-end performance of a transformer implementation.”

  7. “We could explore opportunities for fusing our FlashAttention implementation with other nearby operations in a transformer layer, such as a rotary position embedding (RoPE) operator.”

The appropriate scope for your project depends on the size of your team:

(Note: If the above FlashAttention ideas sound interesting, you’re welcome to actually pursue this for your final project if you want!)
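
For readers who haven't seen the core idea in the example above, here is a minimal CPU-only sketch of the tiled, running-softmax accumulation that FlashAttention-style kernels are built around. It is not the FlashAttention authors' implementation and says nothing about GPU performance. The function names, the single-query simplification, and the `tile_size` parameter are assumptions made for illustration.

```python
import math


def naive_attention(query, keys, values):
    """Reference result: compute all scores, apply softmax, then mix values."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]


def tiled_attention(query, keys, values, tile_size=2):
    """Same result, but visiting keys/values one tile at a time.

    Only a running max, a running softmax denominator, and an unnormalized
    output accumulator are kept, so the full score vector is never stored.
    """
    scale = 1.0 / math.sqrt(len(query))
    dim = len(values[0])
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * dim
    for start in range(0, len(keys), tile_size):
        tile_keys = keys[start:start + tile_size]
        tile_values = values[start:start + tile_size]
        scores = [scale * sum(q * k for q, k in zip(query, key))
                  for key in tile_keys]
        new_max = max(running_max, max(scores))
        # Rescale everything accumulated so far to the new running max.
        correction = math.exp(running_max - new_max)
        probs = [math.exp(s - new_max) for s in scores]
        running_sum = running_sum * correction + sum(probs)
        acc = [a * correction + sum(p * v[j] for p, v in zip(probs, tile_values))
               for j, a in enumerate(acc)]
        running_max = new_max
    return [a / running_sum for a in acc]
```

A real project would map tiles onto thread blocks and on-chip memory. Comparing against a simple reference like `naive_attention` is one way to check correctness before optimizing.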

Proposals

Final project proposals should be submitted on Gradescope (link) in PDF format, and should contain the following parts:

  1. A list of team members.

  2. A brief description of your idea, in 1-2 paragraphs.

  3. An explanation of the resources you plan to use for your project.

Example

The following shows an example of a complete, successful final project proposal:

Final Project Proposal (Example)

Team Members: Alice, Bob, Carol

Description: We plan to implement a FlashAttention forward and backward pass at bfloat16 precision, targeting RTX A4000 GPUs. We will aim for both our forward and backward passes to achieve 90% of the throughput of the best FlashAttention implementation we can find online, evaluated at reasonable problem sizes. We will integrate our FlashAttention implementation with PyTorch and measure its effect on the end-to-end latency of a single training step for a transformer model.

Resources: We will develop and benchmark our kernels using the RTX A4000 GPU provided by the course. Additionally, Carol has access to an NVIDIA A6000 GPU through her lab, which we may use to run Nsight Compute when debugging the performance of our kernels.

Approval Process

After you submit your final project proposal, the course staff will try to get back to you within 1-2 days to either…

  1. Immediately approve your proposed topic, or…

  2. Work with you to refine the scope of your proposed topic.

If the course staff doesn’t immediately approve your proposal, we may try to schedule a meeting with members of your team to discuss ways your proposal could be extended or pared down. After the course staff has worked with you to develop a revised proposal, you can consider your proposal approved and can start working on your project.

All correctly formatted project proposals submitted before the November 15 deadline will receive full credit for the proposal component of the final project, regardless of whether they are immediately approved.

Topic Suggestions

Although we encourage you to come up with your own ideas for final projects, we also have a list of topic areas you might find it helpful to consider when writing your proposals.

The topics we suggest can be roughly broken down into two categories:

  1. Performance engineering projects, where the goal is to develop an implementation of some workload which runs as fast as possible.

  2. Investigation / reverse-engineering projects, where the goal is to deeply understand the performance characteristics and microarchitectural details of the hardware. Because these details are often undocumented, doing this will likely require running many carefully controlled experiments to determine empirically how the hardware works.

This “performance engineering” / “investigation” distinction isn’t perfectly sharp, and it’s fine to do a project which straddles both categories.
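
As one illustration of what a carefully controlled experiment might look like, the sketch below times a workload across a sweep of problem sizes, using warm-up runs and a median over repeated trials to reduce noise. It is a generic CPU-side harness, not course infrastructure. The names `time_workload`, `sweep`, and `make_sum_workload` are hypothetical, and the summing workload is only a stand-in for whatever you actually want to measure.

```python
import statistics
import time


def time_workload(workload, warmup=3, trials=11):
    """Return the median wall-clock time (seconds) of calling workload()."""
    for _ in range(warmup):
        workload()  # warm caches and any lazy initialization before timing
    samples = []
    for _ in range(trials):
        start = time.perf_counter()
        workload()
        # Note: for GPU work, which is launched asynchronously, you would
        # need to synchronize with the device before reading the timer.
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def sweep(make_workload, sizes, **timing_options):
    """Time one workload per size, changing only the size between runs."""
    return {size: time_workload(make_workload(size), **timing_options)
            for size in sizes}


def make_sum_workload(size):
    """Stand-in workload: sum a list of `size` integers."""
    data = list(range(size))
    return lambda: sum(data)
```

Calling `sweep(make_sum_workload, [1_000, 10_000, 100_000])` returns a dictionary mapping each size to its median time. Plotting such a sweep and looking for changes in slope is one common way to infer capacity or bandwidth limits.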

With those categories in mind, here are some of the topic areas you may want to consider. These don’t necessarily all constitute complete project ideas; you may need to combine several ideas from this list, or elaborate on an idea with your own extensions, to arrive at a complete project proposal:

Part 2: Project Implementation

After your proposal is approved, you’ll have approximately the last three weeks of class available to work on final projects. During this time, the class will continue to meet in person during live lab to give you time to work as a team and to discuss your projects with course staff.

We’ll ask each student to submit three low-stakes checkpoint assignments on Gradescope to let us know how they’re doing with the final project. These checkpoints will be graded roughly the same way as lab checkpoints, and are mostly a way for the course staff to identify ways in which teams are stuck and to help them get un-stuck.

Infrastructure

We have released a new, more flexible version of Telerun for teams to develop their final projects. More information about this new course infrastructure is coming soon. Thanks for your patience!

Part 3: Final Presentation and Report

During live lab time on Tuesday, December 10, every team will present a brief overview of what they achieved in their final project. Due to time constraints, presentations will be short: we anticipate a time budget of 5 minutes per presentation.

Additionally, on the same day as final presentations, your team will submit a final report including:

  1. All code to reproduce the experiments in your project.

  2. A write-up in PDF format describing:

    • The objectives you set for your final project.

      • In the case of a performance engineering project, any analysis you performed, microbenchmarks you ran, or baseline implementations you used to determine the performance targets you should try to hit. Since there won’t be a staff baseline for comparison, it will now be your job to provide evidence to convince us (and yourselves!) why the performance you reached is “good.”

    • The design of any code you wrote, including a discussion of your design process and any alternative designs you explored.

    • The results of any experiments you ran.

    • A discussion of your results, including any limitations of your implementation or experiments, and directions for future work.

    • A related work section covering existing publicly available software, papers, blog posts, etc. relevant to your project.
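
As a sketch of the kind of evidence the write-up might include for a performance engineering project, the snippet below turns a measured runtime into an achieved-throughput figure and compares it against a baseline. It assumes an attention forward pass as the workload and uses a common convention that counts only the two matrix multiplications (4 × batch × heads × seq_len² × head_dim floating-point operations), ignoring softmax and masking. The function names and the 0.9 default target are assumptions for illustration. You would supply your own measured times and your own choice of baseline.

```python
def attention_forward_flops(batch, heads, seq_len, head_dim):
    """Approximate FLOPs for one attention forward pass.

    Counts 2*seq_len*seq_len*head_dim for Q @ K^T and the same again for
    P @ V, per head and per batch element. Softmax work is ignored.
    """
    return 4 * batch * heads * seq_len * seq_len * head_dim


def achieved_tflops(flops, seconds):
    """Convert a FLOP count and a measured runtime into TFLOP/s."""
    return flops / seconds / 1e12


def fraction_of_baseline(our_seconds, baseline_seconds):
    """Our throughput as a fraction of the baseline's on the same problem."""
    return baseline_seconds / our_seconds


def meets_target(our_seconds, baseline_seconds, target=0.9):
    """True if we reach at least `target` of the baseline's throughput."""
    return fraction_of_baseline(our_seconds, baseline_seconds) >= target
```

Reporting both the fraction of a baseline and the fraction of a hardware limit (for example, achieved TFLOP/s against the peak you measured or looked up for your GPU) gives two independent arguments for why a result is good.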

We intend to share each final report with the whole class, so that any student who finds another team’s presentation interesting can learn about that team’s work in greater depth.