GPU Hardware and Software
What I Learned in GPU Hardware and Software (CS 8803) - A Retrospective
CUDA Fundamentals: Tiled Matrix Multiply & Bitonic Sort
Writing real GPU kernels, exploring shared memory tiling, parallel sorting algorithms, and performance optimization on an NVIDIA H100.
FlashAttention & LLM Inference on GPUs
Writing a FlashAttention CUDA kernel from scratch, tiling the attention computation to avoid materializing the full N×N attention matrix in memory, building a KV cache for token generation, and running GPT-2 end-to-end with custom kernels.
GPU Simulation: Warp Scheduling & Compute/Tensor Cores
Building a cycle-level GPU simulator from the inside: implementing GTO and CCWS warp schedulers, modeling compute instruction latencies, and extending the model to simulate tensor cores.
Static Analysis: Detecting Branch Divergence in GPU Code
A simple if statement can halve GPU throughput. Exploring how static dataflow analysis detects branch divergence by building def-use chains over a SASS control flow graph.