From Silicon to Softmax

A structured curriculum from bare-metal systems programming to distributed GPU clusters — the path from writing apps to making $100B data centers work.

8 modules · 5 lessons

The Low-Level Foundation · GPU & Parallelism · Distributed Systems · ML Internals & Optimization · Cluster Orchestration · ML Platform Engineering · Inference from Scratch · Agents from Scratch

Module 1: The Low-Level Foundation

Systems programming in Rust, CPU architecture, SIMD, memory hierarchy, and Linux performance profiling.

From Silicon to Softmax: Course Overview

A structured learning path from bare-metal systems programming to distributed GPU clusters — the full stack of AI infrastructure, from the silicon up.

6 min read

Module 2: GPU & Parallelism

CUDA programming, Triton kernels, parallel algorithms, kernel fusion, and FlashAttention.

Coming soon

Module 3: Distributed Systems

RDMA, InfiniBand, NCCL, distributed training with DDP/FSDP, and the 3D parallelism grid.

Coming soon

Module 4: ML Internals & Optimization

Quantization, inference optimization, and the Rust GPU frontier.

Coming soon

Module 5: Cluster Orchestration

Kubernetes for ML, Slurm, Volcano/Kueue, MPI Operator, KubeRay, topology-aware scheduling, multi-tenancy — running training jobs on shared GPU clusters.

Cluster Orchestration: Module Overview

Running jobs on iron. Kubernetes for ML, Slurm, Volcano/Kueue, MPI Operator, KubeRay, topology-aware scheduling, multi-tenancy, spot/preemptible orchestration — the layer between 'I can write FSDP' and 'my 256-GPU training run finished without paging anyone at 3am'.

3 min read

Module 6: ML Platform Engineering

Experiment tracking, model registry, training observability, workflow orchestration, CI/CD for models, cost attribution — the infrastructure that turns one good training run into a reliable model factory.

ML Platform Engineering: Module Overview

Make ML reproducible, observable, and deployable. Experiment tracking, model registry, training observability, workflow orchestration, CI/CD for models, and cost attribution: the infrastructure that turns one good training run into a reliable model factory.

3 min read

Module 7: Inference from Scratch

A tutorial series on modern LLM inference — attention variants (GQA, MLA), positional encodings, KV cache, Mixture of Experts, multi-token prediction, and serving internals.

Inference from Scratch: Series Overview

A tutorial series on the full design space of modern LLM inference — every meaningful variant of attention, positional encoding, routing, sampling, quantization, and serving, built from scratch on one GPU, then taken distributed.

6 min read

Module 8: Agents from Scratch

A tutorial series on building production-grade LLM agents — the agent loop, tool use, memory, RAG, planning, context engineering, multi-agent systems, and distributed agent infrastructure.

Agents from Scratch: Series Overview

A tutorial series on the full design space of LLM agents — the agent loop, tool use, memory, RAG, planning, context engineering, multi-agent systems, and the distributed infrastructure that actually ships them. Built from scratch on one machine, then taken distributed.

7 min read