Post Training

A short introduction to RLHF and post-training (PDF)
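
As a taste of what RLHF fine-tuning actually optimizes, here is a minimal numpy sketch of PPO's clipped surrogate objective, the policy-gradient loss most RLHF recipes build on (illustrative only, not code from the PDF above; all numbers are made up):

    import numpy as np

    def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
        """Clipped PPO surrogate (sign flipped so it can be minimized)."""
        ratio = np.exp(logp_new - logp_old)          # pi_new(a|s) / pi_old(a|s)
        unclipped = ratio * advantages
        clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantages
        return -np.mean(np.minimum(unclipped, clipped))

    # Toy per-token log-probs under the updated and frozen policies,
    # plus advantages from a reward/value model (all illustrative).
    logp_new = np.array([-1.0, -0.5, -2.0])
    logp_old = np.array([-1.2, -0.6, -1.5])
    adv = np.array([0.8, -0.3, 1.1])
    print(ppo_clip_loss(logp_new, logp_old, adv))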

Inference

Lil’Log: Large Transformer Model Inference Optimization, covering distillation, quantization, pruning, sparsity, mixture-of-experts, and architectural optimization
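
To make one item on that list concrete, a minimal numpy sketch of symmetric per-tensor int8 weight quantization (an illustrative toy, not code from the post):

    import numpy as np

    def quantize_int8(w):
        """Symmetric per-tensor int8 quantization: map floats to [-127, 127]."""
        scale = np.abs(w).max() / 127.0
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale

    w = np.random.randn(4, 4).astype(np.float32)
    q, scale = quantize_int8(w)
    print("max abs error:", np.abs(w - dequantize(q, scale)).max())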

Deep Dive: Optimizing LLM inference
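
One staple such deep dives cover is KV caching: during decoding, the keys and values for the prefix are stored, and only the new token's entries are appended each step instead of recomputing attention inputs for the whole sequence. A single-head numpy toy (shapes are arbitrary):

    import numpy as np

    def attend(q, K, V):
        """Single-head scaled dot-product attention for one query token."""
        scores = K @ q / np.sqrt(q.shape[-1])
        w = np.exp(scores - scores.max())
        w /= w.sum()
        return w @ V

    d = 8
    K_cache = np.zeros((0, d))   # grows by one row per generated token
    V_cache = np.zeros((0, d))
    for step in range(5):
        k, v, q = (np.random.randn(d) for _ in range(3))  # from the new token
        K_cache = np.vstack([K_cache, k])   # append instead of recomputing
        V_cache = np.vstack([V_cache, v])   # keys/values for the whole prefix
        out = attend(q, K_cache, V_cache)
    print(out.shape)  # (8,)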

Assisted Generation: a new direction toward low-latency text generation – Hugging Face
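
The core idea of assisted (speculative) generation: a cheap draft model proposes a few tokens, and the large target model verifies them together, keeping the longest agreeing prefix. Output matches plain greedy decoding with the target model, but with fewer sequential big-model calls. A toy sketch; `target` and `draft` below are hypothetical stand-ins:

    def assisted_generate(target, draft, prompt, n_new=8, k=4):
        """Greedy assisted decoding: draft proposes k tokens, target verifies.

        `target` and `draft` are stand-ins: given a token list, each returns
        the next token (argmax)."""
        seq = list(prompt)
        while len(seq) < len(prompt) + n_new:
            proposal = list(seq)
            for _ in range(k):                       # cheap drafting phase
                proposal.append(draft(proposal))
            # Verification: in a real engine all k positions are scored in
            # a single forward pass of the target model.
            for i in range(len(seq), len(proposal)):
                t = target(proposal[:i])
                seq.append(t)
                if t != proposal[i]:                 # first disagreement:
                    break                            # discard the rest
        return seq[:len(prompt) + n_new]

    # Toy "models": next token = last token + 1, with the draft
    # occasionally diverging so verification has something to reject.
    target = lambda s: (s[-1] + 1) % 100
    draft = lambda s: (s[-1] + 1) % 100 if s[-1] % 7 else (s[-1] + 2) % 100
    print(assisted_generate(target, draft, [5], n_new=8, k=4))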

LLM Transformer Inference Guide – Baseten
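
A back-of-envelope rule such guides develop: batch-size-1 decoding is memory-bandwidth bound, so per-token latency is roughly the bytes of weights read divided by GPU memory bandwidth. Ballpark, assumed numbers below (7B model, fp16, ~2 TB/s HBM):

    params = 7e9                 # 7B-parameter model (assumption)
    bytes_per_param = 2          # fp16 weights
    hbm_bandwidth = 2.0e12       # ~2 TB/s, roughly an A100-class GPU

    # Each decoded token must stream every weight through the GPU once,
    # so bandwidth, not FLOPs, sets the floor for single-stream decoding.
    latency_s = params * bytes_per_param / hbm_bandwidth
    print(f"~{latency_s * 1e3:.1f} ms/token, ~{1 / latency_s:.0f} tokens/s")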

Fast LLM Inference From Scratch – Andrew Chan

Scaling Training

Scaling GPU clusters and data parallelism
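
The core of data parallelism: every worker holds a full model replica, computes gradients on its own shard of the batch, and an all-reduce averages those gradients so each replica takes an identical step, equivalent to one large-batch update. A single-process numpy simulation of that step (sizes and learning rate are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    n_workers, batch, dim = 4, 32, 16
    w = rng.normal(size=dim)                 # one replica per worker, all equal

    X = rng.normal(size=(n_workers * batch, dim))
    y = X @ rng.normal(size=dim)             # synthetic linear-regression data

    def local_grad(w, Xs, ys):
        """Gradient of mean squared error on this worker's shard."""
        return 2 * Xs.T @ (Xs @ w - ys) / len(ys)

    # Each worker sees a disjoint shard of the global batch.
    shards = [(X[i*batch:(i+1)*batch], y[i*batch:(i+1)*batch])
              for i in range(n_workers)]
    grads = [local_grad(w, Xs, ys) for Xs, ys in shards]

    # The all-reduce: averaging makes every replica take the same step,
    # identical to one big-batch gradient on a single machine.
    g = np.mean(grads, axis=0)
    w -= 0.01 * g
    print(np.allclose(g, local_grad(w + 0.01 * g, X, y)))  # True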

CUDA

Simon Oz: Writing CUDA kernels

Introduction to CUDA Programming
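
In the spirit of those tutorials, a first CUDA kernel: SAXPY with one thread per element, launched from Python through CuPy's RawKernel (a minimal sketch, not code from either resource; requires CuPy and an NVIDIA GPU):

    import cupy as cp

    saxpy = cp.RawKernel(r'''
    extern "C" __global__
    void saxpy(const float a, const float* x, const float* y,
               float* out, const int n) {
        // One thread per element: global index from block/thread coordinates.
        int i = blockDim.x * blockIdx.x + threadIdx.x;
        if (i < n) out[i] = a * x[i] + y[i];   // guard against overshoot
    }
    ''', 'saxpy')

    n = 1 << 20
    x = cp.random.rand(n, dtype=cp.float32)
    y = cp.random.rand(n, dtype=cp.float32)
    out = cp.empty_like(x)

    threads = 256
    blocks = (n + threads - 1) // threads      # ceil-divide to cover all n
    saxpy((blocks,), (threads,), (cp.float32(2.0), x, y, out, cp.int32(n)))
    print(bool(cp.allclose(out, 2.0 * x + y)))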