Post-Training
A short introduction to RLHF and post-training: PDF
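To make the post-training reading above concrete, here is a minimal sketch of one common preference-tuning objective, direct preference optimization (DPO), written in PyTorch. The function name and the assumption that log-probabilities are already summed per sequence are illustrative, not taken from the linked introduction.

```python
# Minimal sketch of a DPO-style preference loss (illustrative, not from the linked PDF).
# Inputs are per-sequence log-probabilities under the policy and a frozen reference model.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # Push the policy to prefer the chosen response over the rejected one
    # more strongly than the reference model does, scaled by beta.
    logits = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    return -F.logsigmoid(logits).mean()
```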
Inference
Lil'Log: Inference Optimization – Distillation, Quantization, Pruning, Sparsity, Mixture-of-Experts, Architectural Optimization
Deep Dive: Optimizing LLM inference
Assisted Generation: a new direction toward low-latency text generation – Hugging Face (a minimal code sketch follows this list)
LLM Transformer Inference Guide – Baseten
Fast LLM Inference From Scratch – Andrew Chan
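The Hugging Face post above covers assisted (speculative) generation, where a small draft model proposes several tokens cheaply and the larger target model verifies them in a single forward pass. A minimal sketch using the `assistant_model` argument of transformers' `generate` is below; the gpt2/gpt2-xl pairing is illustrative, and any two models sharing a tokenizer work.

```python
# Minimal sketch of assisted generation with Hugging Face transformers.
# Model names are illustrative; the draft and target models must share a tokenizer.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2-xl")
model = AutoModelForCausalLM.from_pretrained("gpt2-xl")    # large target model
assistant = AutoModelForCausalLM.from_pretrained("gpt2")   # small draft model

inputs = tokenizer("Assisted generation speeds up decoding by", return_tensors="pt")
# The assistant drafts candidate tokens; the target model verifies them in one
# forward pass and keeps the longest accepted prefix, reducing decode latency.
outputs = model.generate(**inputs, assistant_model=assistant, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```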
Scaling Training
Scaling GPU clusters and data-parallelism (a minimal code sketch follows this list)
How to Scale Your Model – A Systems View of LLMs on TPUs
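As a companion to the scaling guides above, here is a minimal data-parallel training sketch using PyTorch DistributedDataParallel. The tiny linear model, batch size, and hyperparameters are placeholders; the general pattern (one process per GPU, gradients averaged across ranks during the backward pass) is what the guides discuss in depth.

```python
# Minimal data-parallel training sketch with PyTorch DDP (illustrative).
# Launch with: torchrun --nproc_per_node=<num_gpus> train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("nccl")                      # one process per GPU
rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(rank)

model = torch.nn.Linear(1024, 1024).cuda(rank)       # stand-in for a real model
model = DDP(model, device_ids=[rank])                # wraps model for gradient all-reduce
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

for step in range(10):
    x = torch.randn(32, 1024, device=f"cuda:{rank}") # each rank sees its own data shard
    loss = model(x).pow(2).mean()                    # placeholder loss
    loss.backward()                                  # DDP averages gradients across ranks
    opt.step()
    opt.zero_grad()

dist.destroy_process_group()
```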