Post Training
A short introduction to RLHF and post-training (PDF)
Inference
Lil'Log: Inference Optimization: Distillation, Quantization, Pruning, Sparsity, Mixture-of-Experts, Architectural Optimization
Deep Dive: Optimizing LLM inference
Assisted Generation: a new direction toward low-latency text generation – Hugging Face (sketched after this list)
LLM Transformer Inference Guide – Baseten
Fast LLM Inference From Scratch – Andrew Chan
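A minimal sketch of the assisted-generation idea from the Hugging Face post above, assuming a recent transformers release (v4.29+) where `generate()` accepts an `assistant_model`; the OPT checkpoints and prompt are illustrative stand-ins, not prescribed by the post:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"

# Large target model and a small draft ("assistant") model; the two must
# share the same tokenizer/vocabulary for drafted tokens to be verifiable.
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b").to(device)
assistant = AutoModelForCausalLM.from_pretrained("facebook/opt-125m").to(device)

inputs = tokenizer("Speculative decoding works by", return_tensors="pt").to(device)

# With `assistant_model` set, the small model drafts several tokens per step
# and the large model verifies them in a single forward pass, keeping the
# large model's (greedy) output unchanged while cutting per-token latency.
outputs = model.generate(**inputs, assistant_model=assistant, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The speedup comes from the drafting step: the cheap assistant amortizes the expensive model's forward passes, so gains are largest when the two models usually agree on the next tokens.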
Scaling Training
Scaling GPU clusters and data-parallelism
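A minimal sketch of the data-parallel pattern the link above covers, using PyTorch DistributedDataParallel; the toy model, batch size, and launch command are illustrative assumptions, not taken from the source:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK / LOCAL_RANK / WORLD_SIZE; one process per GPU.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Every rank holds a full model replica; DDP all-reduces gradients
    # across ranks during backward(), keeping the replicas in sync.
    model = DDP(torch.nn.Linear(1024, 1024).cuda(), device_ids=[local_rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):
        # Each rank would normally read its own shard of the batch
        # (e.g. via DistributedSampler); random tensors stand in here.
        x = torch.randn(32, 1024, device="cuda")
        loss = model(x).pow(2).mean()
        loss.backward()            # gradient all-reduce happens here
        opt.step()
        opt.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # launch with: torchrun --nproc_per_node=<num_gpus> train.py
```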