Post Training
Inference
Lil’Log: Inference Optimization: Distillation, Quantization, Pruning, Sparsity, Mixture-of-Experts, Architectural Optimization (a minimal quantization sketch follows this list)
Deep Dive: Optimizing LLM inference
Assisted Generation: a new direction toward low-latency text generation – Hugging Face (see the usage sketch after this list)
LLM Transformer Inference Guide – Baseten
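The Lil’Log survey above covers several compression techniques; as a concrete anchor for one of them, here is a minimal sketch of symmetric per-tensor int8 post-training quantization in PyTorch. The helper names are illustrative, not from any particular library.

```python
# A minimal sketch of symmetric int8 post-training quantization,
# one of the techniques surveyed in the Lil'Log post above.
# Function names here are illustrative, not a library API.
import torch

def quantize_int8(w: torch.Tensor):
    """Map float weights onto int8 with a single per-tensor scale."""
    scale = w.abs().max() / 127.0          # largest magnitude maps to 127
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an approximation of the original float weights."""
    return q.to(torch.float32) * scale

w = torch.randn(4096, 4096)                # stand-in weight matrix
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print("max abs error:", (w - w_hat).abs().max().item())
print("memory: %.0f MB -> %.0f MB" % (w.numel() * 4 / 2**20, q.numel() / 2**20))
```

Real deployments typically quantize per-channel or per-group and calibrate activations as well; the per-tensor scheme above is just the simplest instance of the idea.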
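The Hugging Face post above ships assisted generation in the transformers library itself, via the `assistant_model` argument to `generate()`: a small draft model proposes tokens that the large model verifies in one forward pass. A minimal usage sketch, with illustrative model choices (any draft/target pair sharing a tokenizer should work):

```python
# Assisted generation as described in the Hugging Face post above.
# Model names are illustrative; the draft and target models must
# share a tokenizer for the verification step to line up.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")
assistant = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

inputs = tokenizer("The key idea behind speculative decoding is",
                   return_tensors="pt")

# Passing `assistant_model` switches generate() into assisted mode;
# with greedy decoding the output text matches the large model's
# own greedy output, only produced in fewer target-model passes.
outputs = model.generate(**inputs, assistant_model=assistant,
                         max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```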
Scaling Training
Scaling GPU clusters and data-parallelism
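As a minimal illustration of the data-parallelism the resource above discusses, here is a PyTorch DistributedDataParallel sketch: every process holds a full model replica, sees its own shard of each batch, and gradients are all-reduced during `backward()`. The model, data, and hyperparameters are stand-ins.

```python
# A minimal data-parallel training sketch with PyTorch DDP.
# Launch with: torchrun --nproc_per_node=<num_gpus> ddp_sketch.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")            # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])  # hooks grad all-reduce
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):
        # In real training a DistributedSampler shards the dataset;
        # random data stands in for a per-rank micro-batch here.
        x = torch.randn(32, 1024, device=local_rank)
        loss = model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()                        # gradients synced here
        opt.step()
        if dist.get_rank() == 0:
            print(f"step {step} loss {loss.item():.4f}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```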