Deep Dive: Optimizing LLM inference
Assisted Generation: a new direction toward low-latency text generation – Hugging Face
LLM Transformer Inference Guide – Baseten