LLM inference basics
📄️ What is LLM inference?
LLM inference is the process of using a trained language model to generate responses or predictions based on prompts.
📄️ What is the difference between LLM training and inference?
LLM training builds the model; inference applies it to generate real-time outputs from new inputs.
📄️ How does LLM inference work?
Learn how prefill and decode work in LLM inference.
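For a quick feel for the two phases before reading that page, here is a minimal sketch using Hugging Face Transformers and gpt2 (both are assumptions for illustration, not part of this guide): the prompt is processed in one prefill pass that builds the KV cache, then tokens are decoded one at a time against that cache.

```python
# Minimal sketch of prefill vs. decode with Hugging Face Transformers.
# Assumes `torch` and `transformers` are installed; gpt2 is used only
# because it is a small, readily available example model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt_ids = tokenizer("The capital of France is", return_tensors="pt").input_ids

with torch.no_grad():
    # Prefill: process the whole prompt in one forward pass,
    # building the KV cache for every prompt token at once.
    out = model(prompt_ids, use_cache=True)
    past = out.past_key_values
    next_token = out.logits[:, -1].argmax(dim=-1, keepdim=True)

    generated = [next_token]
    # Decode: generate one token per forward pass, reusing the KV cache
    # so only the newest token is processed at each step.
    for _ in range(10):
        out = model(next_token, past_key_values=past, use_cache=True)
        past = out.past_key_values
        next_token = out.logits[:, -1].argmax(dim=-1, keepdim=True)
        generated.append(next_token)

print(tokenizer.decode(torch.cat(generated, dim=-1)[0]))
```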
📄️ Where is LLM inference run?
Learn the differences between CPUs, GPUs, and TPUs for running LLM inference.
📄️ Serverless vs. self-hosted LLM inference
Understand the differences between serverless AI APIs and self-hosted deployments.
📄️ OpenAI-compatible API
Learn what an OpenAI-compatible API is and why you need one.
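As a taste of what "OpenAI-compatible" means in practice, here is a minimal sketch using the official `openai` Python client pointed at a self-hosted server; the `base_url`, `api_key`, and model name are placeholders, not values from this guide.

```python
# Minimal sketch of calling an OpenAI-compatible endpoint with the
# official `openai` Python client (v1+). Because the server speaks the
# same API, only base_url and api_key change versus calling OpenAI itself.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # hypothetical self-hosted endpoint
    api_key="not-needed-for-local",       # many local servers ignore the key
)

response = client.chat.completions.create(
    model="my-model",  # placeholder: whatever model your server serves
    messages=[{"role": "user", "content": "What is LLM inference?"}],
)
print(response.choices[0].message.content)
```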