Getting started

Before running an LLM in production, you need to make a few key decisions. These early choices shape your infrastructure requirements, your costs, and how well the model performs for your use case.

📄️ Choosing the right model

Select the right models for your use case.

📄️ Choosing the right GPU

Select the right NVIDIA or AMD GPUs (e.g., L4, A100, H100, B200, MI250X, MI300X, MI350X) for LLM inference.

📄️ Calculating GPU memory for serving LLMs

Learn how to calculate GPU memory for serving LLMs.

📄️ LLM fine-tuning

Understand LLM fine-tuning and different fine-tuning frameworks.

📄️ LLM quantization

Understand LLM quantization and different quantization formats and methods.

📄️ Choosing the right inference framework

Select the right inference frameworks for your use case.

🗃️ Tool integration

3 items
