Model interaction

Model interaction is the process of interacting with the model to get the desired output. It is the process of sending a prompt to the model and getting a response.

📄️ OpenAI-compatible API

An OpenAI-compatible API implements the same request and response formats as OpenAI's official API, allowing developers to switch between different models without changing existing code.

📄️ Anthropic-compatible API

An Anthropic-compatible API mirrors Anthropic's Messages API so Claude-based clients, SDKs, and agent tools can use another model or provider with minimal code changes.

📄️ Function calling

Learn what function calling is and its use case.

📄️ Structured outputs

Structured outputs are model responses in defined formats like JSON or XML, making AI-generated data predictable, machine-readable, and easy to integrate into applications and workflows.

📄️ Model Context Protocol

Learn what Model Context Protocol (MCP) is and its use case.

📄️ Prompt engineering

Understand prompt engineering for LLM inference. Learn system & user prompts, zero-shot & few-shot prompting, KV cache impact, token costs, and production best practices.

📄️ LLM inference parameters

LLM inference parameters are request-time settings that control randomness, output length, repetition, stopping behavior, reproducibility, and structured generation.

Stay updated with the handbook

Get the latest insights and updates on LLM inference and optimization techniques.

Monthly insights
Latest techniques
Handbook updates