September 12, 2024 • Written By Sherlock Xu
Today, we're witnessing a shift toward LLMs becoming integral parts of compound AI systems. In this new paradigm, LLMs are no longer standalone models. Instead, they are embedded in larger, more complex architectures that require them to connect with external APIs, databases, and custom functions. One of the most significant advancements in this area is function calling.
In this blog post, we'll explore function calling, with a special focus on open-source LLMs. This post is also the first in our blog series on function calling with open-source models.
First, it's crucial to understand that function calling doesn't fundamentally change how LLMs operate. At their core, these models still function as transformers that predict the next token in a sequence. What function calling adds is a structured mechanism for interacting with external tools.
Function calling is essentially a structured way of "prompting" an LLM with detailed function definitions and descriptions, which the LLM then uses to generate appropriate outputs. These outputs can be used to interact with external "functions", and the results can often be fed back into the LLM for multi-turn conversations.
Here's a simple example of how a function definition might look:
{ "name": "get_delivery_date", "description": "Get the delivery date for a customer's order. Call this whenever you need to know the delivery date, for example when a customer asks 'Where is my package'", "parameters": { "type": "object", "properties": { "order_id": { "type": "string", "description": "The customer's order ID.", }, }, "required": ["order_id"], "additionalProperties": false, } }
Here is a high-level workflow diagram of function calling:
Function calling expands the capabilities of LLMs, giving them access to up-to-date or proprietary knowledge and allowing them to interact with other systems. The key benefits include:
Integrating with external data sources. LLMs can query databases, access APIs, or retrieve information from proprietary knowledge bases. For example, when users ask for the latest exchange rate between USD and CAD, the LLM can fetch real-time data from an API.
Taking actions via API calls or user-defined code. LLMs can trigger actions in other systems and run specialized tasks or models. For example, a model can call an order-lookup function and use the result to answer the user, as shown in the sketch below.
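To make the workflow concrete, here is a minimal sketch of the full loop using the OpenAI Python client against an OpenAI-compatible endpoint. The endpoint URL, the model name, and the local get_delivery_date implementation are placeholders, not a specific deployment:

```python
import json
from openai import OpenAI

# Placeholder endpoint and model: any OpenAI-compatible server works here.
client = OpenAI(base_url="http://localhost:3000/v1", api_key="n/a")
model = "meta-llama/Meta-Llama-3.1-8B-Instruct"

# A hypothetical local implementation the model can ask us to run.
def get_delivery_date(order_id: str) -> str:
    return "2024-09-20"  # e.g., looked up from an order database

tools = [{
    "type": "function",
    "function": {
        "name": "get_delivery_date",
        "description": "Get the delivery date for a customer's order.",
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string", "description": "The customer's order ID."},
            },
            "required": ["order_id"],
        },
    },
}]

messages = [{"role": "user", "content": "Where is my package? My order ID is 1017."}]

# Step 1: the model decides whether to call a function and with what arguments.
response = client.chat.completions.create(model=model, messages=messages, tools=tools)
assistant_msg = response.choices[0].message
tool_call = assistant_msg.tool_calls[0]  # assumes the model chose to call a tool

# Step 2: the application (not the model) executes the function.
result = get_delivery_date(**json.loads(tool_call.function.arguments))

# Step 3: feed the result back so the model can produce the final answer.
messages.append(assistant_msg)
messages.append({"role": "tool", "tool_call_id": tool_call.id, "content": result})
final = client.chat.completions.create(model=model, messages=messages, tools=tools)
print(final.choices[0].message.content)
```

Note that the model never executes anything itself: it only emits the name and arguments of the function to call, and the application decides whether and how to run it.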
The primary use cases of function calling fall into two major categories.
Function calling allows you to create intelligent AI agents capable of interacting with external systems and performing complex tasks.
Function calling requires the LLM to emit the function call in an exact format defined by the function's signature and description. JSON is often used as the representation for the call, which also makes function calling an effective tool for extracting structured data from unstructured text.
For example, an LLM with function-calling capabilities can process large volumes of medical records, extracting key information such as patient names, diagnoses, prescribed medications, and treatment plans. By converting this unstructured data into structured JSON objects, healthcare providers can easily parse the information for further analysis, reporting, or integration into electronic health record (EHR) systems.
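As a sketch of what the target structure might look like (the field names here are hypothetical, not a real clinical schema), a Pydantic model can describe the expected JSON; the same pattern appears again in the structured-outputs example later in this post:

```python
from pydantic import BaseModel

# Hypothetical target structure for a medical-record extraction task.
class PatientRecord(BaseModel):
    patient_name: str
    diagnosis: str
    medications: list[str]
    treatment_plan: str

# model_json_schema() yields the JSON Schema that a function
# definition (or a response_format parameter) can embed.
print(PatientRecord.model_json_schema())
```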
Proprietary LLMs like GPT-4 come with built-in support for function calling, making integration more straightforward. By contrast, when using open-source models like Mistral or Llama 3.1, the process can be more complex and require additional customization.
So, why choose open-source models for function calling?
While open-source LLMs offer advantages in terms of flexibility and customization, you may find it difficult to implement function calling with these models given the following challenges:
One of the primary challenges is getting open-source LLMs to produce outputs in certain formats for function calling, such as:
Structured outputs. Function calling often relies on well-structured data formats like JSON or YAML that match the expected parameters of the function being called. This is vital to making function calling reliable, especially when the LLM is integrated into automated workflows. However, open-source LLMs can sometimes deviate from the instructions and produce outputs that are not properly formatted or contain unnecessary information.
Several tools and techniques have been developed to address this challenge, such as Outlines, Instructor, and Jsonformer. Here is an example of using Pydantic to define a JSON schema and use it with the OpenAI client.
```python
from pydantic import BaseModel
from openai import OpenAI

base_url = "https://bentovllm-llama-3-1-70-b-instruct-awq-service-0nek.realchar-guc1.bentoml.ai/v1"
model = "hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4"
client = OpenAI(base_url=base_url, api_key="n/a")

# Pydantic model defining the JSON schema the LLM output must follow
class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

if __name__ == "__main__":
    completion = client.beta.chat.completions.parse(
        model=model,
        messages=[
            {"role": "system", "content": "Extract the event information."},
            {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
        ],
        response_format=CalendarEvent,
    )
    # The response is parsed back into a CalendarEvent instance
    event = completion.choices[0].message.parsed
    print(type(event), event, event.model_dump())
```
Specialized formats. Certain tasks require LLMs to generate more specialized outputs, such as SQL queries or DSLs. This is often implemented using grammar mode or regex mode, which restrict the LLM's output to match a predefined format. Here’s an example of how you can use Outlines to constrain the output of an LLM using regex to match a specific pattern, in this case, an IP address format.
```python
from outlines import models, generate

model = models.transformers("microsoft/Phi-3-mini-128k-instruct")

# Regex constraining the output to a valid IPv4 address
regex_str = r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)"
generator = generate.regex(model, regex_str)

result = generator("What is the IP address of localhost?\nIP: ")
print(result)  # 127.0.0.100
```
Different open-source LLMs have varied levels of capabilities and limitations when it comes to supporting multiple types of function calls, such as single, parallel, and nested (sequential) function calls.
For a comprehensive comparison of LLM function calling capabilities, check out the Berkeley Function Calling Leaderboard.
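To make these distinctions concrete, here is a minimal sketch of dispatching parallel calls with an OpenAI-compatible client. A single assistant message may carry several tool calls, each of which must be executed and answered with its own tool message; the registry and functions below are hypothetical:

```python
import json

# Hypothetical registry mapping tool names to local Python callables.
TOOL_REGISTRY = {
    "get_weather": lambda city: f"Sunny in {city}",
    "get_exchange_rate": lambda base, quote: f"1 {base} = 1.36 {quote}",
}

def run_tool_calls(assistant_message, messages):
    """Execute every call in an assistant message (parallel calls arrive
    as multiple entries in tool_calls) and append one result per call."""
    messages.append(assistant_message)
    for call in assistant_message.tool_calls:
        func = TOOL_REGISTRY[call.function.name]
        args = json.loads(call.function.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": str(func(**args)),
        })
    return messages
```

Nested (sequential) calls extend this pattern: the application keeps sending results back until the model returns a final answer instead of another tool call.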
An LLM application with function calling typically includes multiple components, such as model inference, custom business logic, user-defined functions, and external API integrations. Orchestrating these components in a production environment can be complex and challenging.
At BentoML, we are building a complete platform for enterprise AI teams to build and scale compound AI systems. It brings cutting-edge AI infrastructure into your cloud environment, enabling AI teams to run inference efficiently, iterate rapidly on system design, and scale in production with full observability.
Function calling transforms LLMs into dynamic tools capable of interacting with real-time data and executing complex workflows. While proprietary models like GPT-4 offer built-in function calling, open-source LLMs provide enterprises with greater flexibility, privacy, and customization options. Stay tuned for our next post, where we'll dive deep into code examples of implementing function calling with open-source LLMs using BentoML!
Check out the following resources to learn more: