Function calling and tool use
Function calling enables AI models to dynamically interact with external systems, retrieve up-to-date data, and execute tasks. This capability is a foundational building block for agentic GenAI applications, where models call different functions to achieve specific objectives.
When to use function calling
You may want to define functions for the following purposes:
- To fetch data: Access APIs, knowledge bases, or external services to retrieve up-to-date information and augment model responses
- To perform actions: Execute predefined tasks like modifying application states, invoking workflows, or integrating with custom business logic
Based on the system prompt and messages, the model may decide to call these functions instead of or in addition to generating text. Developers then handle the function calls, execute them, and return the results to the model, which integrates the function call results into its final response.
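For instance, both purposes map onto ordinary Python functions. Here are two hypothetical examples (the names and behavior are illustrative only, not part of MAX):

# A data-fetching tool: retrieves up-to-date information from an external source
def get_stock_price(ticker: str) -> str:
    return f"Fetching the latest price for {ticker} ..."

# An action tool: triggers a state change in another system
def create_support_ticket(summary: str) -> str:
    return f"Created a support ticket: {summary}"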
How function calling works
MAX supports the OpenAI function calling specification: you define functions and register them as tools that the model can use. This gives you more control over model behavior and lets you trigger actions directly based on user input.
The following example defines a function, registers that function as a tool, and sends a request to the chat completion client.
from openai import OpenAI
import json

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Define a function that the model can call
def get_weather(location: str):
    return f"Getting the weather for {location} ..."

# Register your function as an available tool
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and state, e.g., 'Los Angeles, CA'",
                    }
                },
                "required": ["location"],
            },
        },
    }
]

# Generate a response with the chat completion client with access to tools
response = client.chat.completions.create(
    model="modularai/Llama-3.1-8B-Instruct-GGUF",
    messages=[{"role": "user", "content": "What's the weather like in Paris today?"}],
    tools=tools,
    stream=False,
)

# Print the model's selected function call
print(response.choices[0].message.tool_calls)
At this stage of the function calling workflow, the model responds with the selected tool and the arguments it extracted from the prompt:
[{
"id": "call_12345xyz",
"type": "function",
"function": {
"name": "get_weather",
"arguments": "{\"location\":\"Paris, France\"}"
}
}]
From here, you must execute the function call yourself and supply the results to the model so it can incorporate them into its final response.
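For example, continuing the Python script above, you can parse the arguments, run the function, and send the result back as a tool message (a minimal sketch that follows the OpenAI tool-message format; the exact response wording will vary):

# Extract the tool call the model selected
tool_call = response.choices[0].message.tool_calls[0]

# The arguments field is a JSON-encoded string, so parse it before use
args = json.loads(tool_call.function.arguments)

# Execute the function with the model-supplied arguments
result = get_weather(**args)

# Return the result to the model as a "tool" message so it can
# incorporate it into its final response
followup = client.chat.completions.create(
    model="modularai/Llama-3.1-8B-Instruct-GGUF",
    messages=[
        {"role": "user", "content": "What's the weather like in Paris today?"},
        response.choices[0].message,  # assistant message containing the tool call
        {"role": "tool", "tool_call_id": tool_call.id, "content": result},
    ],
    tools=tools,
)
print(followup.choices[0].message.content)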
The OpenAI function calling spec is compatible with many agent frameworks, such as AutoGen and CrewAI.
Supported models
The max CLI supports several LLMs optimized for function calling:

- modularai/Llama-3.1-8B-Instruct-GGUF
- Meta's Llama 3.1 models & evals collection
- Meta's Llama 3.2 language models & evals collection
Quickstart
Use MAX to serve a model that supports function calling and test it locally:
- Create a virtual environment and install the max CLI using your preferred package manager (pip, uv, conda, or pixi):

  pip:

  - Create a project folder:

    mkdir function-calling && cd function-calling

  - Create and activate a virtual environment:

    python3 -m venv .venv/function-calling \
    && source .venv/function-calling/bin/activate

  - Install the modular Python package.

    Nightly:

    pip install modular \
      --extra-index-url https://download.pytorch.org/whl/cpu \
      --index-url https://dl.modular.com/public/nightly/python/simple/

    Stable:

    pip install modular \
      --extra-index-url https://download.pytorch.org/whl/cpu \
      --extra-index-url https://modular.gateway.scarf.sh/simple/

  uv:

  - If you don't have it, install uv:

    curl -LsSf https://astral.sh/uv/install.sh | sh

    Then restart your terminal to make uv accessible.

  - Create a project:

    uv init function-calling && cd function-calling

  - Create and start a virtual environment:

    uv venv && source .venv/bin/activate

  - Install the modular Python package.

    Nightly:

    uv pip install modular \
      --extra-index-url https://download.pytorch.org/whl/cpu \
      --index-url https://dl.modular.com/public/nightly/python/simple/ \
      --index-strategy unsafe-best-match

    Stable:

    uv pip install modular \
      --extra-index-url https://download.pytorch.org/whl/cpu \
      --extra-index-url https://modular.gateway.scarf.sh/simple/ \
      --index-strategy unsafe-best-match

  conda:

  - If you don't have it, install conda. A common choice is with brew:

    brew install miniconda

  - Initialize conda for shell interaction:

    conda init

    If you're on a Mac, instead use:

    conda init zsh

    Then restart your terminal for the changes to take effect.

  - Create a project:

    conda create -n function-calling

  - Start the virtual environment:

    conda activate function-calling

  - Install the modular conda package.

    Nightly:

    conda install -c conda-forge -c https://conda.modular.com/max-nightly/ modular

    Stable:

    conda install -c conda-forge -c https://conda.modular.com/max/ modular

  pixi:

  - If you don't have it, install pixi:

    curl -fsSL https://pixi.sh/install.sh | sh

    Then restart your terminal for the changes to take effect.

  - Create a project:

    pixi init function-calling \
      -c https://conda.modular.com/max-nightly/ -c conda-forge \
      && cd function-calling

  - Install the modular conda package.

    Nightly:

    pixi add modular

    Stable:

    pixi add "modular=25.4"

  - Start the virtual environment:

    pixi shell
- Start an endpoint with a model that supports function calling:

  max serve --model-path=modularai/Llama-3.1-8B-Instruct-GGUF

- Wait until you see this message:

  Server ready on http://0.0.0.0:8000 (Press CTRL+C to quit)
Then open a new terminal window and send a request to the endpoint, specifying the available tools:
curl -N http://0.0.0.0:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "modularai/Llama-3.1-8B-Instruct-GGUF",
"stream": false,
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is the weather like in Boston today?"}
],
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather in a given location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. Los Angeles, CA"
}
},
"required": ["location"]
}
}
}
],
"tool_choice": "auto"
}'
Within the generated response, you should see that the model selected the get_weather function as a tool call, with its inputs taken from the original prompt:
"tool_calls": [
{
"id": "call_ac73df14fe184349",
"type": "function",
"function": {
"name": "get_weather",
"arguments": "{\"location\": \"Boston, MA\"}"
}
}
]
"tool_calls": [
{
"id": "call_ac73df14fe184349",
"type": "function",
"function": {
"name": "get_weather",
"arguments": "{\"location\": \"Boston, MA\"}"
}
}
]
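The request above sets "tool_choice": "auto", which lets the model decide whether to call a tool or answer directly. The OpenAI spec also accepts "none" to disable tool calls, or an object that forces a specific function. For example, with the Python client shown earlier (a sketch reusing that client and tools list):

# Force the model to call get_weather rather than letting it decide
response = client.chat.completions.create(
    model="modularai/Llama-3.1-8B-Instruct-GGUF",
    messages=[{"role": "user", "content": "What is the weather like in Boston today?"}],
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "get_weather"}},
)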
Next steps
Now that you know the basics of function calling, you can get started with MAX on GPUs.