Modular Documentation
The Modular Platform accelerates AI inference and abstracts hardware complexity. Using our Docker container, you can deploy a GenAI model from Hugging Face with an OpenAI-compatible endpoint on a wide range of hardware.
And if you need to customize the model or tune a GPU kernel, Modular provides a depth of model extensibility and GPU programmability that you won’t find anywhere else.
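Get started

from openai import OpenAI

# Point the client at the local OpenAI-compatible endpoint.
client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="EMPTY")

# Send a single-turn chat request to the deployed model.
completion = client.chat.completions.create(
    model="modularai/Llama-3.1-8B-Instruct-GGUF",
    messages=[
        {"role": "user", "content": "Who won the world series in 2020?"}
    ],
)

# Print the text of the first returned choice.
print(completion.choices[0].message.content)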
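
Because the endpoint is OpenAI-compatible, the same request can in principle be made without the `openai` package. The following is a minimal sketch using only the Python standard library. It assumes the server is listening at the address used in the snippet above and that it exposes the standard `/chat/completions` route of the OpenAI API. The helper names `build_chat_request` and `send_chat_request` are hypothetical and are not part of any Modular or OpenAI library.

```python
import json
import urllib.request

# Assumption: same local server address as in the snippet above.
BASE_URL = "http://0.0.0.0:8000/v1"


def build_chat_request(model, messages, base_url=BASE_URL, api_key="EMPTY"):
    """Build (but do not send) a POST request for the chat completions route."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Hypothetical usage: build the request and inspect it before sending.
request = build_chat_request(
    "modularai/Llama-3.1-8B-Instruct-GGUF",
    [{"role": "user", "content": "Who won the world series in 2020?"}],
)
print(request.get_method(), request.full_url)
```

Building the request separately from sending it makes it easy to inspect the URL and JSON body before any network traffic occurs. Calling `send_chat_request(request)` requires the server to be running.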