Modular Documentation
The Modular Platform accelerates AI inference and abstracts hardware complexity. Using our Docker container, you can deploy a GenAI model from Hugging Face with an OpenAI-compatible endpoint on a wide range of hardware.
And if you need to customize the model or tune a GPU kernel, Modular provides a depth of model extensibility and GPU programmability that you won't find anywhere else.
Get started

```python
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="EMPTY")
completion = client.chat.completions.create(
    model="modularai/Llama-3.1-8B-Instruct-GGUF",
    messages=[
        {"role": "user", "content": "Who won the world series in 2020?"}
    ],
)
print(completion.choices[0].message.content)
```
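The quickstart client above speaks the standard OpenAI chat-completions protocol, so any OpenAI-compatible tooling can target the same endpoint. As a rough sketch (endpoint and model name taken from the example above, assuming the server is running locally on port 8000), the underlying HTTP request looks like this:

```http
POST http://0.0.0.0:8000/v1/chat/completions
Content-Type: application/json

{
  "model": "modularai/Llama-3.1-8B-Instruct-GGUF",
  "messages": [
    {"role": "user", "content": "Who won the world series in 2020?"}
  ]
}
```

Because the endpoint is OpenAI-compatible, existing clients and SDKs work unchanged: only the `base_url` needs to point at your deployment.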
500+ models supported
We're on a mission to make open source AI models as fast and easy to use as possible. Every model in our repo has been optimized using MAX Graph to ensure performance and portability across any architecture.