Quickstart
In this quickstart guide, you'll learn how to install Modular in a Python environment and run inference with a GenAI model. We'll first use our Python API to run offline inference, then start a local endpoint and use the OpenAI Python API to send inference requests.
System requirements:

- Mac
- Linux
- WSL
- Docker
Set up your project
First, install the `max` CLI and Python library, using pip, uv, or magic:
**pip**

- Create a project folder:

  ```sh
  mkdir modular && cd modular
  ```

- Create and activate a virtual environment:

  ```sh
  python3 -m venv .venv/modular \
    && source .venv/modular/bin/activate
  ```

- Install the `modular` Python package:

  Nightly:

  ```sh
  pip install modular \
    --index-url https://download.pytorch.org/whl/cpu \
    --extra-index-url https://dl.modular.com/public/nightly/python/simple/
  ```

  Stable:

  ```sh
  pip install modular \
    --index-url https://download.pytorch.org/whl/cpu
  ```
**uv**

- Install `uv`:

  ```sh
  curl -LsSf https://astral.sh/uv/install.sh | sh
  ```

  Then restart your terminal to make `uv` accessible.

- Create a project:

  ```sh
  uv init modular && cd modular
  ```

- Create and start a virtual environment:

  ```sh
  uv venv && source .venv/bin/activate
  ```

- Install the `modular` Python package:

  Nightly:

  ```sh
  uv pip install modular \
    --index-url https://download.pytorch.org/whl/cpu \
    --extra-index-url https://dl.modular.com/public/nightly/python/simple/
  ```

  Stable:

  ```sh
  uv pip install modular \
    --index-url https://download.pytorch.org/whl/cpu
  ```
**magic**

- Install `magic`:

  ```sh
  curl -ssL https://magic.modular.com/ | bash
  ```

  Then run the `source` command that's printed in your terminal.

- Create a project:

  ```sh
  magic init modular --format pyproject && cd modular
  ```

- Install the `max-pipelines` conda package:

  Nightly:

  ```sh
  magic add max-pipelines
  ```

  Stable:

  ```sh
  magic add "max-pipelines==25.3"
  ```

- Start the virtual environment:

  ```sh
  magic shell
  ```
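Whichever package manager you chose, you can optionally confirm the install by importing the MAX classes this guide uses. This is just a quick sanity check, and the filename `check-install.py` is a suggestion, not part of the official setup:

```python
# check-install.py -- optional sanity check (hypothetical filename).
# If these imports succeed, the modular package is installed in the
# active virtual environment; they're the same classes used in the
# offline inference example below.
from max.entrypoints.llm import LLM
from max.pipelines import PipelineConfig
from max.serve.config import Settings

print("MAX imports OK:", LLM.__name__, PipelineConfig.__name__, Settings.__name__)
```

Run it with `python check-install.py`; an ImportError means the package isn't visible in the environment you activated.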
Run offline inference
You can run inference locally with the MAX Python API: specify the Hugging Face model you want, then generate results for one or more prompts.

In this example, we use a Llama 3.1 model that's not gated on Hugging Face, so you don't need an access token. Save the following as offline-inference.py:
```python
from max.entrypoints.llm import LLM
from max.pipelines import PipelineConfig
from max.serve.config import Settings


def main():
    # Any compatible Hugging Face model; this one is not gated.
    model_path = "modularai/Llama-3.1-8B-Instruct-GGUF"
    pipeline_config = PipelineConfig(model_path=model_path)
    settings = Settings()
    llm = LLM(settings, pipeline_config)

    prompts = [
        "In the beginning, there was",
        "I believe the meaning of life is",
        "The fastest way to learn python is",
    ]

    print("Generating responses...")
    responses = llm.generate(prompts, max_new_tokens=50)

    # Responses come back in the same order as the prompts.
    for i, (prompt, response) in enumerate(zip(prompts, responses)):
        print(f"========== Response {i} ==========")
        print(prompt + response)
        print()


if __name__ == "__main__":
    main()
```
Run it and you should see a response similar to this:

```sh
python offline-inference.py
```
```
========== Response 0 ==========
In the beginning, there was Andromeda. The Andromeda galaxy, that is. It's the closest major galaxy to our own Milky Way, and it's been a source of fascination for astronomers and space enthusiasts for centuries. But what if I told you that there's
========== Response 1 ==========
I believe the meaning of life is to find your gift. The purpose of life is to give it away to others.
I believe that the meaning of life is to find your gift. The purpose of life is to give it away to others.
I believe that the meaning of life is
========== Response 2 ==========
The fastest way to learn python is to practice with real-world projects. Here are some ideas for projects that you can use to learn Python:
1. **Command Line Calculator**: Create a command line calculator that can perform basic arithmetic operations like addition, subtraction, multiplication, and division.
```
More information about this API is available in the offline inference guide.
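In practice you'll often want to keep one LLM object around and call generate() on it more than once, rather than reloading the model for every batch. Here's a minimal sketch under that assumption (reuse across calls isn't confirmed by this quickstart); it uses only the imports and calls shown above, and the filename is hypothetical:

```python
# reuse-llm.py -- hypothetical filename. A sketch that assumes one LLM
# instance can serve several generate() calls; only the API calls from
# offline-inference.py above are used.
from max.entrypoints.llm import LLM
from max.pipelines import PipelineConfig
from max.serve.config import Settings


def main():
    pipeline_config = PipelineConfig(
        model_path="modularai/Llama-3.1-8B-Instruct-GGUF"
    )
    llm = LLM(Settings(), pipeline_config)

    # Two separate batches, served by the same LLM instance.
    batches = [
        ["Summarize the rules of tic-tac-toe in one sentence."],
        ["List three uses for a paperclip."],
    ]
    for batch in batches:
        for response in llm.generate(batch, max_new_tokens=40):
            print(response)
            print("-" * 40)


if __name__ == "__main__":
    main()
```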
Run inference with an endpoint
Now let's start a local server that runs the model using an OpenAI-compatible endpoint:
- Install the `openai` client library, using the same package manager as before:

  pip:

  ```sh
  pip install openai
  ```

  uv:

  ```sh
  uv add openai
  ```

  magic:

  ```sh
  magic add openai
  ```
- Start the endpoint with the `max` CLI:

  ```sh
  max serve --model-path=modularai/Llama-3.1-8B-Instruct-GGUF
  ```
- Create a new file named generate-text.py that sends an inference request:

  ```python
  from openai import OpenAI

  # Point the client at the local MAX endpoint started above.
  client = OpenAI(
      base_url="http://0.0.0.0:8000/v1",
      api_key="EMPTY",
  )

  completion = client.chat.completions.create(
      model="modularai/Llama-3.1-8B-Instruct-GGUF",
      messages=[
          {
              "role": "user",
              "content": "Who won the world series in 2020?",
          },
      ],
  )

  print(completion.choices[0].message.content)
  ```

  Notice that the OpenAI API requires the `api_key` argument, but our endpoint doesn't use it.
- Run it and you should see results like this:

  ```sh
  python generate-text.py
  ```

  ```
  The Los Angeles Dodgers won the 2020 World Series. They defeated the Tampa Bay Rays in the series 4 games to 2. This was the Dodgers' first World Series title since 1988.
  ```
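Because the endpoint speaks the OpenAI chat completions protocol, you can also try streaming tokens as they're generated. The `stream=True` flag is standard in the `openai` client; whether the local endpoint supports streaming is an assumption here, so treat this as a sketch rather than a guaranteed feature:

```python
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="EMPTY")

# stream=True asks the server to send back chunks as they are generated,
# following the standard OpenAI streaming protocol. This assumes the local
# endpoint supports streaming; if it doesn't, use the non-streaming
# example above instead.
stream = client.chat.completions.create(
    model="modularai/Llama-3.1-8B-Instruct-GGUF",
    messages=[{"role": "user", "content": "Write a haiku about local inference."}],
    stream=True,
)

for chunk in stream:
    # Each chunk carries a small delta of the response text.
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```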
That's it. You just served Llama 3.1 on your local CPU and ran inference using our OpenAI-compatible Serve API.

You can also deploy the same endpoint to a cloud GPU using our Docker container.

To run a different model, change the `--model-path` value to another model from our model repository.
Stay in touch
Get the latest updates
Stay up to date with announcements and releases. We're moving fast over here.
Talk to an AI Expert
Connect with our product experts to explore how we can help you deploy and serve AI models with high performance, scalability, and cost-efficiency.
Try a tutorial
For a more detailed walkthrough of how to build and deploy with MAX, check out these tutorials.