Get started with MAX
With just a few commands, you can install MAX as a conda package and deploy a GenAI model on a local endpoint.
System requirements:
- Mac
- Linux
- WSL
- GPU
Start a GenAI endpoint
- Install our `magic` package manager:

  ```sh
  curl -ssL https://magic.modular.com/ | bash
  ```

  Then run the `source` command that's printed in your terminal.
- Clone the MAX repository:

  ```sh
  git clone https://github.com/modular/max && \
    cd max/pipelines/python
  ```
- Start a local endpoint for Llama 3:

  ```sh
  magic run serve --huggingface-repo-id modularai/llama-3.1
  ```

  The first run also installs MAX, Mojo, and other dependencies in a virtual environment, downloads the model weights, and compiles the model, so it might take some time.

  The endpoint is ready when you see this URI printed in your terminal (to script the wait, see the readiness probe after this list):

  ```
  Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
  ```
- Now open another terminal and send a request using `curl` (a simpler non-streaming variant is sketched after this list):

  ```sh
  curl -N http://0.0.0.0:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "modularai/llama-3.1",
      "stream": true,
      "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the World Series in 2020?"}
      ]
    }' | grep -o '"content":"[^"]*"' | sed 's/"content":"//g' | sed 's/"//g' | tr -d '\n' | sed 's/\\n/\n/g'
  ```
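If you're scripting these steps and want to block until the server is up before sending requests, a generic readiness probe like this works (it isn't MAX-specific; it just polls until port 8000 answers):

```sh
# Poll until something responds on port 8000; curl exits non-zero
# while the connection is refused, so the loop ends once Uvicorn is up.
until curl -s -o /dev/null http://0.0.0.0:8000; do
  sleep 2
done
echo "Endpoint is ready"
```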
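The streaming request above needs the `grep`/`sed` pipeline to peel the generated text out of the event stream. If you'd rather receive the whole reply as a single JSON object, a non-streaming request is easier to parse. This sketch assumes the endpoint follows the OpenAI chat-completions response schema (as the request format suggests) and that you have `jq` installed:

```sh
curl -s http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "modularai/llama-3.1",
    "stream": false,
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Who won the World Series in 2020?"}
    ]
  }' | jq -r '.choices[0].message.content'
```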
That's it. You just deployed Llama 3 on your local CPU. You can also deploy MAX to a cloud GPU.
Notice there was no step above to install MAX. That's because `magic` automatically installs all package dependencies when it starts the endpoint.
Alternatively, you can deploy everything you need using our pre-configured MAX container.
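If you go the container route, the run command typically looks like the sketch below. The image name here is a placeholder, not the real one, and the sketch assumes the container's entrypoint accepts the same `--huggingface-repo-id` flag as `magic run serve`; get the actual image name, tag, and arguments from the MAX container documentation:

```sh
# Sketch only: MAX_CONTAINER_IMAGE is a placeholder for the real
# image name published in the MAX container docs.
docker run --rm -p 8000:8000 MAX_CONTAINER_IMAGE \
  --huggingface-repo-id modularai/llama-3.1
```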
Stay in touch
Get the latest updates
Stay up to date with MAX’s key feature releases and announcements. We’re moving fast over here.
Talk to an AI Expert
Connect with our product experts to explore how we can help you deploy and serve AI models with high performance, scalability, and cost-efficiency.
Try a tutorial
For a more detailed walkthrough of how to build and deploy with MAX, check out these tutorials.