
Get started with MAX

With just a few commands, you can install MAX as a conda package and deploy a GenAI model on a local endpoint.

Start a GenAI endpoint

  1. Install our magic package manager:

    curl -ssL https://magic.modular.com/ | bash

    Then run the source command that's printed in your terminal.

  2. Clone the MAX repository:

    git clone https://github.com/modularml/max && \
    cd max/pipelines/python
  3. Start a local endpoint for Llama 3:

    magic run serve --huggingface-repo-id modularai/llama-3.1

    In addition to starting the endpoint, this command installs MAX, Mojo, and other dependencies in a virtual environment, downloads the model weights, and compiles the model, so it might take some time.

    The endpoint is ready when you see the URI printed in your terminal:

    Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
  4. Now open another terminal and send a streaming request with curl (a non-streaming variant is shown below):

    curl -N http://0.0.0.0:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
    "model": "modularai/llama-3.1",
    "stream": true,
    "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the World Series in 2020?"}
    ]
    }' | grep -o '"content":"[^"]*"' | sed 's/"content":"//g' | sed 's/"//g' | tr -d '\n' | sed 's/\\n/\n/g'
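
If you'd rather get back one complete JSON response instead of a token stream, you can change "stream" to false and drop the grep/sed pipeline that extracts the streamed text. This is a minimal sketch that assumes the endpoint returns a standard OpenAI-style chat completion object and that python3 is available locally for pretty-printing:

    # Same request as above, but returned as a single (non-streaming) response.
    curl http://0.0.0.0:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
    "model": "modularai/llama-3.1",
    "stream": false,
    "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the World Series in 2020?"}
    ]
    }' | python3 -m json.tool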

That's it. You just deployed Llama 3 on your local CPU. You can also deploy MAX to a cloud GPU.

Notice there was no step above to install MAX. That's because magic automatically installs all package dependencies when it starts the endpoint. Alternatively, you can deploy everything you need using our pre-configured MAX container.
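
As a rough sketch of the container route, the command looks something like the following. The image name is a placeholder rather than the actual published tag (check the MAX container documentation for the real image), and it assumes the container accepts the same --huggingface-repo-id flag as magic run serve and listens on port 8000:

    # Placeholder image name -- substitute the published MAX container image.
    docker run --gpus all -p 8000:8000 \
    <MAX_CONTAINER_IMAGE> \
    --huggingface-repo-id modularai/llama-3.1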

Try a tutorial

For a more detailed walkthrough of how to build and deploy with MAX, check out these tutorials.