Run a text embedding model

Text embeddings are rich numerical representations of text that power many modern natural language processing (NLP) applications. This tutorial shows you how to serve and interact with an embedding model using an OpenAI-compatible endpoint. Specifically, we'll use MAX to serve the all-mpnet-base-v2 model, which is a powerful transformer that excels at capturing semantic relationships in text.

Set up your environment

Create a Python project to install our APIs and CLI tools:

  1. If you don't have it, install pixi:
    curl -fsSL https://pixi.sh/install.sh | sh

    Then restart your terminal for the changes to take effect.

  2. Create a project:
    pixi init embeddings-tutorial \
    -c https://conda.modular.com/max-nightly/ -c conda-forge \
    && cd embeddings-tutorial
  3. Install the modular conda package:
    pixi add modular
  4. Start the virtual environment:
    pixi shell

Serve your model

Use the max serve command to start a local model server with the all-mpnet-base-v2 model:

max serve \
--model-path sentence-transformers/all-mpnet-base-v2

This starts a server running the all-mpnet-base-v2 embedding model on http://localhost:8000/v1/embeddings, an OpenAI-compatible endpoint.

The endpoint is ready when you see this message printed in your terminal:

Server ready on http://0.0.0.0:8000 (Press CTRL+C to quit)
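
If you're scripting against the server rather than watching the terminal, you can wait for readiness programmatically. The following is a minimal sketch that retries the documented embeddings endpoint until the server answers; the two-minute budget and retry interval are arbitrary assumptions, not MAX defaults:

# readiness_check.py: poll the embeddings endpoint until the server responds.
import json
import time
import urllib.request

URL = "http://localhost:8000/v1/embeddings"
payload = json.dumps({
    "model": "sentence-transformers/all-mpnet-base-v2",
    "input": "ping",
}).encode()

deadline = time.time() + 120  # assumed startup budget; model loading takes a while
while time.time() < deadline:
    try:
        request = urllib.request.Request(
            URL, data=payload, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(request, timeout=5) as response:
            print("Server is ready, status:", response.status)
            break
    except OSError:  # connection refused or timed out; server is still starting
        time.sleep(2)
else:
    raise SystemExit("Server did not become ready in time")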

For a complete list of max CLI commands and options, refer to the MAX CLI reference.

Interact with your model

MAX supports OpenAI's REST APIs, so you can interact with the model using either the OpenAI Python SDK or curl.

To use the Python client, first install the OpenAI SDK:

pixi add openai

Then, create a client and make a request to the model:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Create embeddings
response = client.embeddings.create(
model="sentence-transformers/all-mpnet-base-v2",
input="Run an embedding model with MAX Serve!",
)

print(f"{response.data[0].embedding[:5]}")

You should receive a response similar to this:

{"data":[{"index":0,"embedding":[-0.06595132499933243,0.005941616836935282,0.021467769518494606,0.23037832975387573,
{"data":[{"index":0,"embedding":[-0.06595132499933243,0.005941616836935282,0.021467769518494606,0.23037832975387573,

The response has been shortened for brevity. It contains a numerical representation of the input text that you can use for semantic comparisons.
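
If you prefer curl, you can send the same request directly to the endpoint. This sketch follows the standard OpenAI embeddings request format:

curl http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sentence-transformers/all-mpnet-base-v2",
    "input": "Run an embedding model with MAX Serve!"
  }'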

For complete details on all available API endpoints and options, see the MAX Serve API documentation.
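
Embeddings become useful when you compare them. As a rough sketch of what a semantic comparison looks like in practice, the script below embeds three sentences in one request and scores them by cosine similarity. The cosine_similarity helper is illustrative, not part of MAX or the OpenAI SDK; all-mpnet-base-v2 returns 768-dimensional vectors:

import math

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

sentences = [
    "Run an embedding model with MAX Serve!",
    "Serving a text embedding model locally",
    "My cat enjoys sleeping in the sun",
]

# The embeddings endpoint accepts a list of strings, so one
# request embeds all three sentences.
response = client.embeddings.create(
    model="sentence-transformers/all-mpnet-base-v2",
    input=sentences,
)
vectors = [item.embedding for item in response.data]  # 768 floats each

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Illustrative helper: cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# The two sentences about serving embeddings should score higher
# with each other than either does with the sentence about the cat.
for i in (1, 2):
    print(sentences[i], "->", cosine_similarity(vectors[0], vectors[i]))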

Next steps

Now that you have successfully set up MAX with an OpenAI-compatible embeddings endpoint, check out these other tutorials:
