
Run a text embedding model
Text embeddings are rich numerical representations of text that power many modern natural language processing (NLP) applications. This tutorial shows you how to serve and interact with an embedding model using an OpenAI-compatible endpoint. Specifically, we'll use MAX to serve the all-mpnet-base-v2 model, which is a powerful transformer that excels at capturing semantic relationships in text.
System requirements: Mac, Linux, or WSL.
Set up your environment
Create a Python project to install our APIs and CLI tools:
You can set up your project with any of these tools: pixi, uv, pip, or conda. Follow the steps for the one you prefer.
pixi

- If you don't have it, install pixi:

  curl -fsSL https://pixi.sh/install.sh | sh

  Then restart your terminal for the changes to take effect.

- Create a project:

  pixi init embeddings-tutorial \
    -c https://conda.modular.com/max-nightly/ -c conda-forge \
    && cd embeddings-tutorial

- Install the modular conda package. For the nightly build:

  pixi add modular

  For the stable release:

  pixi add "modular=25.4"

- Start the virtual environment:

  pixi shell
uv

- If you don't have it, install uv:

  curl -LsSf https://astral.sh/uv/install.sh | sh

  Then restart your terminal to make uv accessible.

- Create a project:

  uv init embeddings-tutorial && cd embeddings-tutorial

- Create and start a virtual environment:

  uv venv && source .venv/bin/activate
- Install the modular Python package. For the nightly build:

  uv pip install modular \
    --extra-index-url https://download.pytorch.org/whl/cpu \
    --index-url https://dl.modular.com/public/nightly/python/simple/ \
    --index-strategy unsafe-best-match --prerelease allow

  For the stable release:

  uv pip install modular \
    --extra-index-url https://download.pytorch.org/whl/cpu \
    --extra-index-url https://modular.gateway.scarf.sh/simple/ \
    --index-strategy unsafe-best-match
pip

- Create a project folder:

  mkdir embeddings-tutorial && cd embeddings-tutorial

- Create and activate a virtual environment:

  python3 -m venv .venv/embeddings-tutorial \
    && source .venv/embeddings-tutorial/bin/activate

- Install the modular Python package. For the nightly build:

  pip install --pre modular \
    --extra-index-url https://download.pytorch.org/whl/cpu \
    --index-url https://dl.modular.com/public/nightly/python/simple/

  For the stable release:

  pip install modular \
    --extra-index-url https://download.pytorch.org/whl/cpu \
    --extra-index-url https://modular.gateway.scarf.sh/simple/
conda

- If you don't have it, install conda. A common choice is with brew:

  brew install miniconda

- Initialize conda for shell interaction:

  conda init

  If you're on a Mac, instead use:

  conda init zsh

  Then restart your terminal for the changes to take effect.

- Create a project:

  conda create -n embeddings-tutorial

- Start the virtual environment:

  conda activate embeddings-tutorial

- Install the modular conda package. For the nightly build:

  conda install -c conda-forge -c https://conda.modular.com/max-nightly/ modular

  For the stable release:

  conda install -c conda-forge -c https://conda.modular.com/max/ modular
Serve your model
Use the max serve command to start a local model server with the all-mpnet-base-v2 model:

max serve \
  --model-path sentence-transformers/all-mpnet-base-v2
This creates a server running the all-mpnet-base-v2 embedding model on http://localhost:8000/v1/embeddings, an OpenAI-compatible endpoint.
The endpoint is ready when you see this message printed in your terminal:

Server ready on http://0.0.0.0:8000 (Press CTRL+C to quit)
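Before sending embedding requests, you can confirm the server is reachable by querying its model list. This is a minimal sketch using only the Python standard library, assuming the server also exposes the standard OpenAI-compatible /v1/models route alongside /v1/embeddings:

import json
import urllib.request

# Query the server's model list; a successful response confirms the
# endpoint is up and shows the exact model ID to use in requests.
with urllib.request.urlopen("http://localhost:8000/v1/models") as response:
    print(json.dumps(json.load(response), indent=2))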
For a complete list of max CLI commands and options, refer to the MAX CLI reference.
Interact with your model
MAX supports OpenAI's REST APIs, so you can interact with the model using either the OpenAI Python SDK or curl:
Python

You can use OpenAI's Python client to interact with the model.
To interact with MAX's OpenAI-compatible endpoints, install the OpenAI Python SDK:
Install it with the same tool you used during setup:

- pixi: pixi add openai
- uv: uv add openai
- pip: pip install openai
- conda: conda install openai
Then, create a client and make a request to the model:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Create embeddings
response = client.embeddings.create(
    model="sentence-transformers/all-mpnet-base-v2",
    input="Run an embedding model with MAX Serve!",
)

# Print the first five values of the embedding vector
print(f"{response.data[0].embedding[:5]}")
You should receive a response similar to this, shortened here for brevity:

{"data":[{"index":0,"embedding":[-0.06595132499933243,0.005941616836935282,0.021467769518494606,0.23037832975387573,

This is a numerical representation of the input text that can be used for semantic comparisons.
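To see those semantic comparisons in action, you can measure how close two texts are with cosine similarity. Here's a minimal sketch; the embed and cosine_similarity helpers and the example sentences are illustrative, not part of MAX's or OpenAI's API:

import math

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def embed(text: str) -> list[float]:
    """Fetch the embedding vector for one string from the local server."""
    response = client.embeddings.create(
        model="sentence-transformers/all-mpnet-base-v2",
        input=text,
    )
    return response.data[0].embedding

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; values near 1.0 mean similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query = embed("How do I serve an embedding model?")
related = embed("Instructions for running an embedding model server.")
unrelated = embed("My favorite soup recipe uses leeks.")

print(f"related:   {cosine_similarity(query, related):.3f}")
print(f"unrelated: {cosine_similarity(query, unrelated):.3f}")

The related pair should score noticeably higher than the unrelated one, which is the basic building block of semantic search.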
curl

The following curl command sends a request to the model's embeddings endpoint:
curl http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Run an embedding model with MAX Serve!",
    "model": "sentence-transformers/all-mpnet-base-v2"
  }'
You should receive the same kind of response as shown above: a numerical representation of the input text that can be used for semantic comparisons.
For complete details on all available API endpoints and options, see the MAX Serve API documentation.
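When you need vectors for many texts, the OpenAI embeddings API accepts a list of strings as input, and MAX's endpoint should accept the same shape since it mirrors that API. A short sketch under that assumption; the documents list is illustrative:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Embed several documents in one request by passing a list of strings.
documents = [
    "MAX serves models behind an OpenAI-compatible endpoint.",
    "Embeddings map text to vectors for semantic search.",
    "The all-mpnet-base-v2 model produces 768-dimensional vectors.",
]
response = client.embeddings.create(
    model="sentence-transformers/all-mpnet-base-v2",
    input=documents,
)

# Each result carries the index of the input it corresponds to.
for item in response.data:
    print(item.index, item.embedding[:3])

Batching this way saves one HTTP round trip per document, which matters when indexing a large corpus.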
Next steps
Now that you have successfully set up MAX with an OpenAI-compatible embeddings endpoint, check out our other tutorials to keep building with MAX.