
Python module


Implementations of provided tokenizers.


class max.pipelines.tokenizer.IdentityPipelineTokenizer(*args, **kwargs)


async decode(context: TokenGeneratorContext, encoded: str, **kwargs) → str

Decodes response tokens to text.

  • Parameters:

    • context (TokenGeneratorContext) – Current generation context.
    • encoded (TokenizerEncoded) – Encoded response tokens.
  • Returns:

    Un-encoded response text.

  • Return type:

    str

async encode(prompt: str, add_special_tokens: bool = False) → str

Encodes text prompts as tokens.

  • Parameters:

    prompt (str) – Un-encoded prompt text.

  • Raises:

    ValueError – If the prompt exceeds the configured maximum length.
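
The str-to-str signatures above suggest that this tokenizer passes text through unchanged. The following is a minimal stand-in sketch under that assumption, not the MAX implementation. The class name EchoTokenizer, its max_length argument, and the choice to measure length in characters are all hypothetical and used only for illustration.

```python
import asyncio
from typing import Any, Optional


class EchoTokenizer:
    """Hypothetical stand-in that mirrors the encode/decode signatures."""

    def __init__(self, max_length: Optional[int] = None) -> None:
        # Assumption: the limit is counted in characters for this sketch.
        self.max_length = max_length

    async def encode(self, prompt: str, add_special_tokens: bool = False) -> str:
        # Reject prompts longer than the configured maximum length.
        if self.max_length is not None and len(prompt) > self.max_length:
            raise ValueError("prompt exceeds the configured maximum length")
        # Identity behaviour: the "tokens" are the prompt text itself.
        return prompt

    async def decode(self, context: Any, encoded: str, **kwargs: Any) -> str:
        # Identity behaviour: the encoded value is already text.
        return encoded


async def round_trip(text: str, max_length: Optional[int] = 32) -> str:
    tokenizer = EchoTokenizer(max_length=max_length)
    encoded = await tokenizer.encode(text)
    return await tokenizer.decode(None, encoded)
```

Calling asyncio.run(round_trip("hello")) returns the input text unchanged, while a prompt longer than max_length raises ValueError.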


property eos: int

The end of sequence token for this tokenizer.


property expects_content_wrapping: bool

If true, this tokenizer expects messages to have a content property.

Text messages are formatted as:

{ "type": "text", "content": "text content" }

instead of the OpenAI spec:

{ "type": "text", "text": "text content" }

NOTE: Multimodal messages omit the content property. Both image_urls and image content parts are converted to:

{ "type": "image" }

Their content is provided as byte arrays through the top-level property on the request object, i.e., TokenGeneratorRequest.images.
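
The conversion described above can be sketched as a small helper. This is an illustration under stated assumptions, not code from max.pipelines: the function name wrap_content_parts is hypothetical, and the part type names "image" and "image_url" are assumed from the OpenAI-style message format.

```python
from typing import Any


def wrap_content_parts(parts: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Convert OpenAI-style content parts to the content-wrapped form."""
    wrapped: list[dict[str, Any]] = []
    for part in parts:
        kind = part.get("type")
        if kind == "text":
            # OpenAI spec uses "text"; the wrapped form uses "content".
            wrapped.append({"type": "text", "content": part["text"]})
        elif kind in ("image", "image_url"):
            # Image parts carry no payload here; the bytes travel
            # separately on the request object.
            wrapped.append({"type": "image"})
        else:
            # Unknown part types are copied through unchanged.
            wrapped.append(dict(part))
    return wrapped


example = wrap_content_parts(
    [
        {"type": "text", "text": "Describe this picture."},
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
    ]
)
```

In this sketch the image part loses its URL on purpose, matching the note that image data is supplied through TokenGeneratorRequest.images rather than inside the message.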


class max.pipelines.tokenizer.PreTrainedPipelineTokenizer(delegate: PreTrainedTokenizer | PreTrainedTokenizerFast)


apply_chat_template(messages: list[max.pipelines.interfaces.text_generation.TokenGeneratorRequestMessage]) → str


async decode(context: TokenGeneratorContext, encoded: ndarray, **kwargs) → str

Decodes response tokens to text.

  • Parameters:

    • context (TokenGeneratorContext) – Current generation context.
    • encoded (TokenizerEncoded) – Encoded response tokens.
  • Returns:

    Un-encoded response text.

  • Return type:

    str

async encode(prompt: str, add_special_tokens: bool = False) → ndarray

Encodes text prompts as tokens.

  • Parameters:

    prompt (str) – Un-encoded prompt text.

  • Raises:

    ValueError – If the prompt exceeds the configured maximum length.


property eos: int

The end of sequence token for this tokenizer.


property expects_content_wrapping: bool

If true, this tokenizer expects messages to have a content property.

Text messages are formatted as:

{ "type": "text", "content": "text content" }

instead of the OpenAI spec:

{ "type": "text", "text": "text content" }

NOTE: Multimodal messages omit the content property. Both image_urls and image content parts are converted to:

{ "type": "image" }

Their content is provided as byte arrays through the top-level property on the request object, i.e., TokenGeneratorRequest.images.


class max.pipelines.tokenizer.TextAndVisionTokenizer(model_path: str, *, revision: str | None = None, max_length: int | None = None, max_new_tokens: int | None = None, trust_remote_code: bool = False)

Encapsulates creation of TextAndVisionContext and specific token encode/decode logic.


apply_chat_template(messages: list[max.pipelines.interfaces.text_generation.TokenGeneratorRequestMessage]) → str


async decode(context: TextAndVisionContext, encoded: ndarray, **kwargs) → str

Transforms the provided encoded token array back into readable text.


async encode(prompt: str | Sequence[int], add_special_tokens: bool = True) → ndarray

Transforms the provided prompt into a token array.


property eos: int

The end of sequence token for this tokenizer.


property expects_content_wrapping: bool

If true, this tokenizer expects messages to have a content property.

Text messages are formatted as:

{ "type": "text", "content": "text content" }

instead of the OpenAI spec:

{ "type": "text", "text": "text content" }

NOTE: Multimodal messages omit the content property. Both image_urls and image content parts are converted to:

{ "type": "image" }

Their content is provided as byte arrays through the top-level property on the request object, i.e., TokenGeneratorRequest.images.


async new_context(request: TokenGeneratorRequest) → TextAndVisionContext

Creates a new TextAndVisionContext object, using information such as cache_seq_id and prompt from the TokenGeneratorRequest.


class max.pipelines.tokenizer.TextTokenizer(model_path: str, *, revision: str | None = None, max_length: int | None = None, max_new_tokens: int | None = None, trust_remote_code: bool = False, enable_llama_whitespace_fix: bool = False)

Encapsulates creation of TextContext and specific token encode/decode logic.


apply_chat_template(messages: list[max.pipelines.interfaces.text_generation.TokenGeneratorRequestMessage], tools: list[max.pipelines.interfaces.text_generation.TokenGeneratorRequestTool] | None) → str


async decode(context: TextContext, encoded: ndarray, **kwargs) → str

Transforms the provided encoded token array back into readable text.


async encode(prompt: str | Sequence[int], add_special_tokens: bool = True) → ndarray

Transforms the provided prompt into a token array.


property eos: int

The end of sequence token for this tokenizer.


property expects_content_wrapping: bool

If true, this tokenizer expects messages to have a content property.

Text messages are formatted as:

{ "type": "text", "content": "text content" }

instead of the OpenAI spec:

{ "type": "text", "text": "text content" }

NOTE: Multimodal messages omit the content property. Both image_urls and image content parts are converted to:

{ "type": "image" }

Their content is provided as byte arrays through the top-level property on the request object, i.e., TokenGeneratorRequest.images.


async new_context(request: TokenGeneratorRequest) → TextContext

Creates a new TextContext object, using information such as cache_seq_id and prompt from the TokenGeneratorRequest.


max.pipelines.tokenizer.max_tokens_to_generate(prompt_size: int, max_length: int | None, max_new_tokens: int | None = None) → int | None

Returns the maximum number of new tokens to generate.
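
The page does not spell out how the three arguments combine. One plausible reading, sketched below as an assumption rather than the library's actual logic, is that the result is the room left under max_length after the prompt, capped by max_new_tokens when it is given. The function name sketch_max_tokens_to_generate is hypothetical.

```python
from typing import Optional


def sketch_max_tokens_to_generate(
    prompt_size: int,
    max_length: Optional[int],
    max_new_tokens: Optional[int] = None,
) -> Optional[int]:
    """Assumed rule: remaining room under max_length, capped by max_new_tokens."""
    if max_length is None:
        # No overall limit: only the explicit cap (possibly None) applies.
        return max_new_tokens
    # Room left in the sequence after the prompt, never negative.
    remaining = max(max_length - prompt_size, 0)
    if max_new_tokens is None:
        return remaining
    return min(max_new_tokens, remaining)
```

Under this reading, a 10-token prompt with max_length=100 and no max_new_tokens leaves room for 90 new tokens, and a return value of None means generation is not bounded by either setting.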


async max.pipelines.tokenizer.run_with_default_executor(fn, *args)
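
No description is given for this function. Its name and signature suggest that it runs a blocking callable on the event loop's default executor so that tokenization work does not stall other coroutines. The sketch below shows that general pattern using only the standard asyncio API; the helper name run_in_default_executor and the slow_add example are hypothetical, and the real function may behave differently.

```python
import asyncio
import time


async def run_in_default_executor(fn, *args):
    """Run a blocking callable on the loop's default executor (assumed pattern)."""
    loop = asyncio.get_running_loop()
    # Passing None selects the event loop's default executor.
    return await loop.run_in_executor(None, fn, *args)


def slow_add(a: int, b: int) -> int:
    # Stand-in for blocking work such as a synchronous tokenizer call.
    time.sleep(0.01)
    return a + b


result = asyncio.run(run_in_default_executor(slow_add, 2, 3))
```

Offloading this way keeps the event loop responsive while the blocking call runs on a worker thread.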