Python module

context

Standardized context object for Pipeline Inference.

InputContext

class max.pipelines.context.InputContext(*args, **kwargs)

A base class for model contexts, representing model inputs for TokenGenerators.

active_idx

property active_idx*: int*

active_length

property active_length*: int*

The number of tokens being input during this iteration.

This will be the prompt size for context encoding, and simply 1 for token generation.

bump_token_indices()

bump_token_indices(start_idx: int | None = None, active_idx: int | None = None, end_idx: int | None = None) → None

Update the start_idx, active_idx and end_idx without manipulating the token array.
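
For example, a scheduler that has already materialized tokens elsewhere might adjust the bookkeeping directly. A minimal sketch, assuming `ctx` is an existing context and that the arguments are interpreted as new absolute index values (the signature does not say whether they are absolute positions or deltas):

```python
# Sketch: advance all three indices by one position without touching
# the token array. Assumes the arguments are absolute index values,
# not deltas -- verify against the implementation.
ctx.bump_token_indices(
    start_idx=ctx.start_idx + 1,
    active_idx=ctx.active_idx + 1,
    end_idx=ctx.end_idx + 1,
)
```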

cache_seq_id

property cache_seq_id*: int*

current_length

property current_length*: int*

The current length of the sequence, including completed and active tokens.

end_idx

property end_idx*: int*
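
Taken together, start_idx, active_idx, and end_idx bracket the token array. The relationships below are inferred from the property descriptions on this page, not a documented contract:

```python
# Assumed invariants, inferred from the descriptions on this page:
assert ctx.active_length == ctx.active_idx - ctx.start_idx  # tokens fed this iteration
assert ctx.current_length == ctx.end_idx                    # completed + active tokens
assert ctx.next_tokens.shape == (ctx.active_length,)        # see next_tokens below
```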

json_schema

property json_schema*: str | None*

A JSON schema to use during constrained decoding.

jump_ahead()

jump_ahead(new_token: int) → None

Updates the token array while ensuring the new token is returned to the user.
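
This is useful when constrained decoding makes the next token a foregone conclusion (for example, fixed punctuation required by a JSON schema), so the pipeline can append it without running a model step. A sketch with a hypothetical token id:

```python
# Sketch: append a token whose value is already certain (e.g. a
# mandatory '"' under a JSON schema). `quote_token_id` is hypothetical.
ctx.jump_ahead(quote_token_id)
```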

log_probabilities

property log_probabilities*: int*

When greater than 0, returns the log probabilities for the top N tokens for each token in the sequence.

log_probabilities_echo

property log_probabilities_echo*: bool*

When True, the input tokens are added to the returned logprobs.
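
Both values are fixed at construction time (see TextContext below) and read by the serving loop when assembling responses. A small sketch of how a hypothetical sampler might consult them:

```python
# Sketch: a hypothetical sampler consulting the logprob settings.
if ctx.log_probabilities > 0:
    top_n = ctx.log_probabilities             # how many top tokens to report
    echo_prompt = ctx.log_probabilities_echo  # include prompt tokens too?
```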

matcher

property matcher*: 'xgr.GrammarMatcher' | None*

An optional xgr Grammar Matcher provided when using structured output.

max_length

property max_length*: int | None*

The maximum length of this sequence.

next_tokens

property next_tokens*: ndarray*

The next prompt tokens to be input during this iteration.

This should be a 1D array of tokens of length active_length.
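
A sketch of the expected shape, assuming `ctx` is an existing context mid-generation:

```python
# Sketch: next_tokens is the model input for this iteration. On the
# first (context-encoding) step it holds the whole prompt; on later
# steps, just the newly generated token(s).
step_input = ctx.next_tokens
assert step_input.ndim == 1
assert len(step_input) == ctx.active_length
```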

outstanding_completion_tokens()

outstanding_completion_tokens() → list[tuple[int, Optional[max.pipelines.interfaces.response.LogProbabilities]]]

Return the list of outstanding completion tokens and log probabilities that must be returned to the user.
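
A typical pattern is to drain this list once per scheduling step when streaming results back. A sketch, with `emit` standing in for whatever response channel the server uses:

```python
# Sketch: flush tokens (and any logprobs) the caller has not yet seen.
# `emit` is a hypothetical streaming callback.
for token_id, logprobs in ctx.outstanding_completion_tokens():
    emit(token_id, logprobs)
```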

reset()

reset() → None

Resets the context's state by combining all tokens into a new prompt. This method is used when a request is evicted, meaning the context must be re-encoded in the following context-encoding (CE) iteration.
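
A sketch of the eviction path, assuming a cache manager decides which contexts to drop:

```python
# Sketch: when a request is evicted from the cache, fold everything
# generated so far back into the prompt; the next CE iteration then
# re-encodes the combined sequence from scratch.
ctx.reset()
```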

set_matcher()

set_matcher(matcher: xgr.GrammarMatcher) → None

Set a grammar matcher for use during constrained decoding.
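
A sketch of wiring up a matcher from the context's json_schema. The xgrammar calls follow that library's published flow (TokenizerInfo, GrammarCompiler, GrammarMatcher) but are outside this module, so verify them against your installed xgrammar version; `tokenizer` is an assumed Hugging Face tokenizer:

```python
import xgrammar as xgr

# Sketch: compile the request's JSON schema into a grammar matcher
# and attach it for constrained decoding. The xgrammar API shown here
# is an assumption based on that library's docs, not this module's.
tokenizer_info = xgr.TokenizerInfo.from_huggingface(tokenizer)
compiler = xgr.GrammarCompiler(tokenizer_info)
compiled = compiler.compile_json_schema(ctx.json_schema)
ctx.set_matcher(xgr.GrammarMatcher(compiled))
```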

start_idx

property start_idx*: int*

update()

update(new_token: int, log_probabilities: LogProbabilities | None = None, is_eos: bool = False) → None

Updates the next_tokens and extends existing tokens to include all generated tokens.
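
A sketch of a decode loop built on update, where `model_step` and `sample` are hypothetical stand-ins for the pipeline's forward pass and sampler:

```python
# Sketch: sample one token per iteration and record it on the context.
done = False
while not done:
    logits = model_step(ctx.next_tokens)  # hypothetical forward pass
    token_id, logprobs = sample(logits)   # hypothetical sampler
    is_eos = token_id == eos_token_id     # assumed EOS token id
    ctx.update(token_id, log_probabilities=logprobs, is_eos=is_eos)
    done = is_eos or (
        ctx.max_length is not None and ctx.current_length >= ctx.max_length
    )
```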

TextAndVisionContext

class max.pipelines.context.TextAndVisionContext(cache_seq_id: int, prompt: str | Sequence[int], max_length: int | None, tokens: ndarray, pixel_values: Sequence[ndarray], extra_model_args: dict[str, Any], log_probabilities: int = 0, log_probabilities_echo: bool = False, json_schema: str | None = None)

A base class for model contexts, specifically for vision model variants.
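
A construction sketch using the signature above. The prompt ids and image tensor are placeholders; in practice they come from the model's tokenizer and image processor:

```python
import numpy as np
from max.pipelines.context import TextAndVisionContext

# Sketch: placeholder inputs; real values come from the tokenizer
# and image preprocessor for the specific vision model.
prompt_ids = [1, 2, 3]
image = np.zeros((3, 336, 336), dtype=np.float32)  # assumed preprocessed image

vision_ctx = TextAndVisionContext(
    cache_seq_id=0,
    prompt=prompt_ids,
    max_length=4096,
    tokens=np.array(prompt_ids, dtype=np.int64),
    pixel_values=[image],
    extra_model_args={},  # model-specific extras, if any
)
```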

update()

update(new_token: int, log_probabilities: LogProbabilities | None = None, is_eos: bool = False) → None

Updates the next_tokens and extends existing tokens to include all generated tokens.

TextContext

class max.pipelines.context.TextContext(cache_seq_id: int, prompt: str | Sequence[int], max_length: int | None, tokens: ndarray, log_probabilities: int = 0, log_probabilities_echo: bool = False, json_schema: str | None = None)

A base class for model contexts, specifically for text model variants.
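
A construction sketch using the signature above, also showing the optional logprob and constrained-decoding fields. The token ids are placeholders:

```python
import numpy as np
from max.pipelines.context import TextContext

prompt_ids = [1, 2, 3]  # placeholder token ids from a tokenizer

ctx = TextContext(
    cache_seq_id=0,
    prompt=prompt_ids,
    max_length=2048,
    tokens=np.array(prompt_ids, dtype=np.int64),
    log_probabilities=5,          # report top-5 logprobs per token
    log_probabilities_echo=True,  # include prompt tokens in logprobs
    json_schema='{"type": "object"}',  # optional constrained decoding
)
```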

active_idx

property active_idx*: int*

active_length

property active_length*: int*

The number of tokens being input during this iteration.

This will be the prompt size for context encoding, and simply 1 (or more) for token generation.

bump_token_indices()

bump_token_indices(start_idx: int | None = None, active_idx: int | None = None, end_idx: int | None = None) → None

Update the start_idx, active_idx and end_idx without manipulating the token array.

current_length

property current_length*: int*

The current length of the sequence, including completed and active tokens.

end_idx

property end_idx*: int*

jump_ahead()

jump_ahead(new_token: int) → None

Updates the token array while ensuring the new token is returned to the user.

next_tokens

property next_tokens*: ndarray*

outstanding_completion_tokens()

outstanding_completion_tokens() → list[tuple[int, Optional[max.pipelines.interfaces.response.LogProbabilities]]]

Return the list of outstanding completion tokens and log probabilities that must be returned to the user.

reset()

reset() → None

Resets the context’s state by combining all tokens into a new prompt.

set_matcher()

set_matcher(matcher: xgr.GrammarMatcher) → None

start_idx

property start_idx*: int*

update()

update(new_token: int, log_probabilities: LogProbabilities | None = None, is_eos: bool = False) → None

Updates the next_tokens and extends existing tokens to include all generated tokens.