Python module
interfaces
Universal interfaces between all aspects of the MAX Inference Stack.
AudioGenerationResponse
class max.interfaces.AudioGenerationResponse(final_status, audio=None, buffer_speech_tokens=None)
Represents a response from the audio generation API.
-
Parameters:
-
- final_status (GenerationStatus) – The final status of the generation process.
- audio (ndarray | None) – The generated audio data, if available.
- buffer_speech_tokens (ndarray | None) – Buffered speech tokens, if available.
audio
audio: ndarray | None
The generated audio data, if available.
audio_data
property audio_data: ndarray
Returns the audio data if available.
-
Returns:
-
The generated audio data.
-
Return type:
-
np.ndarray
-
Raises:
-
AssertionError – If audio data is not available.
buffer_speech_tokens
buffer_speech_tokens: ndarray | None
Buffered speech tokens, if available.
final_status
final_status: GenerationStatus
has_audio_data
property has_audio_data: bool
Checks if audio data is present in the response.
-
Returns:
-
True if audio data is available, False otherwise.
-
Return type:
-
bool
is_done
property is_done: bool
Indicates whether the audio generation process is complete.
-
Returns:
-
True if generation is done, False otherwise.
-
Return type:
-
bool
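A brief sketch of how a response might be consumed (the waveform here is a placeholder, and numpy is assumed to be installed):

```python
import numpy as np

from max.interfaces import AudioGenerationResponse, GenerationStatus

# Hypothetical response carrying one second of silent 16 kHz audio.
response = AudioGenerationResponse(
    final_status=GenerationStatus.END_OF_SEQUENCE,
    audio=np.zeros(16000, dtype=np.float32),
)

if response.has_audio_data:
    # has_audio_data guards the AssertionError that audio_data raises
    # when no audio is present.
    waveform = response.audio_data
    print(waveform.shape, response.is_done)
```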
EmbeddingsResponse
class max.interfaces.EmbeddingsResponse(embeddings)
Response structure for embedding generation.
-
Parameters:
-
embeddings (ndarray) – The generated embeddings as a NumPy array.
embeddings
embeddings: ndarray
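A minimal usage sketch (the embedding values are placeholders):

```python
import numpy as np

from max.interfaces import EmbeddingsResponse

# Hypothetical 2 x 4 embedding matrix: two inputs, four dimensions each.
response = EmbeddingsResponse(embeddings=np.zeros((2, 4), dtype=np.float32))
print(response.embeddings.shape)  # (2, 4)
```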
EngineResult
class max.interfaces.EngineResult(status, result)
Structure representing the result of an engine operation.
-
Parameters:
-
- status (EngineStatus) – The status of the operation.
- result (T | None) – The result data of the operation.
active()
classmethod active(result)
Create an EngineResult representing an active operation.
-
Parameters:
-
result (T) – The result data of the operation.
-
Returns:
-
An EngineResult with ACTIVE status and the provided result.
-
Return type:
-
EngineResult
cancelled()
classmethod cancelled()
Create an EngineResult representing a cancelled operation.
-
Returns:
-
An EngineResult with CANCELLED status and no result.
-
Return type:
-
EngineResult
complete()
classmethod complete(result)
Create an EngineResult representing a completed operation.
-
Returns:
-
An EngineResult with COMPLETE status and the provided result.
-
Return type:
-
EngineResult
-
Parameters:
-
result (T)
result
result: T | None
status
status: EngineStatus
stop_stream
property stop_stream: bool
Determines whether the stream should stop based on the current status.
-
Returns:
-
True if the stream should stop, False otherwise.
-
Return type:
-
bool
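A sketch of a consumer loop built on the classmethod constructors above (the string payloads stand in for real result types):

```python
from max.interfaces import EngineResult

# Hypothetical sequence of per-step engine results.
steps = [
    EngineResult.active("partial output"),
    EngineResult.complete("final output"),
]

for step in steps:
    if step.result is not None:
        print(step.status, step.result)
    if step.stop_stream:
        # Presumably True once the status is no longer ACTIVE.
        break
```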
EngineStatus
class max.interfaces.EngineStatus(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Represents the status of an engine operation.
- Status values:
- ACTIVE: Indicates that the engine executed the operation successfully and the request remains active.
- CANCELLED: Indicates that the request was cancelled before completion; no further data will be provided.
- COMPLETE: Indicates that the engine executed the operation successfully and the request is completed.
ACTIVE
ACTIVE = 'active'
Indicates that the engine executed the operation successfully and the request remains active.
CANCELLED
CANCELLED = 'cancelled'
Indicates that the request was cancelled before completion; no further data will be provided.
COMPLETE
COMPLETE = 'complete'
Indicates that the request was previously finished and no further data should be streamed.
GenerationStatus
class max.interfaces.GenerationStatus(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Enum representing the status of a generation process in the MAX API.
ACTIVE
ACTIVE = 'active'
The generation process is ongoing.
END_OF_SEQUENCE
END_OF_SEQUENCE = 'end_of_sequence'
The generation process has reached the end of the sequence.
MAXIMUM_LENGTH
MAXIMUM_LENGTH = 'maximum_length'
The generation process has reached the maximum allowed length.
is_done
property is_done: bool
Returns True if the generation process is complete (not ACTIVE).
-
Returns:
-
True if the status is not ACTIVE, indicating completion.
-
Return type:
-
bool
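For example, is_done is False only while the status is ACTIVE:

```python
from max.interfaces import GenerationStatus

for status in GenerationStatus:
    print(status.value, status.is_done)
# Every status except ACTIVE reports is_done == True.
```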
LogProbabilities
class max.interfaces.LogProbabilities(token_log_probabilities=<factory>, top_log_probabilities=<factory>)
Log probabilities for an individual output token.
This is a data-only class that serves as a serializable data structure for transferring log probability information. It does not provide any functionality for calculating or manipulating log probabilities - it is purely for data storage and serialization purposes.
-
Parameters:
-
- token_log_probabilities – Log probabilities for each output token.
- top_log_probabilities – Top candidate tokens and their log probabilities for each output position.
token_log_probabilities
top_log_probabilities
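A construction sketch; the element types shown (one float per token, and a mapping from candidate token id to log probability at each position) are assumptions, since the reference above does not spell them out:

```python
from max.interfaces import LogProbabilities

# Assumed element types; consult the MAX source for the exact annotations.
logprobs = LogProbabilities(
    token_log_probabilities=[-0.11, -2.30],
    top_log_probabilities=[{17: -0.11, 42: -3.02}, {99: -2.30, 7: -2.88}],
)
print(len(logprobs.token_log_probabilities))  # 2
```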
PipelineTask
class max.interfaces.PipelineTask(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Enum representing the types of pipeline tasks supported.
AUDIO_GENERATION
AUDIO_GENERATION = 'audio_generation'
Task for generating audio.
EMBEDDINGS_GENERATION
EMBEDDINGS_GENERATION = 'embeddings_generation'
Task for generating embeddings.
SPEECH_TOKEN_GENERATION
SPEECH_TOKEN_GENERATION = 'speech_token_generation'
Task for generating speech tokens.
TEXT_GENERATION
TEXT_GENERATION = 'text_generation'
Task for generating text.
output_type
property output_type: type
Get the output type for the pipeline task.
-
Returns:
-
The output type for the pipeline task.
-
Return type:
-
type
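For example, each task exposes the response class paired with it via output_type (presumably TextGenerationResponse for text generation, though the exact mapping is defined by the implementation):

```python
from max.interfaces import PipelineTask

task = PipelineTask.TEXT_GENERATION
print(task.value)        # 'text_generation'
print(task.output_type)  # The response class paired with this task.
```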
SamplingParams
class max.interfaces.SamplingParams(top_k=1, top_p=1, min_p=0.0, temperature=1, frequency_penalty=0.0, presence_penalty=0.0, repetition_penalty=1.0, max_new_tokens=None, min_new_tokens=0, ignore_eos=False, stop=None, stop_token_ids=None, detokenize=True, seed=0)
Request-specific sampling parameters that are only known at run time.
detokenize
detokenize: bool = True
Whether to detokenize the output tokens into text.
frequency_penalty
frequency_penalty: float = 0.0
The frequency penalty to apply to the model’s output. A positive value will penalize new tokens based on their frequency in the generated text: tokens will receive a penalty proportional to the count of appearances.
ignore_eos
ignore_eos: bool = False
If True, the response will ignore the EOS token, and continue to generate until the max tokens or a stop string is hit.
max_new_tokens
max_new_tokens: int | None = None
The maximum number of new tokens to generate in the response. If not set, the model may generate tokens until it reaches its internal limits or based on other stopping criteria.
min_new_tokens
min_new_tokens: int = 0
The minimum number of tokens to generate in the response.
min_p
min_p: float = 0.0
Float that represents the minimum probability for a token to be considered, relative to the probability of the most likely token. Must be in [0, 1]. Set to 0 to disable this.
presence_penalty
presence_penalty: float = 0.0
The presence penalty to apply to the model’s output. A positive value will penalize new tokens that have already appeared in the generated text at least once by applying a constant penalty.
repetition_penalty
repetition_penalty: float = 1.0
The repetition penalty to apply to the model’s output. Values > 1 will penalize new tokens that have already appeared in the generated text at least once by dividing the logits by the repetition penalty.
seed
seed: int = 0
The seed to use for the random number generator.
stop
stop: list[str] | None = None
A list of detokenized sequences that can be used as stop criteria when generating a new sequence.
stop_token_ids
stop_token_ids: list[int] | None = None
A list of token ids that are used as stopping criteria when generating a new sequence.
temperature
temperature: float = 1
Controls the randomness of the model’s output; higher values produce more diverse responses.
top_k
top_k: int = 1
Limits the sampling to the K most probable tokens. This defaults to 1, which enables greedy sampling.
top_p
top_p: float = 1
Only use the tokens whose cumulative probability is within the top_p threshold. This applies to the top_k tokens.
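For example, nucleus sampling with a top-k cutoff and a hard length cap can be configured like this (all fields below are documented above):

```python
from max.interfaces import SamplingParams

params = SamplingParams(
    top_k=40,            # Sample from the 40 most probable tokens.
    top_p=0.95,          # ...restricted to the top 95% cumulative probability.
    temperature=0.7,     # Slightly less random than the default.
    max_new_tokens=256,  # Hard cap on generated tokens.
    stop=["\n\n"],       # Stop at the first blank line.
    seed=42,
)
```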
TextGenerationResponse
class max.interfaces.TextGenerationResponse(tokens, final_status)
Response structure for text generation.
-
Parameters:
-
- tokens (list[TextResponse]) – List of generated text responses.
- final_status (GenerationStatus) – The final status of the generation process.
append_token()
append_token(token)
Appends a generated token response to the list of tokens.
-
Parameters:
-
token (TextResponse)
-
Return type:
-
None
final_status
final_status: GenerationStatus
is_done
property is_done: bool
Indicates whether the text generation process is complete.
tokens
tokens: list[TextResponse]
update_status()
update_status(status)
Updates the final status of the generation process.
-
Parameters:
-
status (GenerationStatus)
-
Return type:
-
None
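A sketch of building up a response token by token (the token text is a placeholder):

```python
from max.interfaces import (
    GenerationStatus,
    TextGenerationResponse,
    TextResponse,
)

response = TextGenerationResponse(tokens=[], final_status=GenerationStatus.ACTIVE)
response.append_token(TextResponse(next_token="Hello"))
response.update_status(GenerationStatus.END_OF_SEQUENCE)
print(response.is_done)  # True once the status is no longer ACTIVE.
```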
TextResponse
class max.interfaces.TextResponse(next_token, log_probabilities=None)
A base class for model responses, specifically for text model variants.
-
Parameters:
-
- next_token (int | str) – Encoded predicted next token.
- log_probabilities (LogProbabilities | None) – Log probabilities of each output token.
log_probabilities
log_probabilities: LogProbabilities | None
next_token
next_token: int | str
Encoded predicted next token.
TokenGenerator
class max.interfaces.TokenGenerator(*args, **kwargs)
Interface for LLM token-generator models.
next_token()
next_token(batch, num_steps)
Computes the next token response for a single batch.
release()
release(context)
Releases resources associated with this context.
-
Parameters:
-
context (T) – Finished context.
-
Return type:
-
None
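A minimal sketch of a conforming implementation; the batch structure (a dict keyed by request id) and the return shape are assumptions for illustration, not the confirmed signature:

```python
from max.interfaces import TextResponse, TokenGenerator

class EchoGenerator(TokenGenerator):
    """Toy generator that emits a fixed token for every request.

    The dict-based batch and return types here are assumptions; consult
    the MAX source for the exact annotations.
    """

    def next_token(self, batch, num_steps):
        # One response per request in the batch, ignoring num_steps.
        return {request_id: TextResponse(next_token="ok") for request_id in batch}

    def release(self, context):
        # Nothing to free in this toy implementation.
        pass
```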