Python module

engine

The APIs in this module allow you to run inference with MAX Engine—a graph compiler and runtime that accelerates your AI models on a wide variety of hardware.

`InferenceSession`

class max.engine.InferenceSession(num_threads=None, devices=None, *, custom_extensions=None)

Manages an inference session in which you can load and run models.

You need an instance of this class to load a model as a Model object. For example:

from pathlib import Path

from max import engine

session = engine.InferenceSession()
model_path = Path('bert-base-uncased')
model = session.load(model_path)

Construct an inference session.

Parameters:

num_threads (int | None) – Number of threads to use for the inference session. This defaults to the number of physical cores on your machine.
devices (Iterable[Device] | None) – A list of devices on which to run inference. Default is the host CPU only.
custom_extensions (CustomExtensionsType | None) – The extensions to load for the model. Supports paths to a .mojopkg custom ops library or a .mojo source file.

`devices`

property devices: list[Device]

A list of available devices.

`gpu_profiling()`

gpu_profiling(mode)

Enables end-to-end GPU profiling configuration.

Parameters: mode (GPUProfilingMode)
Return type: None

`load()`

load(model, *, custom_extensions=None, custom_ops_path=None, weights_registry=None)

Loads a trained model and compiles it for inference.

Parameters:

model (Union[str, Path, Any]) – Path to a model.
custom_extensions (CustomExtensionsType | None) – The extensions to load for the model. Supports paths to .mojopkg custom ops.
custom_ops_path (str | None) – The path to your custom ops Mojo package. Deprecated, use custom_extensions instead.
weights_registry (Mapping[str, DLPackArray] | None) – A mapping from model weight names to their values. The values are currently expected to be DLPack arrays. If an array is a read-only NumPy array, you must ensure that its lifetime extends beyond the lifetime of the model.

Returns:

The loaded model, compiled and ready to execute.

Raises:

RuntimeError – If the path provided is invalid.

Return type:

Model

`set_mojo_assert_level()`

set_mojo_assert_level(level)

Sets which Mojo assertions are kept in the compiled model.

Parameters: level (str | AssertLevel)
Return type: None

`set_mojo_log_level()`

set_mojo_log_level(level)

Sets the verbosity of Mojo logging in the compiled model.

Parameters: level (str | LogLevel)
Return type: None

`set_split_k_reduction_precision()`

set_split_k_reduction_precision(precision)

Sets the accumulation precision for split-K reductions in large matmuls.

Parameters: precision (str | SplitKReductionPrecision)
Return type: None

`Model`

class max.engine.Model

A loaded model that you can execute.

Do not instantiate this class directly. Instead, create it with InferenceSession.load().

`__call__()`

__call__(*args, **kwargs)

Call self as a function.

Parameters:

self (Model)
args (DLPackArray | Tensor | MojoValue | int | float | bool | generic)
kwargs (DLPackArray | Tensor | MojoValue | int | float | bool | generic)

Return type:

list[Tensor | MojoValue]

`execute()`

execute(*args)

Parameters:

self (Model)
args (DLPackArray | Tensor | MojoValue | int | float | bool | generic)

Return type:

list[Tensor | MojoValue]

`input_metadata`

property input_metadata

Metadata about the model’s input tensors, as a list of TensorSpec objects.

For example, you can print the input tensor names, shapes, and dtypes:

for tensor in model.input_metadata:
    print(f'name: {tensor.name}, shape: {tensor.shape}, dtype: {tensor.dtype}')

`output_metadata`

property output_metadata

Metadata about the model’s output tensors, as a list of TensorSpec objects.

For example, you can print the output tensor names, shapes, and dtypes:

for tensor in model.output_metadata:
    print(f'name: {tensor.name}, shape: {tensor.shape}, dtype: {tensor.dtype}')

`GPUProfilingMode`

class max.engine.GPUProfilingMode(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

The supported modes for GPU profiling.

`DETAILED`

DETAILED = 'detailed'

`OFF`

OFF = 'off'

`ON`

ON = 'on'
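The constructor signature shown for this class is that of a standard Python Enum, so a member can presumably be looked up from its string value. The sketch below illustrates that lookup with a stand-in enum that mirrors the three members listed here. `GPUProfilingModeSketch` and `parse_mode` are hypothetical names, not part of MAX.

```python
from enum import Enum


class GPUProfilingModeSketch(Enum):
    """Stand-in mirroring the documented members; not the real class."""

    DETAILED = "detailed"
    OFF = "off"
    ON = "on"


def parse_mode(text: str) -> GPUProfilingModeSketch:
    """Hypothetical helper: map user text such as ' ON ' to an enum member."""
    try:
        # Enum lookup by value: GPUProfilingModeSketch("on") returns the ON member.
        return GPUProfilingModeSketch(text.strip().lower())
    except ValueError:
        valid = ", ".join(member.value for member in GPUProfilingModeSketch)
        raise ValueError(
            f"unknown profiling mode {text!r}; expected one of: {valid}"
        ) from None


mode = parse_mode(" Detailed ")
print(mode)
```

A helper like this is useful when the mode comes from a command-line flag or an environment variable, since gpu_profiling() is documented to take a GPUProfilingMode rather than a string.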

`LogLevel`

class max.engine.LogLevel(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Internal use.

`CRITICAL`

CRITICAL = 'critical'

`DEBUG`

DEBUG = 'debug'

`ERROR`

ERROR = 'error'

`INFO`

INFO = 'info'

`NOTSET`

NOTSET = 'notset'

`WARNING`

WARNING = 'warning'

`MojoValue`

class max.engine.MojoValue

This is a work in progress; ignore it for now.

`TensorSpec`

class max.engine.TensorSpec

Defines the properties of a tensor, including its name, shape and data type.

For usage examples, see Model.input_metadata.

`dtype`

property dtype

A tensor data type.

`name`

property name

A tensor name.

`shape`

property shape

The shape of the tensor as a list of integers.

If a dimension size is unknown/dynamic (such as the batch size), its value is None.
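Because dynamic dimensions appear as None, code that consumes a shape needs to handle that value explicitly. The sketch below shows two ways to do so on a plain list. `format_shape` and `fill_dynamic` are hypothetical helpers, not part of MAX, and they operate on any sequence shaped like the value described above.

```python
from typing import List, Optional, Sequence


def format_shape(shape: Sequence[Optional[int]]) -> str:
    """Render a shape for display, showing dynamic (None) dimensions as '?'."""
    return "[" + ", ".join("?" if dim is None else str(dim) for dim in shape) + "]"


def fill_dynamic(shape: Sequence[Optional[int]], size: int) -> List[int]:
    """Replace every dynamic (None) dimension with a concrete size."""
    return [size if dim is None else dim for dim in shape]


# A shape with a dynamic batch dimension followed by a fixed length of 128.
example_shape = [None, 128]
print(format_shape(example_shape))
print(fill_dynamic(example_shape, 4))
```

Filling every dynamic dimension with the same size is a simplification; a model with several independent dynamic dimensions would need one value per dimension.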

`CustomExtensionsType`

max.engine.CustomExtensionsType

alias of list[str | Path | Any] | str | Path | Any
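The alias allows either a single value or a list of values. The sketch below shows one way calling code might normalize both forms into a list before passing them on. `as_extension_list` is a hypothetical helper, and how MAX itself handles these forms internally is not described here.

```python
from pathlib import Path
from typing import Any, List


def as_extension_list(extensions: Any) -> List[Any]:
    """Hypothetical helper: normalize a CustomExtensionsType-shaped value.

    Accepts None, a single item, or a list of items. Strings are converted
    to Path objects; any other item is passed through unchanged.
    """
    if extensions is None:
        return []
    items = extensions if isinstance(extensions, list) else [extensions]
    return [Path(item) if isinstance(item, str) else item for item in items]


single = as_extension_list("ops.mojopkg")
several = as_extension_list([Path("a.mojopkg"), "b.mojo"])
print(single, several)
```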
