Python module

driver

Exposes APIs for interacting with hardware, such as allocating tensors on a GPU and moving tensors between the CPU and GPU. It provides interfaces for memory management, device properties, and hardware monitoring. Through these APIs, you can control data placement, track resource utilization, and configure device settings for optimal performance.

For example, the following code uses an accelerator if one is available and falls back to the CPU otherwise:

from max import driver

device = driver.CPU() if driver.accelerator_count() == 0 else driver.Accelerator()
print(f"Using {device} device")

`Accelerator`

class max.driver.Accelerator(self, id: int = -1, device_memory_limit: int = -1)

Creates an accelerator device with the specified ID and memory limit.

Provides access to GPU or other hardware accelerators in the system.

Repeated instantiations with a previously-used device-id will still refer to the first such instance that was created. This is especially important when providing a different memory limit: only the value (implicitly or explicitly) provided in the first such instantiation is effective.

from max import driver
MB = 1024 * 1024  # bytes per mebibyte, used for the memory limits below
# Create default accelerator (usually first available GPU)
device = driver.Accelerator()
# Or specify GPU id
device = driver.Accelerator(id=0)  # First GPU
device = driver.Accelerator(id=1)  # Second GPU
# Get device id
device_id = device.id
# Optionally specify memory limit (only effective if device 0 has not
# already been instantiated, as explained above)
device = driver.Accelerator(id=0, device_memory_limit=256*MB)
device2 = driver.Accelerator(id=0, device_memory_limit=512*MB)
# ... device2 will use the memory limit of 256*MB

Args:
id (int, optional): The device ID to use. Defaults to -1, which selects
the first available accelerator.
device_memory_limit (int, optional): The maximum amount of memory
in bytes that can be allocated on the device. Defaults to 99% of free memory.

Returns:
Accelerator: A new Accelerator device object.
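The first-instantiation rule described above can be modelled in a few lines of plain Python. This is only a toy model: `toy_accelerator` and `_registry` are hypothetical names, and nothing here reflects how MAX actually stores devices.

```python
MB = 1024 * 1024
_registry = {}  # device id -> settings recorded by the first instantiation


def toy_accelerator(id=0, device_memory_limit=-1):
    """Hypothetical stand-in that mimics only the caching rule."""
    # setdefault stores the settings on first use and ignores later ones.
    return _registry.setdefault(
        id, {"id": id, "device_memory_limit": device_memory_limit}
    )


first = toy_accelerator(id=0, device_memory_limit=256 * MB)
second = toy_accelerator(id=0, device_memory_limit=512 * MB)
print(second is first, second["device_memory_limit"] // MB)  # True 256
```

The practical consequence is that a memory limit should be chosen once, at the first point in the program where the device is created.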

`CPU`

class max.driver.CPU(self, id: int = -1)

Creates a CPU device.

from max import driver
device = driver.CPU()
# Device id is always 0 for CPU devices
device_id = device.id

Parameters:: id (int, optional) – The device ID to use. Defaults to -1.
Returns:: A new CPU device object.
Return type:: CPU

`DLPackArray`

class max.driver.DLPackArray(*args, **kwargs)

`Device`

class max.driver.Device

`api`

property api

Returns the API used to program the device.

Possible values are:

cpu for host devices.
cuda for NVIDIA GPUs.
hip for AMD GPUs.

from max import driver

device = driver.CPU()
device.api
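As a small illustration, the values listed above can be mapped to a vendor name. The helper `describe_api` and the stand-in object are hypothetical; only the three `api` strings come from this page.

```python
from types import SimpleNamespace

# Mapping taken from the "Possible values" list above.
API_VENDOR = {"cpu": "host", "cuda": "NVIDIA", "hip": "AMD"}


def describe_api(device):
    """Return a readable vendor name for a device's `api` value."""
    return API_VENDOR.get(device.api, f"unknown ({device.api})")


# Stand-in object so the sketch runs without hardware; a real
# max.driver device exposes the same `api` attribute.
fake_gpu = SimpleNamespace(api="cuda")
print(describe_api(fake_gpu))
```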

`architecture_name`

property architecture_name

Returns the architecture name of the device.

Examples of possible values:

gfx90a, gfx942 for AMD GPUs.
sm_80, sm_86 for NVIDIA GPUs.
CPU devices raise an exception.

from max import driver

device = driver.Accelerator()
device.architecture_name
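Because CPU devices raise an exception for this property, code that handles both device kinds may want to check `is_host` first. A minimal sketch, where `architecture_or_none` is a hypothetical helper and the stand-in objects replace real devices:

```python
from types import SimpleNamespace


def architecture_or_none(device):
    """Return the architecture name, or None for host (CPU) devices."""
    # Check is_host first so the property that raises on CPUs is never read.
    if device.is_host:
        return None
    return device.architecture_name


# Stand-ins for real devices; the CPU one deliberately has no
# architecture_name attribute at all.
fake_cpu = SimpleNamespace(is_host=True)
fake_gpu = SimpleNamespace(is_host=False, architecture_name="sm_80")
print(architecture_or_none(fake_cpu), architecture_or_none(fake_gpu))
```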

`can_access()`

can_access(self, other: max.driver.Device) → bool

Checks if this device can directly access memory of another device.

from max import driver

gpu0 = driver.Accelerator(id=0)
gpu1 = driver.Accelerator(id=1)

if gpu0.can_access(gpu1):
    print("GPU0 can directly access GPU1 memory.")

Parameters:: other (Device) – The other device to check peer access against.
Returns:: True if peer access is possible, False otherwise.
Return type:: bool
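With more than two devices it can help to tabulate peer access for every pair. The sketch below relies only on `can_access()`; `peer_access_matrix` and `FakeDevice` are hypothetical names, and the fake devices exist only so the example runs without GPUs.

```python
def peer_access_matrix(devices):
    """matrix[i][j] is True when devices[i] can directly access devices[j]."""
    return [[a.can_access(b) for b in devices] for a in devices]


class FakeDevice:
    """Stand-in for a device; only `can_access` is modelled."""

    def __init__(self, id, peers):
        self.id = id
        self._peers = set(peers)

    def can_access(self, other):
        return other.id in self._peers


devices = [FakeDevice(0, {1}), FakeDevice(1, {0}), FakeDevice(2, set())]
matrix = peer_access_matrix(devices)
for row in matrix:
    print(row)
```

With real hardware, the list could be built as `[driver.Accelerator(id=i) for i in range(driver.accelerator_count())]`. What a device reports for access to itself is not stated on this page, so the diagonal should not be relied on.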

`cpu`

cpu = <nanobind.nb_func object>

`default_stream`

property default_stream

Returns the default stream for this device.

The default stream is initialized when the device object is created.

Returns:: The default execution stream for this device.
Return type:: DeviceStream

`id`

property id

Returns a zero-based device id. For a CPU device this is always 0. For GPU accelerators this is the id of the device relative to this host. Along with the label, an id can uniquely identify a device, e.g. gpu:0, gpu:1.

from max import driver

device = driver.Accelerator()
device_id = device.id

Returns:: The device ID.
Return type:: int
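The `label:id` form mentioned above is easy to build from the two properties. `device_key` is a hypothetical helper, and the stand-in objects replace real devices:

```python
from types import SimpleNamespace


def device_key(device):
    """Combine label and id into a string such as 'gpu:0'."""
    return f"{device.label}:{device.id}"


# Stand-ins; real devices expose the same `label` and `id` properties.
fake_devices = [
    SimpleNamespace(label="cpu", id=0),
    SimpleNamespace(label="gpu", id=0),
    SimpleNamespace(label="gpu", id=1),
]
keys = [device_key(d) for d in fake_devices]
print(keys)
```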

`is_compatible`

property is_compatible

Returns whether this device is compatible with MAX.

Returns:: True if the device is compatible with MAX, False otherwise.
Return type:: bool

`is_host`

property is_host

Whether this device is the CPU (host) device.

from max import driver

device = driver.CPU()
device.is_host

`label`

property label

Returns the device label.

Possible values are:

cpu for host devices.
gpu for accelerators.

from max import driver

device = driver.CPU()
device.label

`stats`

property stats

Returns utilization data for the device.

from max import driver

device = driver.CPU()
stats = device.stats

Returns:: A dictionary containing device utilization statistics.
Return type:: dict

`synchronize()`

synchronize(self) → None

Ensures all operations on this device complete before returning.

Raises:: ValueError – If any enqueued operations had an internal error.
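One common use of a device-wide synchronize is timing. The sketch below assumes that work submitted to a device may still be in flight when the submitting Python call returns, which this page implies but does not state outright. `timed` and `FakeDevice` are hypothetical names.

```python
import time


def timed(device, work):
    """Run work(), wait for the device to finish, return (result, seconds)."""
    start = time.perf_counter()
    result = work()
    # Wait for pending operations before stopping the clock; per the
    # documentation above this may raise ValueError.
    device.synchronize()
    return result, time.perf_counter() - start


class FakeDevice:
    """Stand-in that only counts synchronize() calls."""

    def __init__(self):
        self.sync_calls = 0

    def synchronize(self):
        self.sync_calls += 1


device = FakeDevice()
result, seconds = timed(device, lambda: sum(range(1000)))
print(result, device.sync_calls)
```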

`DeviceSpec`

class max.driver.DeviceSpec(id, device_type='cpu')

Specification for a device, containing its ID and type.

This class provides a way to specify device parameters like ID and type (CPU/GPU) for creating Device instances.

Parameters:

id (int)
device_type (Literal['cpu', 'gpu'])

`accelerator()`

static accelerator(id=0)

Creates an accelerator (GPU) device specification.

Parameters:: id (int)

`cpu()`

static cpu(id=-1)

Creates a CPU device specification.

Parameters:: id (int)

`device_type`

device_type: Literal['cpu', 'gpu'] = 'cpu'

Type of specified device.

`id`

id: int

Provided id for this device.

`DeviceStream`

class max.driver.DeviceStream(self, device: max.driver.Device)

Provides access to a stream of execution on a device.

A stream represents a sequence of operations that will be executed in order. Multiple streams on the same device can execute concurrently.

from max import driver
# Create a default accelerator device
device = driver.Accelerator()
# Get the default stream for the device
stream = device.default_stream
# Create a new stream of execution on the device
new_stream = driver.DeviceStream(device)

Creates a new stream of execution associated with the device.

Parameters:: device (Device) – The device to create the stream on.
Returns:: A new stream of execution.
Return type:: DeviceStream

`device`

property device

The device this stream is executing on.

`synchronize()`

synchronize(self) → None

Ensures all operations on this stream complete before returning.

Raises:: ValueError – If any enqueued operations had an internal error.

`wait_for()`

wait_for(self, stream: max.driver.DeviceStream) → None

wait_for(self, device: max.driver.Device) → None

Overloaded function.

wait_for(self, stream: max.driver.DeviceStream) -> None

Ensures all operations on the other stream complete before future work submitted to this stream is scheduled.

Args:
stream (DeviceStream): The stream to wait for.

wait_for(self, device: max.driver.Device) -> None

Ensures all operations on device’s default stream complete before future work submitted to this stream is scheduled.

Args:
device (Device): The device whose default stream to wait for.
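When one stream depends on several others, the two overloads can be applied in a loop. This is a sketch only: `wait_for_all` and `RecordingStream` are hypothetical, and the recording class stands in for a real `DeviceStream`.

```python
def wait_for_all(stream, dependencies):
    """Make `stream` wait for every stream or device in `dependencies`."""
    for dependency in dependencies:
        # Per the overloads above, a DeviceStream or a Device is accepted.
        stream.wait_for(dependency)


class RecordingStream:
    """Stand-in that records what it was asked to wait for."""

    def __init__(self, name):
        self.name = name
        self.waited_on = []

    def wait_for(self, other):
        self.waited_on.append(other)


consumer = RecordingStream("consumer")
producers = [RecordingStream("load"), RecordingStream("preprocess")]
wait_for_all(consumer, producers)
print([p.name for p in consumer.waited_on])
```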

`Tensor`

class max.driver.Tensor(self, dtype: max.dtype.DType, shape: collections.abc.Sequence[int], device: max.driver.Device | None = None, pinned: bool = False)

class max.driver.Tensor(self, dtype: max.dtype.DType, shape: collections.abc.Sequence[int], stream: max.driver.DeviceStream, pinned: bool = False)

class max.driver.Tensor(self, shape: ndarray[writable=False], device: max.driver.Device)

Device-resident tensor representation.

Allocates memory on a given device with the provided shape and dtype. Tensors can be sliced to provide strided views of the underlying memory, but any tensors passed as inputs to model execution must be contiguous.

Supports numpy-style slicing but does not currently support setting items across multiple indices.

from max import driver
from max.dtype import DType

# Create a tensor on CPU
cpu_tensor = driver.Tensor(shape=[2, 3], dtype=DType.float32)

# Create a tensor on GPU
gpu = driver.Accelerator()
gpu_tensor = driver.Tensor(shape=[2, 3], dtype=DType.float32, device=gpu)

Parameters:

dtype (DType) – Data type of tensor elements.
shape (Sequence[int]) – Tuple of positive, non-zero integers denoting the tensor shape.
device (Device, optional) – Device to allocate tensor onto. Defaults to the CPU.
pinned (bool, optional) – If True, memory is page-locked (pinned). Defaults to False.
stream (DeviceStream, optional) – Stream to associate the tensor with.

`contiguous()`

contiguous()

Creates a contiguous copy of the parent tensor.

Parameters:: self (Tensor)
Return type:: Tensor

`copy()`

copy(self, stream: max.driver.DeviceStream) → max.driver.Tensor

copy(self, device: max.driver.Device | None = None) → max.driver.Tensor

Overloaded function.

copy(self, stream: max.driver.DeviceStream) -> max.driver.Tensor

Creates a deep copy on the device associated with the stream.

Args:
stream (DeviceStream): The stream to associate the new tensor with.

Returns:
Tensor: A new tensor that is a copy of this tensor.

copy(self, device: max.driver.Device | None = None) -> max.driver.Tensor

Creates a deep copy on an optionally given device. If device is None (default), a copy is created on the same device.
```
from max import driver
from max.dtype import DType

cpu_tensor = driver.Tensor(shape=[2, 3], dtype=DType.bfloat16, device=driver.CPU())
cpu_copy = cpu_tensor.copy()

# Copy to GPU
gpu = driver.Accelerator()
gpu_copy = cpu_tensor.copy(device=gpu)
```
Args:
device (Device, optional): The device to create the copy on.
Defaults to None (same device).

Returns:
Tensor: A new tensor that is a copy of this tensor.

`device`

property device

Device on which tensor is resident.

`dtype`

property dtype

DType of constituent elements in tensor.

`element_size`

property element_size

Return the size of the element type in bytes.

`from_dlpack()`

from_dlpack(array, *, copy=None)

Create a tensor from an object implementing the dlpack protocol.

This usually does not result in a copy, and the producer of the object retains ownership of the underlying memory.

Parameters:

array (Any)
copy (bool | None)

Return type:

Tensor

`from_numpy()`

from_numpy(arr)

Creates a tensor from a provided numpy array on the host device.

The underlying data is not copied unless the array is noncontiguous. If it is, a contiguous copy will be returned.

Parameters:: arr (ndarray[tuple[int, ...], dtype[Any]])
Return type:: Tensor

`inplace_copy_from()`

inplace_copy_from(src)

Copy the contents of another tensor into this one.

These tensors may be on different devices. Requires that both tensors are contiguous and have same size.

Parameters:

self (Tensor)
src (Tensor)

Return type:

None

`is_contiguous`

property is_contiguous

Whether or not tensor is contiguously allocated in memory. Returns false if the tensor is a non-contiguous slice.

Currently, we consider certain situations that are contiguous as non-contiguous for the purposes of our engine, such as when a tensor has negative steps.
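For intuition, the usual row-major definition of contiguity can be checked from a shape and its strides. This sketch uses the general definition with strides measured in elements; it is not MAX's actual check, and both function names are hypothetical. It is deliberately strict, so negative steps count as non-contiguous here too.

```python
def row_major_strides(shape):
    """Strides (in elements) of a densely packed row-major array."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def is_row_major_contiguous(shape, strides):
    """True only when strides match the dense row-major layout exactly."""
    return tuple(strides) == row_major_strides(shape)


print(is_row_major_contiguous((2, 3), (3, 1)))  # dense 2x3 layout
print(is_row_major_contiguous((2, 3), (6, 2)))  # every other column of a 2x6
print(is_row_major_contiguous((3,), (-1,)))     # reversed view, negative step
```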

`is_host`

property is_host

Whether or not tensor is host-resident. Returns false for GPU tensors, true for CPU tensors.

from max import driver
from max.dtype import DType

cpu_tensor = driver.Tensor(shape=[2, 3], dtype=DType.bfloat16, device=driver.CPU())

print(cpu_tensor.is_host)

`item()`

item(self) → Any

Returns the scalar value at a given location. Currently implemented only for zero-rank tensors. The return type is converted to a Python built-in type.

`mmap()`

mmap(filename, dtype, shape, mode='copyonwrite', offset=0)

Parameters:

filename (PathLike[str] | str)
dtype (DType)
shape (ShapeType | int)
mode (np._MemMapModeKind)
offset (int)

Return type:

Tensor

`num_elements`

property num_elements

Returns the number of elements in this tensor.

Rank-0 tensors have 1 element by convention.
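The rank-0 convention follows from treating the element count as the product of the shape, since an empty product is 1. A plain-Python sketch, in which both helper names are hypothetical and the byte-size formula is an assumption that the buffer is densely packed:

```python
import math


def num_elements(shape):
    """Product of the dimensions; the empty product is 1 for rank-0 shapes."""
    return math.prod(shape)


def packed_size_in_bytes(shape, element_size):
    """Bytes needed for a densely packed tensor of this shape."""
    return num_elements(shape) * element_size


print(num_elements(()))                      # rank-0 shape
print(packed_size_in_bytes((2, 3), 4))       # 2x3 tensor of 4-byte elements
```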

`pinned`

property pinned

Whether or not the underlying memory is pinned (page-locked).

`rank`

property rank

Tensor rank.

`scalar`

scalar = <nanobind.nb_func object>

`shape`

property shape

Shape of tensor.

`stream`

property stream

Stream to which tensor is bound.

`to()`

to(self, device: max.driver.Device) → Tensor

to(self, device: max.driver.DeviceStream) → Tensor

Overloaded function.

to(self, device: max.driver.Device) -> Tensor

Return a tensor that’s guaranteed to be on the given device.

The tensor is only copied if the requested device is different from the device upon which the tensor is already resident.

to(self, device: max.driver.DeviceStream) -> Tensor

Return a tensor that’s guaranteed to be on the given device and associated with the given stream.

The tensor is only copied if the requested device is different from the device upon which the tensor is already resident.

`to_numpy()`

to_numpy()

Converts the tensor to a numpy array.

If the tensor is not on the host, an exception is raised.

Parameters:: self (Tensor)
Return type:: ndarray[tuple[int, …], dtype[Any]]
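Since conversion fails for tensors that are not on the host, a common pattern is to move the tensor first. The sketch below uses only `is_host`, `to()`, and `to_numpy()` as documented on this page; `to_numpy_via_host` and `FakeTensor` are hypothetical, and the fake raises `RuntimeError` only as a placeholder because the real exception type is not stated here.

```python
def to_numpy_via_host(tensor, cpu_device):
    """Move the tensor to the host if needed, then convert it."""
    if not tensor.is_host:
        tensor = tensor.to(cpu_device)
    return tensor.to_numpy()


class FakeTensor:
    """Stand-in for a tensor; returns a list instead of an ndarray."""

    def __init__(self, data, is_host):
        self.data = data
        self.is_host = is_host

    def to(self, device):
        return FakeTensor(self.data, is_host=True)

    def to_numpy(self):
        if not self.is_host:
            raise RuntimeError("tensor is not on the host")
        return list(self.data)


gpu_tensor = FakeTensor([1.0, 2.0], is_host=False)
values = to_numpy_via_host(gpu_tensor, cpu_device=None)
print(values)
```

With the real API, `cpu_device` would be a `driver.CPU()` instance.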

`view()`

view(dtype, shape=None)

Return a new tensor with the given type and shape that shares the underlying memory.

If the shape is not given, it will be deduced if possible, or a ValueError is raised.

Parameters:

self (Tensor)
dtype (DType)
shape (Sequence[int] | None)

Return type:

Tensor
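This page does not spell out how the shape is deduced. One plausible rule, assumed here purely for illustration, is the one NumPy's `ndarray.view` uses: keep every dimension except the last and rescale the last by the ratio of element sizes. `deduce_view_shape` is a hypothetical helper working on plain integers.

```python
def deduce_view_shape(shape, old_element_size, new_element_size):
    """Rescale the last dimension so the total byte count is unchanged."""
    if old_element_size == new_element_size:
        return tuple(shape)
    if not shape:
        raise ValueError("cannot deduce a shape for a rank-0 tensor")
    last_dim_bytes = shape[-1] * old_element_size
    if last_dim_bytes % new_element_size != 0:
        raise ValueError("last dimension does not divide evenly")
    return (*shape[:-1], last_dim_bytes // new_element_size)


print(deduce_view_shape((2, 4), 4, 2))  # 4-byte elements viewed as 2-byte
print(deduce_view_shape((2, 4), 1, 4))  # 1-byte elements viewed as 4-byte
```

When the sizes do not divide evenly, the sketch raises `ValueError`, matching the documented behavior for shapes that cannot be deduced.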

`zeros`

zeros = <nanobind.nb_func object>

`accelerator_api()`

max.driver.accelerator_api()

Returns the API used to program the accelerator.

Return type:: str

`accelerator_architecture_name()`

max.driver.accelerator_architecture_name()

Returns the architecture name of the accelerator device.

Return type:: str

`devices_exist()`

max.driver.devices_exist(devices)

Checks whether the specified devices exist.

Parameters:: devices (list[DeviceSpec])
Return type:: bool

`load_devices()`

max.driver.load_devices(device_specs)

Initialize and return a list of devices, given a list of device specs.

Parameters:: device_specs (list[DeviceSpec])
Return type:: list[Device]

`scan_available_devices()`

max.driver.scan_available_devices()

Returns all accelerators if any are available; otherwise returns the CPU.

Return type:: list[DeviceSpec]
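The three module-level functions above can be chained: scan for specs, confirm they exist, then load them. The sketch takes the driver module as an argument so it can run against a stand-in; `load_available_devices` and `fake_driver` are hypothetical, and raising `RuntimeError` on a failed check is a choice made for this example.

```python
from types import SimpleNamespace


def load_available_devices(driver):
    """Scan, validate, and load devices using the functions documented above."""
    specs = driver.scan_available_devices()
    if not driver.devices_exist(specs):
        raise RuntimeError("scanned device specs do not exist on this host")
    return driver.load_devices(specs)


# Stand-in module so the sketch runs without MAX installed.
fake_driver = SimpleNamespace(
    scan_available_devices=lambda: ["cpu-spec"],
    devices_exist=lambda specs: True,
    load_devices=lambda specs: [f"device for {s}" for s in specs],
)
loaded = load_available_devices(fake_driver)
print(loaded)
```

With MAX installed, the call would be `load_available_devices(driver)` after `from max import driver`.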

`accelerator_count()`

max.driver.accelerator_count() → int

Returns the number of available accelerator devices.
