
Python module

linear

Multi-layer Perceptron.

DistributedMLP

class max.pipelines.nn.linear.DistributedMLP(list_of_mlps: 'list[MLP]', num_devices: 'int')

list_of_mlps

list_of_mlps*: list[max.pipelines.nn.linear.MLP]*

num_devices

num_devices*: int*

GPTQLinear

class max.pipelines.nn.linear.GPTQLinear(weight: Value | BufferValue | TensorValue | Shape | Dim | int | float | integer | floating | ndarray, bias: Value | BufferValue | TensorValue | Shape | Dim | int | float | integer | floating | ndarray | None = None, quantization_encoding: QuantizationEncoding | None = None, quantization_config: QuantizationConfig | None = None, perm_idx: Value | BufferValue | TensorValue | Shape | Dim | int | float | integer | floating | ndarray | None = None)

A Linear layer for GPTQ encoding.

perm_idx

perm_idx*: Value | BufferValue | TensorValue | Shape | Dim | int | float | integer | floating | ndarray | None* = None

quantization_config

quantization_config*: QuantizationConfig | None* = None
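In GPTQ with activation-order ("act-order") quantization, a permutation index typically reorders input features so that they line up with the order in which weight columns were quantized. Whether `GPTQLinear` applies `perm_idx` exactly this way is an assumption; the sketch below (plain Python, hypothetical helper `permute_features`) only illustrates the reordering itself:

```python
def permute_features(x, perm_idx):
    # x: list of in_dim features; perm_idx[i] names the source column
    # that should land in position i after reordering.
    return [x[p] for p in perm_idx]

x = [10.0, 20.0, 30.0]
perm_idx = [2, 0, 1]
print(permute_features(x, perm_idx))  # [30.0, 10.0, 20.0]
```

The permuted features would then feed the quantized matmul against correspondingly permuted weight columns.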

GPTQLinearV2

class max.pipelines.nn.linear.GPTQLinearV2(in_dim: int, out_dim: int, dtype: DType, device: DeviceRef, has_bias: bool = False, quantization_encoding: QuantizationEncoding | None = None, quantization_config: QuantizationConfig | None = None)

A Linear layer for GPTQ encoding.

Initializes the linear layer with weights and optional bias with GPTQ quantization.

  • Parameters:

    • in_dim – The dimensionality of the input space.
    • out_dim – The dimensionality of the output space.
    • dtype – The data type for both weights and bias.
    • device – The target device for computation. Weights remain on CPU until moved during computation.
    • has_bias – When True, adds a bias vector to the layer. Defaults to False.
    • quantization_encoding – The quantization encoding of the weights. Defaults to None.
    • quantization_config – Extra configuration for the weight quantization. Defaults to None.

Linear

class max.pipelines.nn.linear.Linear(weight: Value | BufferValue | TensorValue | Shape | Dim | int | float | integer | floating | ndarray, bias: Value | BufferValue | TensorValue | Shape | Dim | int | float | integer | floating | ndarray | None = None)

A unified linear layer that delegates to either regular or quantized implementation.

bias

bias*: Value | BufferValue | TensorValue | Shape | Dim | int | float | integer | floating | ndarray | None* = None

create()

classmethod create(dtype: DType, quantization_encoding: QuantizationEncoding | None, in_features: int, out_features: int, weights: Weights | Weight, bias: Weights | Weight | None = None, quantization_config: QuantizationConfig | None = None) → Linear

Factory method to create a Linear layer with appropriate implementation.

weight

weight*: Value | BufferValue | TensorValue | Shape | Dim | int | float | integer | floating | ndarray*

LinearV2

class max.pipelines.nn.linear.LinearV2(in_dim: int, out_dim: int, dtype: DType, device: DeviceRef, has_bias: bool = False, quantization_encoding: QuantizationEncoding | None = None, name: str | None = None)

Applies a linear transformation to incoming data: y = xW^T + b.

This layer implements a fully connected layer where inputs are multiplied by a weight matrix and optionally added with a bias vector. Both weights and bias initially reside on CPU and are moved to the target device during the model's init phase.

Example:

    linear_layer = LinearV2(
        in_dim=256,
        out_dim=128,
        dtype=DType.float32,
        device=DeviceRef.GPU(),
        name="linear",
        has_bias=True,
    )

    input_tensor: TensorValue
    output = linear_layer(input_tensor)

Initializes the linear layer with weights and optional bias.

  • Parameters:

    • in_dim – The dimensionality of the input space.
    • out_dim – The dimensionality of the output space.
    • dtype – The data type for both weights and bias.
    • device – The target device for computation. Weights remain on CPU until moved during computation.
    • name – Base name for weights (appended with .weight and .bias if applicable).
    • has_bias – When True, adds a bias vector to the layer. Defaults to False.
    • quantization_encoding – The quantization encoding of the weights, if any. Defaults to None.

bias

bias*: Weight | None* = None

The optional bias vector stored on CPU with shape (out_dim,). Model init moves the bias to device if present.

device

device*: DeviceRef*

The device where matrix operations are performed.

weight

weight*: Weight*

The weight matrix stored on CPU with shape (out_dim, in_dim). Model init transposes the weight and moves it to device.
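To make the (out_dim, in_dim) storage convention concrete, here is a dependency-free sketch of y = xW^T + b in plain Python (the helper `linear` is illustrative, not part of the max API):

```python
def linear(x, weight, bias=None):
    # weight is stored (out_dim, in_dim), so each output element is the
    # dot product of x with one weight row -- equivalent to x @ weight.T.
    out = [sum(w * xi for w, xi in zip(row, x)) for row in weight]
    if bias is not None:
        out = [o + b for o, b in zip(out, bias)]
    return out

# in_dim=3, out_dim=2
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, -1.0]]
b = [0.5, -0.5]
print(linear([1.0, 2.0, 3.0], W, b))  # [7.5, -1.5]
```

Storing the weight row-major by output feature is why model init transposes it before the device matmul.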

MLP

class max.pipelines.nn.linear.MLP(gate_proj: Linear, down_proj: Linear, up_proj: Linear)

A simple multi-layer perceptron composed of three linear layers, using the SiLU activation function.

down_proj

down_proj*: Linear*

gate_proj

gate_proj*: Linear*

up_proj

up_proj*: Linear*
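The reference does not spell out how the three projections compose. A common arrangement for a SiLU-gated MLP with gate/up/down projections (e.g. LLaMA-style models) is down_proj(silu(gate_proj(x)) * up_proj(x)); the plain-Python sketch below assumes that composition:

```python
import math

def silu(v):
    # SiLU (swish): x * sigmoid(x), applied elementwise.
    return [x / (1.0 + math.exp(-x)) for x in v]

def matvec(weight, x):
    # y = weight @ x, with weight shaped (out_dim, in_dim).
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]

def mlp(x, gate_w, up_w, down_w):
    # Assumed gated composition: down_proj(silu(gate_proj(x)) * up_proj(x)).
    gate = silu(matvec(gate_w, x))
    up = matvec(up_w, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(down_w, hidden)

x = [1.0, -1.0]
gate_w = [[1.0, 0.0], [0.0, 1.0]]   # (hidden_dim, in_dim)
up_w = [[2.0, 0.0], [0.0, 2.0]]     # (hidden_dim, in_dim)
down_w = [[1.0, 1.0]]               # (out_dim, hidden_dim)
print(mlp(x, gate_w, up_w, down_w))  # close to [2.0]
```

gate_proj and up_proj expand to the hidden dimension, the SiLU-gated elementwise product mixes them, and down_proj maps back to the output dimension.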

MLPV2

class max.pipelines.nn.linear.MLPV2(gate_proj: LinearV2, down_proj: LinearV2, up_proj: LinearV2)

A simple multi-layer perceptron composed of three linear layers, using the SiLU activation function.

down_proj

down_proj*: LinearV2*

gate_proj

gate_proj*: LinearV2*

up_proj

up_proj*: LinearV2*

QLinear

class max.pipelines.nn.linear.QLinear(weight: Value | BufferValue | TensorValue | Shape | Dim | int | float | integer | floating | ndarray, bias: Value | BufferValue | TensorValue | Shape | Dim | int | float | integer | floating | ndarray | None = None, quantization_encoding: QuantizationEncoding | None = None)

A quantized fully connected layer.

quantization_encoding

quantization_encoding*: QuantizationEncoding | None* = None

linear_class()

max.pipelines.nn.linear.linear_class(quantization_encoding: QuantizationEncoding | None) → type[max.pipelines.nn.linear.LinearV2]

Returns a Linear class to use that’s compatible with the quantization encoding.
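A plausible reading, given that GPTQLinearV2 is the GPTQ-aware subclass and the return type is type[LinearV2], is that this factory dispatches on the encoding. The sketch below is purely illustrative: the classes are stand-in stubs, the string "gptq" stands in for a QuantizationEncoding value, and the real function's internals are not shown in this reference:

```python
# Stand-in stubs, not the real max.pipelines.nn.linear classes.
class LinearV2: ...
class GPTQLinearV2(LinearV2): ...

def linear_class(quantization_encoding):
    # Hypothetical dispatch: GPTQ-encoded weights need the GPTQ-aware
    # subclass; anything else (including encoding=None) uses LinearV2.
    if quantization_encoding == "gptq":
        return GPTQLinearV2
    return LinearV2

print(linear_class(None).__name__)     # LinearV2
print(linear_class("gptq").__name__)   # GPTQLinearV2
```

Callers can then instantiate whatever class comes back with a uniform constructor signature, which is what makes this factory pattern useful.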