
Python module

linear

Multi-layer Perceptron.

DistributedMLP

class max.pipelines.nn.linear.DistributedMLP(list_of_mlps: 'list[MLP]', num_devices: 'int')

list_of_mlps

list_of_mlps*: list[max.pipelines.nn.linear.MLP]*

num_devices

num_devices*: int*

GPTQLinear

class max.pipelines.nn.linear.GPTQLinear(weight: Value | BufferValue | TensorValue | Shape | Dim | int | float | integer | floating | ndarray, bias: Value | BufferValue | TensorValue | Shape | Dim | int | float | integer | floating | ndarray | None = None, quantization_encoding: QuantizationEncoding | None = None, quantization_config: QuantizationConfig | None = None, perm_idx: Value | BufferValue | TensorValue | Shape | Dim | int | float | integer | floating | ndarray | None = None)

A Linear layer for GPTQ encoding.

perm_idx

perm_idx*: Value | BufferValue | TensorValue | Shape | Dim | int | float | integer | floating | ndarray | None* = None

quantization_config

quantization_config*: QuantizationConfig | None* = None
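In GPTQ with activation-order ("act-order") quantization, a permutation index typically reorders input features so that they line up with the order in which weight columns were quantized. Whether `GPTQLinear` applies `perm_idx` exactly this way is an assumption; the sketch below (plain Python, hypothetical helper `permute_features`) only illustrates the reordering itself:

```python
def permute_features(x, perm_idx):
    # x: list of in_dim features; perm_idx[i] names the source column
    # that should land in position i after reordering.
    return [x[p] for p in perm_idx]

x = [10.0, 20.0, 30.0]
perm_idx = [2, 0, 1]
print(permute_features(x, perm_idx))  # [30.0, 10.0, 20.0]
```

The permuted features would then feed the quantized matmul against correspondingly permuted weight columns.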

GPTQLinearV2

class max.pipelines.nn.linear.GPTQLinearV2(in_dim: int, out_dim: int, dtype: DType, device: DeviceRef, has_bias: bool = False, quantization_encoding: QuantizationEncoding | None = None, quantization_config: QuantizationConfig | None = None)

A Linear layer for GPTQ encoding.

Initializes the linear layer with weights and optional bias with GPTQ quantization.

  • Parameters:

    • in_dim – The dimensionality of the input space.
    • out_dim – The dimensionality of the output space.
    • dtype – The data type for both weights and bias.
    • device – The target device for computation. Weights remain on CPU until moved during computation.
    • has_bias – When True, adds a bias vector to the layer. Defaults to False.
    • quantization_encoding – The quantization encoding of the weights. Defaults to None.
    • quantization_config – Extra configuration for the weight quantization. Defaults to None.

Linear

class max.pipelines.nn.linear.Linear(weight: Value | BufferValue | TensorValue | Shape | Dim | int | float | integer | floating | ndarray, bias: Value | BufferValue | TensorValue | Shape | Dim | int | float | integer | floating | ndarray | None = None)

A unified linear layer that delegates to either regular or quantized implementation.

bias

bias*: Value | BufferValue | TensorValue | Shape | Dim | int | float | integer | floating | ndarray | None* = None

create()

classmethod create(dtype: DType, quantization_encoding: QuantizationEncoding | None, in_features: int, out_features: int, weights: Weights | Weight, bias: Weights | Weight | None = None, quantization_config: QuantizationConfig | None = None) → Linear

Factory method to create a Linear layer with appropriate implementation.

weight

weight*: Value | BufferValue | TensorValue | Shape | Dim | int | float | integer | floating | ndarray*

LinearV2

class max.pipelines.nn.linear.LinearV2(in_dim: int, out_dim: int, dtype: DType, device: DeviceRef, has_bias: bool = False, quantization_encoding: QuantizationEncoding | None = None, name: str | None = None)

Applies a linear transformation to incoming data: y = xW^T + b.

This layer implements a fully connected layer where inputs are multiplied by a weight matrix and optionally added with a bias vector. Both weights and bias initially reside on CPU and are moved to the target device during the model's init phase.

Example:

    linear_layer = LinearV2(
        in_dim=256,
        out_dim=128,
        dtype=DType.float32,
        device=DeviceRef.GPU(),
        name="linear",
        has_bias=True,
    )

    input_tensor: TensorValue
    output = linear_layer(input_tensor)

Initializes the linear layer with weights and optional bias.

  • Parameters:

    • in_dim – The dimensionality of the input space.
    • out_dim – The dimensionality of the output space.
    • dtype – The data type for both weights and bias.
    • device – The target device for computation. Weights remain on CPU until moved during computation.
    • name – Base name for weights (appended with .weight and .bias if applicable).
    • has_bias – When True, adds a bias vector to the layer. Defaults to False.
    • quantization_encoding – The quantization encoding of the weights, if any. Defaults to None.

bias

bias*: Weight | None* = None

The optional bias vector stored on CPU with shape (out_dim,). Model init moves the bias to device if present.

device

device*: DeviceRef*

The device where matrix operations are performed.

weight

weight*: Weight*

The weight matrix stored on CPU with shape (out_dim, in_dim). Model init transposes the weight and moves it to device.
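To make the (out_dim, in_dim) storage convention concrete, here is a dependency-free sketch of y = xW^T + b in plain Python (the helper `linear` is illustrative, not part of the max API):

```python
def linear(x, weight, bias=None):
    # weight is stored (out_dim, in_dim), so each output element is the
    # dot product of x with one weight row -- equivalent to x @ weight.T.
    out = [sum(w * xi for w, xi in zip(row, x)) for row in weight]
    if bias is not None:
        out = [o + b for o, b in zip(out, bias)]
    return out

# in_dim=3, out_dim=2
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, -1.0]]
b = [0.5, -0.5]
print(linear([1.0, 2.0, 3.0], W, b))  # [7.5, -1.5]
```

Storing the weight row-major by output feature is why model init transposes it before the device matmul.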

MLP

class max.pipelines.nn.linear.MLP(gate_proj: Linear, down_proj: Linear, up_proj: Linear)

A simple multi-layer perceptron composed of three linear layers, using the SiLU activation function.

down_proj

down_proj*: Linear*

gate_proj

gate_proj*: Linear*

up_proj

up_proj*: Linear*
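The reference does not spell out how the three projections compose. A common arrangement for a SiLU-gated MLP with gate/up/down projections (e.g. LLaMA-style models) is down_proj(silu(gate_proj(x)) * up_proj(x)); the plain-Python sketch below assumes that composition:

```python
import math

def silu(v):
    # SiLU (swish): x * sigmoid(x), applied elementwise.
    return [x / (1.0 + math.exp(-x)) for x in v]

def matvec(weight, x):
    # y = weight @ x, with weight shaped (out_dim, in_dim).
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]

def mlp(x, gate_w, up_w, down_w):
    # Assumed gated composition: down_proj(silu(gate_proj(x)) * up_proj(x)).
    gate = silu(matvec(gate_w, x))
    up = matvec(up_w, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(down_w, hidden)

x = [1.0, -1.0]
gate_w = [[1.0, 0.0], [0.0, 1.0]]   # (hidden_dim, in_dim)
up_w = [[2.0, 0.0], [0.0, 2.0]]     # (hidden_dim, in_dim)
down_w = [[1.0, 1.0]]               # (out_dim, hidden_dim)
print(mlp(x, gate_w, up_w, down_w))  # close to [2.0]
```

gate_proj and up_proj expand to the hidden dimension, the SiLU-gated elementwise product mixes them, and down_proj maps back to the output dimension.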

MLPV2

class max.pipelines.nn.linear.MLPV2(gate_proj: LinearV2, down_proj: LinearV2, up_proj: LinearV2)

A simple multi-layer perceptron composed of three linear layers, using the SiLU activation function.

down_proj

down_proj*: LinearV2*

gate_proj

gate_proj*: LinearV2*

up_proj

up_proj*: LinearV2*

QLinear

class max.pipelines.nn.linear.QLinear(weight: Value | BufferValue | TensorValue | Shape | Dim | int | float | integer | floating | ndarray, bias: Value | BufferValue | TensorValue | Shape | Dim | int | float | integer | floating | ndarray | None = None, quantization_encoding: QuantizationEncoding | None = None)

A quantized fully connected layer.

quantization_encoding

quantization_encoding*: QuantizationEncoding | None* = None

linear_class()

max.pipelines.nn.linear.linear_class(quantization_encoding: QuantizationEncoding | None) → type[max.pipelines.nn.linear.LinearV2]

Returns a Linear class to use that’s compatible with the quantization encoding.
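A plausible reading, given that GPTQLinearV2 is the GPTQ-aware subclass and the return type is type[LinearV2], is that this factory dispatches on the encoding. The sketch below is purely illustrative: the classes are stand-in stubs, the string "gptq" stands in for a QuantizationEncoding value, and the real function's internals are not shown in this reference:

```python
# Stand-in stubs, not the real max.pipelines.nn.linear classes.
class LinearV2: ...
class GPTQLinearV2(LinearV2): ...

def linear_class(quantization_encoding):
    # Hypothetical dispatch: GPTQ-encoded weights need the GPTQ-aware
    # subclass; anything else (including encoding=None) uses LinearV2.
    if quantization_encoding == "gptq":
        return GPTQLinearV2
    return LinearV2

print(linear_class(None).__name__)     # LinearV2
print(linear_class("gptq").__name__)   # GPTQLinearV2
```

Callers can then instantiate whatever class comes back with a uniform constructor signature, which is what makes this factory pattern useful.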