Python module
linear
Multi-layer Perceptron.
DistributedMLP
class max.pipelines.nn.linear.DistributedMLP(list_of_mlps: 'list[MLP]', num_devices: 'int')
list_of_mlps
list_of_mlps: list[max.pipelines.nn.linear.MLP]
num_devices
num_devices: int
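A minimal construction sketch, assuming Linear accepts NumPy arrays as weights (per its signature below) and that weight shapes follow the (out_dim, in_dim) convention documented for LinearV2; all dimensions here are hypothetical:
import numpy as np
from max.pipelines.nn.linear import DistributedMLP, Linear, MLP

hidden, intermediate = 16, 64  # hypothetical dimensions

def make_mlp() -> MLP:
    # Weight shapes assume the (out_dim, in_dim) convention described
    # for LinearV2 below.
    gate = Linear(weight=np.zeros((intermediate, hidden), dtype=np.float32))
    up = Linear(weight=np.zeros((intermediate, hidden), dtype=np.float32))
    down = Linear(weight=np.zeros((hidden, intermediate), dtype=np.float32))
    return MLP(gate_proj=gate, down_proj=down, up_proj=up)

# One MLP replica per device.
dist_mlp = DistributedMLP(list_of_mlps=[make_mlp(), make_mlp()], num_devices=2)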
GPTQLinear
class max.pipelines.nn.linear.GPTQLinear(weight: Value | BufferValue | TensorValue | Shape | Dim | int | float | integer | floating | ndarray, bias: Value | BufferValue | TensorValue | Shape | Dim | int | float | integer | floating | ndarray | None = None, quantization_encoding: QuantizationEncoding | None = None, quantization_config: QuantizationConfig | None = None, perm_idx: Value | BufferValue | TensorValue | Shape | Dim | int | float | integer | floating | ndarray | None = None)
A linear layer for the GPTQ encoding.
perm_idx
perm_idx: Value | BufferValue | TensorValue | Shape | Dim | int | float | integer | floating | ndarray | None = None
quantization_config
quantization_config: QuantizationConfig | None = None
GPTQLinearV2
class max.pipelines.nn.linear.GPTQLinearV2(in_dim: int, out_dim: int, dtype: DType, device: DeviceRef, has_bias: bool = False, quantization_encoding: QuantizationEncoding | None = None, quantization_config: QuantizationConfig | None = None)
A linear layer for the GPTQ encoding.
Initializes the linear layer with GPTQ-quantized weights and an optional bias.
Parameters:
- in_dim – The dimensionality of the input space.
- out_dim – The dimensionality of the output space.
- dtype – The data type for both weights and bias.
- device – The target device for computation. Weights remain on CPU until moved during computation.
- name – Base name for weights (appended with .weight and .bias if applicable).
- has_bias – When True, adds a bias vector to the layer. Defaults to False.
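A construction sketch following the same pattern as the LinearV2 example below; the import paths for DType and DeviceRef are assumptions and may differ across MAX versions:
from max.dtype import DType      # assumed import path
from max.graph import DeviceRef  # assumed import path
from max.pipelines.nn.linear import GPTQLinearV2

# quantization_encoding and quantization_config would normally come from
# the checkpoint's GPTQ settings; they default to None here.
layer = GPTQLinearV2(
    in_dim=4096,
    out_dim=4096,
    dtype=DType.float32,
    device=DeviceRef.GPU(),
    has_bias=False,
)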
Linear
class max.pipelines.nn.linear.Linear(weight: Value | BufferValue | TensorValue | Shape | Dim | int | float | integer | floating | ndarray, bias: Value | BufferValue | TensorValue | Shape | Dim | int | float | integer | floating | ndarray | None = None)
A unified linear layer that delegates to either regular or quantized implementation.
bias
bias: Value | BufferValue | TensorValue | Shape | Dim | int | float | integer | floating | ndarray | None = None
create()
classmethod create(dtype: DType, quantization_encoding: QuantizationEncoding | None, in_features: int, out_features: int, weights: Weights | Weight, bias: Weights | Weight | None = None, quantization_config: QuantizationConfig | None = None) → Linear
Factory method to create a Linear layer with the appropriate implementation (regular or quantized).
weight
weight: Value | BufferValue | TensorValue | Shape | Dim | int | float | integer | floating | ndarray
LinearV2
class max.pipelines.nn.linear.LinearV2(in_dim: int, out_dim: int, dtype: DType, device: DeviceRef, has_bias: bool = False, quantization_encoding: QuantizationEncoding | None = None, name: str | None = None)
Applies a linear transformation to incoming data: y = xW^T + b.
This layer implements a fully connected layer where inputs are multiplied
by a weight matrix and optionally added with a bias vector.
Both weights and bias initially reside on CPU, and the model init phase moves them to device.
Example:
linear_layer = LinearV2(
    in_dim=256,
    out_dim=128,
    dtype=DType.float32,
    device=DeviceRef.GPU(),
    name="linear",
    has_bias=True,
)
input_tensor: TensorValue
output = linear_layer(input_tensor)
Initializes the linear layer with weights and optional bias.
Parameters:
- in_dim – The dimensionality of the input space.
- out_dim – The dimensionality of the output space.
- dtype – The data type for both weights and bias.
- device – The target device for computation. Weights remain on CPU until moved during computation.
- name – Base name for weights (appended with .weight and .bias if applicable).
- has_bias – When True, adds a bias vector to the layer. Defaults to False.
bias
The optional bias vector stored on CPU with shape (out_dim,). Model init moves the bias to device if present.
device
device: DeviceRef
The device where matrix operations are performed.
weight
weight: Weight
The weight matrix stored on CPU with shape (out_dim, in_dim). Model init transposes the weight and moves it to device.
MLP
class max.pipelines.nn.linear.MLP(gate_proj: Linear, down_proj: Linear, up_proj: Linear)
A simple multi-layer perceptron composed of three linear layers. Uses the SiLU activation function.
down_proj
down_proj: Linear
gate_proj
gate_proj: Linear
up_proj
up_proj: Linear
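The gate/up/down naming matches the conventional gated (SwiGLU-style) MLP dataflow. A plain-NumPy sketch of the presumed computation (an assumption based on the attribute names, not taken from this module):
import numpy as np

def silu(x: np.ndarray) -> np.ndarray:
    # SiLU activation: x * sigmoid(x).
    return x / (1.0 + np.exp(-x))

def gated_mlp(x, w_gate, w_up, w_down):
    # Presumed dataflow: down_proj(silu(gate_proj(x)) * up_proj(x)),
    # with each projection computed as x @ W.T per the (out_dim, in_dim)
    # weight convention documented above.
    return (silu(x @ w_gate.T) * (x @ w_up.T)) @ w_down.T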
MLPV2
class max.pipelines.nn.linear.MLPV2(gate_proj: LinearV2, down_proj: LinearV2, up_proj: LinearV2)
A simple multi-layer perceptron composed of three linear layers. Uses the SiLU activation function.
down_proj
down_proj: LinearV2
gate_proj
gate_proj: LinearV2
up_proj
up_proj: LinearV2
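A construction sketch composing three LinearV2 layers; dimensions and the device are hypothetical, and the imports follow the same assumptions as the GPTQLinearV2 sketch above:
from max.dtype import DType      # assumed import path
from max.graph import DeviceRef  # assumed import path
from max.pipelines.nn.linear import LinearV2, MLPV2

hidden, intermediate = 256, 1024  # hypothetical dimensions

mlp = MLPV2(
    gate_proj=LinearV2(in_dim=hidden, out_dim=intermediate, dtype=DType.float32, device=DeviceRef.GPU()),
    down_proj=LinearV2(in_dim=intermediate, out_dim=hidden, dtype=DType.float32, device=DeviceRef.GPU()),
    up_proj=LinearV2(in_dim=hidden, out_dim=intermediate, dtype=DType.float32, device=DeviceRef.GPU()),
)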
QLinear
class max.pipelines.nn.linear.QLinear(weight: Value | BufferValue | TensorValue | Shape | Dim | int | float | integer | floating | ndarray, bias: Value | BufferValue | TensorValue | Shape | Dim | int | float | integer | floating | ndarray | None = None, quantization_encoding: QuantizationEncoding | None = None)
A quantized fully connected layer.
quantization_encoding
quantization_encoding: QuantizationEncoding | None = None
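A heavily hedged sketch: per the signature, a pre-quantized weight buffer (here a placeholder NumPy array of packed bytes) is paired with the matching QuantizationEncoding. The import path, the encoding member, and the buffer layout are all assumptions:
import numpy as np
from max.graph.quantization import QuantizationEncoding  # assumed import path
from max.pipelines.nn.linear import QLinear

# Packed quantized weight bytes would be loaded elsewhere; this is a
# placeholder buffer whose real layout is defined by the encoding.
packed = np.zeros((128, 144), dtype=np.uint8)
layer = QLinear(weight=packed, quantization_encoding=QuantizationEncoding.Q4_K)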
linear_class()
max.pipelines.nn.linear.linear_class(quantization_encoding: QuantizationEncoding | None) → type[max.pipelines.nn.linear.LinearV2]
Returns the Linear class that is compatible with the given quantization encoding.
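A usage sketch; the assumption is that passing None (no quantization) yields the plain LinearV2 class:
from max.pipelines.nn.linear import LinearV2, linear_class

cls = linear_class(quantization_encoding=None)
assert cls is LinearV2  # assumed behavior for the unquantized case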