Mojo function

qmatmul

qmatmul[encoding: QuantizationEncoding](lhs: Symbol, rhs: Symbol) -> Symbol

Performs matrix multiplication between floating point and quantized tensors.

This quantizes the floating point lhs tensor to match the encoding of the quantized rhs tensor, performs the matmul, and then dequantizes the result. Unlike conventional matrix multiplication, the operation expects the rhs argument to be already transposed.

For matrix shapes:

standard matmul() expects shapes (m x n) @ (n x p) → (m x p)
qmatmul() expects shapes (m x n) @ (p x n) → (m x p)

For example, given:

lhs shape: [32, 64] (m = 32, n = 64)
rhs shape: [32, 64] (transposed, so p = 32, n = 64)
output shape: [32, 32] (m x p)

The operation can be expressed as:

dequantize(quantize(lhs) . transpose(rhs))

Where . is a normal matmul operator.
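
The expression above can be sketched numerically in plain Python. This is only an illustration of the quantize, matmul, dequantize order: the helpers `quantize_rows` and `toy_qmatmul` are hypothetical, and the per-row symmetric scaling used here is a simplified stand-in, not the actual Q4_0, Q4_K, or Q6_K block layouts.

```python
def quantize_rows(matrix, bits=8):
    """Toy symmetric per-row quantization (an assumption, not a real encoding).

    Returns (integer rows, per-row scales) such that q * scale ~= value.
    """
    qmax = 2 ** (bits - 1) - 1
    q_rows, scales = [], []
    for row in matrix:
        max_abs = max(abs(v) for v in row)
        scale = max_abs / qmax if max_abs > 0 else 1.0
        q_rows.append([round(v / scale) for v in row])
        scales.append(scale)
    return q_rows, scales


def toy_qmatmul(lhs, rhs_q, rhs_scales):
    """lhs: m x n floats. rhs_q: p x n integers (already transposed).

    Returns an m x p list of floats.
    """
    lhs_q, lhs_scales = quantize_rows(lhs)  # quantize(lhs)
    out = []
    for i, lhs_row in enumerate(lhs_q):
        out_row = []
        for j, rhs_row in enumerate(rhs_q):
            # Row-by-row dot product: equivalent to lhs_q . transpose(rhs_q).
            acc = sum(a * b for a, b in zip(lhs_row, rhs_row))
            # Dequantize the integer accumulator with both scales.
            out_row.append(acc * lhs_scales[i] * rhs_scales[j])
        out.append(out_row)
    return out


lhs = [[1.0, -2.0, 0.5], [0.25, 0.0, 4.0]]  # m x n = 2 x 3
weights_t = [                                # p x n = 4 x 3, stored transposed
    [0.5, 0.5, 0.5],
    [1.0, 0.0, -1.0],
    [2.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
rhs_q, rhs_scales = quantize_rows(weights_t, bits=4)
result = toy_qmatmul(lhs, rhs_q, rhs_scales)  # m x p = 2 x 4

for row in result:
    print([round(v, 3) for v in row])
```

Because rhs is stored as p x n, each output element is a dot product between a row of lhs and a row of rhs, so no explicit transpose is needed. The result approximates the floating point product, with an error that depends on the bit width.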

The last two dimensions in lhs are treated as matrices and multiplied by rhs (which must be a 2D tensor). Any remaining dimensions in lhs are broadcast dimensions.
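
The shape rule, including the leading broadcast dimensions, can be summarized with a small helper. This is a sketch only: `qmatmul_output_shape` is a hypothetical function, not part of the API, and it assumes lhs has rank 2 or higher and that the inner dimension n must match exactly.

```python
def qmatmul_output_shape(lhs_shape, rhs_shape):
    """Compute the output shape for lhs [..., m, n] and transposed rhs [p, n]."""
    if len(rhs_shape) != 2:
        raise ValueError("rhs must be rank 2")
    if len(lhs_shape) < 2:
        raise ValueError("lhs must have rank 2 or higher (assumption)")
    *batch, m, n = lhs_shape
    p, rhs_n = rhs_shape
    if n != rhs_n:
        raise ValueError(f"inner dimensions differ: lhs n={n}, rhs n={rhs_n}")
    # Leading lhs dimensions pass through unchanged; the last two become m x p.
    return [*batch, m, p]


print(qmatmul_output_shape([32, 64], [32, 64]))         # [32, 32]
print(qmatmul_output_shape([8, 4, 16, 64], [128, 64]))  # [8, 4, 16, 128]
```

Note that an rhs of shape [n, p], as used by a conventional matmul, is rejected here unless n happens to equal p.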

NOTE: Currently this supports Q4_0, Q4_K, and Q6_K encodings only.

Parameters:

encoding (QuantizationEncoding): The quantization encoding to use.

Args:

lhs (Symbol): The non-quantized left-hand side of the matmul.
rhs (Symbol): The transposed and quantized right-hand side of the matmul. Must be rank 2 (a 2D tensor/matrix) and in a supported quantization encoding.

Returns:

The dequantized result (a floating point tensor).