
Mojo function

qmatmul

qmatmul[encoding: QuantizationEncoding](lhs: Symbol, rhs: Symbol) -> Symbol

Performs matrix multiplication between floating point and quantized tensors.

This quantizes the floating point lhs value to match the encoding of the quantized rhs value, performs the matmul, and then dequantizes the result. Unlike a regular matmul op, this one expects the rhs value to be transposed. For example, if the lhs shape is [32, 64] and the quantized rhs shape is also [32, 64], then the output shape is [32, 32].

That is, where . is a normal matmul operator, this function returns the result of:

dequantize(quantize(lhs) . transpose(rhs))
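
The expression above can be illustrated with a small reference sketch in plain Python. This is not the MAX Graph API and does not implement Q4_0, Q4_K, or Q6_K: it assumes a toy symmetric quantizer with one scale per row, and the names `quantize_rows` and `toy_qmatmul` are hypothetical. It only shows the order of operations: quantize lhs, multiply against the rows of the already-transposed rhs, then rescale back to floating point.

```python
def quantize_rows(matrix, levels=127):
    """Toy symmetric quantizer (an assumption, not a real encoding):
    one float scale per row, integer values in [-levels, levels]."""
    quantized, scales = [], []
    for row in matrix:
        peak = max(abs(v) for v in row)
        scale = peak / levels if peak else 1.0  # avoid dividing by zero for all-zero rows
        quantized.append([round(v / scale) for v in row])
        scales.append(scale)
    return quantized, scales


def toy_qmatmul(lhs, rhs_q, rhs_scales):
    """Compute dequantize(quantize(lhs) . transpose(rhs)) for 2D lists.

    rhs_q is stored transposed: each of its rows is one column of the
    matrix that a regular matmul would take on the right.
    """
    lhs_q, lhs_scales = quantize_rows(lhs)
    out = []
    for i, lhs_row in enumerate(lhs_q):
        out_row = []
        for j, rhs_row in enumerate(rhs_q):
            # Integer dot product over the shared inner dimension.
            acc = sum(a * b for a, b in zip(lhs_row, rhs_row))
            # Dequantize by applying both scales.
            out_row.append(acc * lhs_scales[i] * rhs_scales[j])
        out.append(out_row)
    return out


lhs = [[1.0, -2.0, 0.5], [0.25, 0.0, 4.0]]                # shape [2, 3]
weights = [[0.5, 1.0, -1.0], [2.0, 0.0, 1.0],
           [0.0, 0.0, 0.0], [-1.5, 0.75, 0.25]]           # shape [4, 3], already transposed
rhs_q, rhs_scales = quantize_rows(weights)
result = toy_qmatmul(lhs, rhs_q, rhs_scales)              # shape [2, 4]

# Unquantized reference: lhs . transpose(weights)
exact = [[sum(a * b for a, b in zip(row, w)) for w in weights] for row in lhs]
```

Here `result` approximates `exact` up to quantization error, which is the trade-off a quantized matmul makes for a smaller rhs.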

The last two dimensions in lhs are treated as matrices and multiplied by rhs (which must be a 2D tensor). Any remaining dimensions in lhs are broadcast dimensions.
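
The shape rule described above can be sketched as a small helper. This is an illustrative assumption rather than part of the API: `qmatmul_output_shape` is a hypothetical name, shapes are taken as logical element counts (the packed byte layout of a quantized tensor is not modeled), and the rank check on lhs is inferred from the "last two dimensions" wording.

```python
def qmatmul_output_shape(lhs_shape, rhs_shape):
    """Return the output shape for lhs . transpose(rhs).

    lhs_shape: [..., M, K]  (leading dimensions are broadcast dimensions)
    rhs_shape: [N, K]       (rhs is stored transposed and must be rank 2)
    result:    [..., M, N]
    """
    if len(rhs_shape) != 2:
        raise ValueError("rhs must be rank 2")
    if len(lhs_shape) < 2:
        raise ValueError("lhs must have at least two dimensions")
    n, k_rhs = rhs_shape
    k_lhs = lhs_shape[-1]
    if k_lhs != k_rhs:
        # Because rhs is transposed, the inner dimensions are the LAST
        # dimension of both operands.
        raise ValueError(f"inner dimensions differ: {k_lhs} vs {k_rhs}")
    return list(lhs_shape[:-1]) + [n]


# The example from the description: [32, 64] with [32, 64] gives [32, 32].
example = qmatmul_output_shape([32, 64], [32, 64])

# Leading lhs dimensions pass through unchanged.
batched = qmatmul_output_shape([8, 32, 64], [16, 64])
```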

NOTE: Currently this supports Q4_0, Q4_K, and Q6_K encodings only.

Parameters:

  • encoding (QuantizationEncoding): The quantization encoding to use.

Args:

  • lhs (Symbol): The non-quantized left-hand side of the matmul.
  • rhs (Symbol): The transposed and quantized right-hand side of the matmul. Must be rank 2 (a 2D tensor/matrix) and in a supported quantization encoding.

Returns:

The dequantized result (a floating point tensor).