Mojo function
qmatmul
qmatmul[encoding: QuantizationEncoding](lhs: Symbol, rhs: Symbol) -> Symbol
Performs matrix multiplication between floating point and quantized tensors.
This quantizes the lhs floating point value to match the encoding of the rhs quantized value, performs matmul, and then dequantizes the result.
The operation expects a transposed rhs argument, which differs from conventional matrix multiplication.
For matrix shapes:
- Standard matmul() expects shapes ($m x $n) @ ($n x $p) → ($m x $p)
- qmatmul() expects shapes ($m x $n) @ ($p x $n) → ($m x $p)
For example, given:
- lhs shape: [32, 64]
- rhs shape: [32, 64] (transposed)
- output shape: [32, 32]
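To make the shape convention concrete, here is a minimal pure-Python sketch of a matmul that takes an already-transposed rhs. It ignores quantization entirely, and the helper name matmul_rhs_transposed is hypothetical (it is not part of the Mojo API); it only mirrors the ($m x $n) @ ($p x $n) → ($m x $p) rule.

```python
def matmul_rhs_transposed(lhs, rhs):
    """Compute lhs @ transpose(rhs) for 2D lists of numbers.

    lhs has shape (m, n), rhs has shape (p, n), and the result has shape (m, p).
    Hypothetical helper for illustration only; no quantization is involved.
    """
    n = len(lhs[0])
    if any(len(row) != n for row in lhs + rhs):
        raise ValueError("lhs and rhs must share the same last dimension")
    # Each output element is the dot product of one lhs row with one rhs row,
    # which is why rhs is stored as (p, n) rather than (n, p).
    return [
        [sum(a * b for a, b in zip(l_row, r_row)) for r_row in rhs]
        for l_row in lhs
    ]


lhs = [[1.0] * 64 for _ in range(32)]  # shape [32, 64]
rhs = [[0.5] * 64 for _ in range(32)]  # shape [32, 64], already transposed
out = matmul_rhs_transposed(lhs, rhs)
out_shape = [len(out), len(out[0])]
```

With the shapes from the example above, out_shape matches the documented [32, 32] output.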
The operation can be expressed as:
dequantize(quantize(lhs) . transpose(rhs))
Where . is a normal matmul operator.
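The quantize, matmul, dequantize pipeline can be sketched in pure Python as follows. This is an illustration under loud assumptions: it uses a toy symmetric scheme with one scale per row and integers in [-7, 7], which is not the Q4_0, Q4_K, or Q6_K block layout, and the names quantize_rows and toy_qmatmul are hypothetical rather than part of the Mojo API.

```python
def quantize_rows(matrix, levels=7):
    """Toy symmetric quantization: one float scale per row, ints in [-levels, levels].

    Assumption for illustration only; the real encodings use block-wise layouts.
    """
    quantized, scales = [], []
    for row in matrix:
        max_abs = max(abs(x) for x in row)
        scale = max_abs / levels if max_abs > 0 else 1.0
        quantized.append([round(x / scale) for x in row])
        scales.append(scale)
    return quantized, scales


def toy_qmatmul(lhs, rhs_q, rhs_scales, levels=7):
    """Evaluate dequantize(quantize(lhs) . transpose(rhs)) under the toy scheme."""
    lhs_q, lhs_scales = quantize_rows(lhs, levels)
    out = []
    for l_row, l_scale in zip(lhs_q, lhs_scales):
        out_row = []
        for r_row, r_scale in zip(rhs_q, rhs_scales):
            # Integer dot product of an lhs row with an rhs row (rhs is transposed).
            acc = sum(a * b for a, b in zip(l_row, r_row))
            # Dequantize by applying both row scales to the integer accumulator.
            out_row.append(acc * l_scale * r_scale)
        out.append(out_row)
    return out


# rhs is quantized ahead of time, standing in for pre-quantized weights.
rhs_q, rhs_scales = quantize_rows([[1.0, -1.0], [0.5, 0.0]])
result = toy_qmatmul([[2.0, -2.0]], rhs_q, rhs_scales)
```

The result only approximates the floating point product in general, since both operands are rounded to a small integer grid before the multiplication.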
The last two dimensions in lhs are treated as matrices and multiplied by rhs (which must be a 2D tensor). Any remaining dimensions in lhs are broadcast dimensions.
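The resulting shape rule, including the leading broadcast dimensions, can be sketched as a small shape-inference helper. The function name qmatmul_output_shape is hypothetical, and the requirement that lhs has at least two dimensions is an assumption inferred from the description above.

```python
def qmatmul_output_shape(lhs_shape, rhs_shape):
    """Infer the output shape for lhs @ transpose(rhs) with a 2D rhs.

    Hypothetical helper: leading lhs dimensions pass through unchanged.
    """
    if len(rhs_shape) != 2:
        raise ValueError("rhs must be a 2D tensor")
    if len(lhs_shape) < 2:
        # Assumption: lhs needs at least a matrix in its last two dimensions.
        raise ValueError("lhs must have at least 2 dimensions")
    *batch, m, n = lhs_shape
    p, rhs_n = rhs_shape
    if n != rhs_n:
        raise ValueError("last dimension of lhs must match last dimension of rhs")
    # Batch dimensions are kept; the matrix part becomes (m, p).
    return [*batch, m, p]


plain = qmatmul_output_shape([32, 64], [32, 64])
batched = qmatmul_output_shape([4, 8, 32, 64], [16, 64])
```

Here plain covers the 2D example given earlier, while batched shows two leading dimensions of lhs carried through to the output.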
NOTE: Currently this supports Q4_0, Q4_K, and Q6_K encodings only.
Parameters: