Mojo struct
Q4sym
struct Q4sym[group_size: Int, float_dtype: DType = float32]
Q4sym compresses values of type float_dtype to 4-bit unsigned integers that have been dynamically and symmetrically quantized with a per-group scale factor.
group_size determines the number of elements that share quantization parameters.
We store values in a strided fashion. For example, assume group_size = 8 and we want to pack the uint4 numbers A, B, C, D, E, F, G, H, whose bits are aaaa, bbbb, cccc, and so on. The four payload bytes are laid out as:
eeeeaaaa|ffffbbbb|ggggcccc|hhhhdddd
That is, byte i holds element i in its low nibble and element i + group_size/2 in its high nibble. To uncompress to floating point, take the decoded uint4 value, subtract the implicit zero-point of 8 (half of 2^4), and multiply by the scale factor.
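As a minimal sketch of this layout and the dequantization math (the helper names here are illustrative, not part of the API), for group_size = 8:

```mojo
# Sketch only: byte i packs element i in its low nibble and
# element i + group_size//2 in its high nibble (group_size = 8).
fn unpack_group8(packed: SIMD[DType.uint8, 4]) -> SIMD[DType.uint8, 8]:
    var out = SIMD[DType.uint8, 8]()
    for i in range(4):
        out[i] = packed[i] & 0x0F    # low nibbles: elements 0..3
        out[i + 4] = packed[i] >> 4  # high nibbles: elements 4..7
    return out

fn dequantize_one(q: UInt8, scale: Float16) -> Float32:
    # Subtract the implicit zero-point of 8, then apply the scale.
    return (q.cast[DType.float32]() - 8.0) * scale.cast[DType.float32]()
```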
Parameters
- group_size (Int): The number of encoded numbers stored in this struct.
- float_dtype (DType): The floating point dtype this struct works with.
Fields
- scale (StaticTuple[SIMD[uint8, 1], 2]): The FP16 scale of the group, stored as individual bytes.
- bits (StaticTuple[SIMD[uint8, 1], group_size // 2]): The bits of the encoded uint4 numbers.
Implemented traits
AnyType, UnknownDestructibility
Methods
__init__
__init__(out self)
Construct a default-initialized Q4sym.
@implicit
__init__(out self, data: SIMD[float_dtype, group_size])
Construct an encoded Q4sym from data.
Args:
- data (SIMD[float_dtype, group_size]): The floating point data to encode and store.
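A hypothetical usage sketch (the import path is an assumption and varies by MAX release):

```mojo
# Assumed import path; adjust to your release.
from quantization import Q4sym

fn encode_example():
    # One group of 8 float32 values (group_size = 8).
    var data = SIMD[DType.float32, 8](0.5, -1.0, 2.0, 0.0, 1.5, -0.25, 3.0, -2.0)
    var q = Q4sym[8](data)  # the @implicit constructor encodes the group
```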
decode_scale
decode_scale(mut self) -> SIMD[float16, 1]
Obtain the scale factor.
Returns:
The decoded scale factor.
decode_unsigned
decode_unsigned(mut self) -> SIMD[uint8, group_size]
Decode the stored uint4 numbers to uint8.
Returns:
The decoded stored numbers as uint8 numbers. These have an implicit zero-point of 8.
decode_signed
decode_signed(mut self) -> SIMD[int8, group_size]
Decode the stored uint4 numbers to zero-centered int4 values (returned as int8).
This is done by subtracting the implicit zero-point of 8 from the unsigned decoding.
Returns:
The decoded stored numbers as int8 numbers. These have a zero-point of 0.
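The relationship between the two decodings, as a sketch (assumes Q4sym is in scope; see the construction example above):

```mojo
fn compare_decodings(mut q: Q4sym[8]):
    var u = q.decode_unsigned()  # lanes in [0, 15], implicit zero-point 8
    var s = q.decode_signed()    # lanes in [-8, 7], zero-point 0
    # For every lane i: Int(s[i]) == Int(u[i]) - 8
```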
decode_fully
decode_fully(mut self) -> SIMD[float_dtype, group_size]
Decode the stored numbers into floating point representation.
Returns:
The decoded numbers.
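A roundtrip sketch (assumes Q4sym is in scope). Reconstruction is lossy, since each group keeps only one FP16 scale and 4 bits per element:

```mojo
fn roundtrip(data: SIMD[DType.float32, 8]) -> SIMD[DType.float32, 8]:
    var q = Q4sym[8](data)
    return q.decode_fully()  # approximately equal to data, up to quantization error
```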
quantize_and_write_to_tensor
static quantize_and_write_to_tensor[rank: Int](input_tensor: NDBuffer[float_dtype, rank, origin], output_tensor: NDBuffer[uint8, rank, origin], input_shape: Index[rank])
Encodes the floating point numbers in input_tensor
along the inner-most dimension and writes the result to output_tensor.
Parameters:
- rank (Int): The rank of the input and output tensors.
Args:
- input_tensor (NDBuffer[float_dtype, rank, origin]): The input tensor we are encoding.
- output_tensor (NDBuffer[uint8, rank, origin]): The output tensor containing the encoded input. The shape of the output should match the input except along the inner dimension: if the original inner dimension was d, the corresponding output dimension should be ceil(d / group_size) * sizeof(self), as sketched after this list.
- input_shape (Index[rank]): The shape of the input tensor.
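For example, the required output inner dimension can be computed as below (a sketch assuming sizeof from the sys package and Q4sym in scope; for group_size = 32, each group occupies 2 scale bytes plus 16 payload bytes, i.e. 18 bytes):

```mojo
from sys import sizeof

alias group_size = 32
alias d = 4096  # original inner dimension

# ceil(d / group_size) groups per row, each sizeof(Q4sym) bytes wide.
alias bytes_per_group = sizeof[Q4sym[group_size]]()  # 2 + group_size // 2 = 18
alias out_inner = ((d + group_size - 1) // group_size) * bytes_per_group  # 128 * 18 = 2304
```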
dequantize_and_write_to_tensor
static dequantize_and_write_to_tensor[rank: Int, //](input_tensor: NDBuffer[uint8, rank, origin], output_tensor: NDBuffer[float_dtype, rank, origin], output_shape: Index[rank])
Decodes the uint4 numbers in input_tensor along the inner-most dimension and writes the floating point results to output_tensor.
Parameters:
- rank (Int): The rank of the input and output tensors.
Args:
- input_tensor (NDBuffer[uint8, rank, origin]): The input tensor we are decoding.
- output_tensor (NDBuffer[float_dtype, rank, origin]): The output tensor containing the decoded input.
- output_shape (Index[rank]): The shape of the output tensor.
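An end-to-end sketch over 1-D buffers (NDBuffer construction details and import paths vary by release and are assumptions here):

```mojo
from buffer import NDBuffer
from utils import Index

alias group_size = 32
alias d = 64                                 # two groups per row
alias packed_bytes = (d // group_size) * 18  # 18 == sizeof(Q4sym[32])

fn tensor_roundtrip():
    var src = InlineArray[Float32, d](fill=0.25)
    var enc = InlineArray[UInt8, packed_bytes](fill=0)
    var dst = InlineArray[Float32, d](fill=0)

    var src_buf = NDBuffer[DType.float32, 1](src.unsafe_ptr(), Index(d))
    var enc_buf = NDBuffer[DType.uint8, 1](enc.unsafe_ptr(), Index(packed_bytes))
    var dst_buf = NDBuffer[DType.float32, 1](dst.unsafe_ptr(), Index(d))

    Q4sym[group_size].quantize_and_write_to_tensor(src_buf, enc_buf, Index(d))
    Q4sym[group_size].dequantize_and_write_to_tensor(enc_buf, dst_buf, Index(d))
```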