
Mojo struct

Q4sym

struct Q4sym[group_size: Int, float_dtype: DType = float32]

Q4sym compresses values of type float_dtype into 4-bit unsigned integers that have been dynamically, symmetrically quantized with a per-group scale factor.

group_size determines the number of elements that share quantization parameters.

The bits are stored in a strided fashion. For example, assume group_size = 8 and we want to pack eight uint4 numbers A, B, C, D, E, F, G, H with bit patterns aaaa, bbbb, cccc, and so on. They are laid out as:

eeeeaaaa|ffffbbbb|ggggcccc|hhhhdddd

To uncompress to floating point, take the decoded uint4 value, subtract the implicit zero-point of 8 (the midpoint of the 2^4 value range), and multiply by the scale factor.
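The encoding direction can be sketched in Python (this is an illustrative model of the scheme described above, not the actual Mojo implementation; the exact rounding and scale selection of the real kernel may differ):

```python
GROUP_SIZE = 8
HALF = GROUP_SIZE // 2

def encode(values):
    """Symmetrically quantize one group of floats to uint4 and pack it."""
    # One common symmetric choice: map the largest magnitude onto the
    # representable range. Falls back to 1.0 for an all-zero group.
    scale = max(abs(v) for v in values) / 7.0 or 1.0
    # Quantize, then shift by the implicit zero-point of 8 into [0, 15].
    q = [min(15, max(0, round(v / scale) + 8)) for v in values]
    # Strided packing: byte i holds q[i] in the low nibble and
    # q[i + HALF] in the high nibble (eeeeaaaa | ffffbbbb | ...).
    return scale, bytes((q[i + HALF] << 4) | q[i] for i in range(HALF))
```

With the strided layout, the first half of the group occupies the low nibbles and the second half the high nibbles, which lets SIMD decoding split a byte vector into two element vectors with one mask and one shift.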

Parameters

  • group_size (Int): The number of encoded numbers stored in this struct.
  • float_dtype (DType): The floating point dtype this struct works with.

Fields

  • scale (StaticTuple[SIMD[uint8, 1], 2]): The FP16 scale of the group, stored as individual bytes.
  • bits (StaticTuple[SIMD[uint8, 1], (group_size // 2)]): The bits of the encoded uint4 numbers.

Implemented traits

AnyType, UnknownDestructibility

Methods

__init__

__init__(out self)

Construct a default-initialized Q4sym.

@implicit __init__(out self, data: SIMD[float_dtype, group_size])

Construct an encoded Q4sym from data.

Args:

  • data (SIMD[float_dtype, group_size]): The floating point data to encode and store.

decode_scale

decode_scale(mut self) -> SIMD[float16, 1]

Obtain the scale factor.

Returns:

The decoded scale factor.
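Since the scale field stores a float16 as two raw bytes, decoding it amounts to reinterpreting those bytes as an IEEE 754 half-precision value. A minimal Python sketch of that reinterpretation (illustrative, not the Mojo implementation):

```python
import struct

def decode_scale(scale_bytes):
    # Reinterpret the two stored bytes as an IEEE 754 binary16 value;
    # '<e' is the struct module's little-endian half-precision format.
    return struct.unpack('<e', bytes(scale_bytes))[0]
```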

decode_unsigned

decode_unsigned(mut self) -> SIMD[uint8, group_size]

Decode the stored uint4 numbers to uint8.

Returns:

The decoded stored numbers as uint8 numbers. These have an implicit zero-point of 8.

decode_signed

decode_signed(mut self) -> SIMD[int8, group_size]

Decode the stored uint4 numbers to requantized int4 numbers.

This is done by subtracting the implicit zero-point of 8 from the unsigned decoding.

Returns:

The decoded stored numbers as int8 numbers. These have a zero-point of 0.

decode_fully

decode_fully(mut self) -> SIMD[float_dtype, group_size]

Decode the stored numbers into floating point representation.

Returns:

The decoded numbers.
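The decode chain (decode_unsigned, then the zero-point shift of decode_signed, then the scale multiply of decode_fully) can be sketched in Python as follows, assuming the strided layout described above (illustrative only, not the Mojo implementation):

```python
def decode_unsigned(packed):
    """Unpack the strided nibbles: element i sits in the low nibble of
    byte i, element i + group_size//2 in the high nibble."""
    return [b & 0x0F for b in packed] + [b >> 4 for b in packed]

def decode_fully(scale, packed):
    """Subtract the implicit zero-point of 8, then apply the group scale."""
    return [(u - 8) * scale for u in decode_unsigned(packed)]
```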

quantize_and_write_to_tensor

static quantize_and_write_to_tensor[rank: Int](input_tensor: NDBuffer[float_dtype, rank, origin], output_tensor: NDBuffer[uint8, rank, origin], input_shape: Index[rank])

Encodes the floating point numbers in input_tensor along the inner-most dimension and writes the result to output_tensor.

Parameters:

  • rank (Int): The rank of the input and output tensors.

Args:

  • input_tensor (NDBuffer[float_dtype, rank, origin]): The input tensor we are encoding.
  • output_tensor (NDBuffer[uint8, rank, origin]): The output tensor containing the encoded input. The shape of the output should match the input except along the inner dimension: if the original inner dimension was d, the corresponding output dimension should be ceil(d / group_size) * sizeof(self).
  • input_shape (Index[rank]): The shape of the input tensor.
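The required output inner dimension follows from the struct layout: each group occupies 2 bytes of FP16 scale plus group_size // 2 bytes of packed nibbles. A small hypothetical helper (the name output_inner_dim is my own, not part of the API) makes the arithmetic concrete:

```python
import math

def output_inner_dim(d, group_size):
    # sizeof(Q4sym[group_size]): 2 scale bytes + group_size // 2 nibble bytes.
    struct_size = 2 + group_size // 2
    # One struct per (possibly partial) group along the inner dimension.
    return math.ceil(d / group_size) * struct_size
```

For example, with group_size = 32 each struct is 18 bytes, so an inner dimension of 100 becomes ceil(100 / 32) * 18 = 72 bytes.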

dequantize_and_write_to_tensor

static dequantize_and_write_to_tensor[rank: Int, //](input_tensor: NDBuffer[uint8, rank, origin], output_tensor: NDBuffer[float_dtype, rank, origin], output_shape: Index[rank])

Decodes the uint8-encoded values in input_tensor along the inner-most dimension and writes the decoded floating point result to output_tensor.

Parameters:

  • rank (Int): The rank of the input and output tensors.

Args:

  • input_tensor (NDBuffer[uint8, rank, origin]): The input tensor we are decoding.
  • output_tensor (NDBuffer[float_dtype, rank, origin]): The output tensor containing the decoded input.
  • output_shape (Index[rank]): The shape of the output tensor.