Mojo struct
Float32Encoding
The float32 quantization encoding.
This encoding is essentially an identity operation. It exists to serve as the default case for code that is generic over the quantization encoding.
Implemented traits
AnyType, QuantizationEncoding
Methods
quantize
static quantize(_tensor: Tensor[float32]) -> Tensor[uint8]
Unimplemented quantize method for float32.
Because float32 is an identity encoding, it does not provide a working quantize method. float32 values should be used with non-quantized ops, which expect dtype float32 operands, whereas quantized ops expect dtype uint8 operands. Calling this method therefore raises an exception, so that accidental use surfaces as an error instead of a silent bug.
id
static id() -> String
Identifier for the float32 quantization encoding.
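The following is a minimal Python sketch of the pattern described on this page, not the Mojo API itself. It illustrates how an identity encoding can act as the default case in code that is generic over the encoding, while its quantize method raises to catch accidental use. Every name here (QuantizationEncodingSketch, Float32Sketch, ToyUint8Sketch, prepare_weights) is hypothetical, and the "float32" identifier string and the toy uint8 scheme are assumptions made only for illustration.

```python
from abc import ABC, abstractmethod


class QuantizationEncodingSketch(ABC):
    """Hypothetical stand-in for a quantization-encoding interface."""

    @staticmethod
    @abstractmethod
    def quantize(values):
        """Convert float values to packed uint8 bytes."""

    @staticmethod
    @abstractmethod
    def id():
        """Return a string identifier for the encoding."""


class Float32Sketch(QuantizationEncodingSketch):
    """Identity encoding: values stay as float32, so quantize is an error."""

    @staticmethod
    def quantize(values):
        # Raising here makes accidental use fail loudly.
        raise NotImplementedError(
            "float32 is an identity encoding; use non-quantized ops"
        )

    @staticmethod
    def id():
        return "float32"  # assumed identifier string


class ToyUint8Sketch(QuantizationEncodingSketch):
    """Toy encoding (an assumption): clamp to [0, 1] and scale to 0..255."""

    @staticmethod
    def quantize(values):
        return bytes(round(min(max(v, 0.0), 1.0) * 255) for v in values)

    @staticmethod
    def id():
        return "toy_uint8"


def prepare_weights(values, encoding=Float32Sketch):
    """Generic over the encoding; the identity encoding is the default case."""
    if encoding is Float32Sketch:
        # Identity path: pass the float values through unchanged.
        return list(values)
    # Quantized path: produce uint8 data for quantized ops.
    return encoding.quantize(values)
```

In this sketch, prepare_weights never calls quantize on the identity encoding, so the exception is only reachable from code that mistakenly treats float32 as a quantized format.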