Mojo struct

UMMAInsDescriptor

@register_passable(trivial) struct UMMAInsDescriptor[mma_kind: UMMAKind]

Descriptor for UMMA instructions.

This struct represents a descriptor that encodes information about UMMA instructions. The descriptor contains the following bit fields:

Sparsity (2 bits): The sparsity of the input matrices. Currently defaults to dense matrices.
Saturate for integer types (1 bit): Whether to saturate the result for integer types. Currently not supported.
Matrix D type (2 bits): Data type of matrix D.
Matrix A type (3 bits): Data type of matrix A.
Matrix B type (3 bits): Data type of matrix B.
Negate A matrix (1 bit): Whether to negate matrix A. Currently defaults to False.
Negate B matrix (1 bit): Whether to negate matrix B. Currently defaults to False.
Transpose A (1 bit): Whether to transpose matrix A.
Transpose B (1 bit): Whether to transpose matrix B.
N, Dimension of Matrix B (6 bits): Number of columns in matrix B. 3 LSBs are unused.
M, Dimension of Matrix A (5 bits): Number of rows in matrix A. 4 LSBs are unused.

See: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html?highlight=tcgen05%2520mma#tcgen05-instuction-desc-kind-tf32-f16-f8f6f4

Parameters

mma_kind (UMMAKind): The kind of UMMA instruction.

Fields

desc (UInt32): The 32-bit descriptor value that encodes UMMA instruction information. This field stores the complete descriptor with all bit fields packed into a single 32-bit integer:
- Bits 0-1: Sparsity selector (2 bits)
- Bit 2: Sparsity enable (1 bit)
- Bit 3: Saturate for integer types (1 bit)
- Bits 4-5: Matrix D type (2 bits)
- Bit 6: Reserved (1 bit)
- Bits 7-9: Matrix A type (3 bits)
- Bits 10-12: Matrix B type (3 bits)
- Bit 13: Negate A matrix (1 bit)
- Bit 14: Negate B matrix (1 bit)
- Bit 15: Transpose A (1 bit)
- Bit 16: Transpose B (1 bit)
- Bits 17-22: N, Dimension of Matrix B (6 bits)
- Bit 23: Reserved (1 bit)
- Bits 24-28: M, Dimension of Matrix A (5 bits)
- Bit 29: Reserved (1 bit)
- Bits 30-31: Maximum shift while attempting B matrix reuse (2 bits)
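The bit layout above can be modeled with plain integer arithmetic. The following Python sketch is only an illustration of that layout, not part of the Mojo API: the field names and the `decode` helper are hypothetical, and the offsets and widths are copied from the list above.

```python
# Illustrative model of the bit layout listed above; not part of the Mojo API.
# Each entry is (hypothetical field name, lowest bit, width in bits).
# Reserved bits (6, 23, 29) are intentionally left out.
FIELDS = [
    ("sparsity_selector", 0, 2),
    ("sparsity_enable", 2, 1),
    ("saturate", 3, 1),
    ("d_type", 4, 2),
    ("a_type", 7, 3),
    ("b_type", 10, 3),
    ("negate_a", 13, 1),
    ("negate_b", 14, 1),
    ("transpose_a", 15, 1),
    ("transpose_b", 16, 1),
    ("n_dim", 17, 6),
    ("m_dim", 24, 5),
    ("max_shift", 30, 2),
]


def decode(desc: int) -> dict[str, int]:
    """Split a 32-bit descriptor value into its named fields."""
    if not 0 <= desc < 2**32:
        raise ValueError("descriptor must fit in 32 bits")
    # Shift each field down to bit 0, then mask off its width.
    return {name: (desc >> lo) & ((1 << width) - 1) for name, lo, width in FIELDS}


# Example: a raw value with only the N field set to 16 and the M field set to 8.
raw = (16 << 17) | (8 << 24)
fields = decode(raw)
print(fields["n_dim"], fields["m_dim"])  # prints: 16 8
```

A table-driven decoder like this keeps the offsets in one place, which makes it easy to compare against the list above when checking a raw value passed to `__init__`.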

Implemented traits

AnyType, Copyable, ImplicitlyCopyable, Movable, UnknownDestructibility

Aliases

`__copyinit__is_trivial`

alias __copyinit__is_trivial = UInt32.__copyinit__is_trivial

`__del__is_trivial`

alias __del__is_trivial = UInt32.__del__is_trivial

`__moveinit__is_trivial`

alias __moveinit__is_trivial = UInt32.__moveinit__is_trivial

Methods

`__init__`

__init__(value: UInt32) -> Self

Initialize descriptor with raw 32-bit value.

This constructor allows creating a descriptor directly from a 32-bit integer that already contains the properly formatted bit fields for the descriptor.

Args:

value (UInt32): A 32-bit integer containing the complete descriptor bit layout.

`create`

static create[d_type: DType, a_type: DType, b_type: DType, output_shape: IndexList[2, element_type=DType.uint32], /, *, transpose_a: Bool = False, transpose_b: Bool = True]() -> Self

Create a descriptor for UMMA instructions.

All inputs are compile-time parameters; the function takes no runtime arguments.

Parameters:

d_type (DType): The data type of matrix D.
a_type (DType): The data type of matrix A.
b_type (DType): The data type of matrix B.
output_shape (IndexList): The shape of the output matrix.
transpose_a (Bool): Whether to transpose matrix A. Defaults to False.
transpose_b (Bool): Whether to transpose matrix B. Defaults to True.

Returns:

Self: A descriptor whose 32-bit desc field contains the complete descriptor bit layout.
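To make the packing concrete, here is a minimal Python sketch of how a descriptor value could be assembled from these inputs. It is not the Mojo implementation: `pack_descriptor` is a hypothetical helper, the type arguments are raw field codes because the DType-to-code mapping is not documented here, and it assumes that the N field stores N >> 3 (bits 17-22), that the M field stores M >> 4 (bits 24-28), and that the transpose flags are written to bits 15 and 16 unchanged.

```python
def pack_descriptor(
    d_code: int,
    a_code: int,
    b_code: int,
    m: int,
    n: int,
    transpose_a: bool = False,
    transpose_b: bool = True,
) -> int:
    """Pack raw field codes and matrix dimensions into a 32-bit value.

    Hypothetical helper: d_code/a_code/b_code are already-encoded type
    codes, not DType values.
    """
    if not 0 <= d_code < 4:
        raise ValueError("d_code must fit in 2 bits")
    if not (0 <= a_code < 8 and 0 <= b_code < 8):
        raise ValueError("a_code and b_code must fit in 3 bits")
    # Assumption: M drops its 4 LSBs and must fit in 5 bits afterwards.
    if m <= 0 or m % 16 != 0 or (m >> 4) > 31:
        raise ValueError("m must be a positive multiple of 16, at most 496")
    # Assumption: N drops its 3 LSBs and must fit in 6 bits afterwards.
    if n <= 0 or n % 8 != 0 or (n >> 3) > 63:
        raise ValueError("n must be a positive multiple of 8, at most 504")

    desc = 0
    desc |= d_code << 4            # bits 4-5: matrix D type
    desc |= a_code << 7            # bits 7-9: matrix A type
    desc |= b_code << 10           # bits 10-12: matrix B type
    desc |= int(transpose_a) << 15 # bit 15: transpose A
    desc |= int(transpose_b) << 16 # bit 16: transpose B
    desc |= (n >> 3) << 17         # bits 17-22: N without its 3 LSBs
    desc |= (m >> 4) << 24         # bits 24-28: M without its 4 LSBs
    return desc


# Placeholder type codes (1, 0, 0) with a 128 x 256 output tile.
value = pack_descriptor(1, 0, 0, m=128, n=256)
print(f"{value:#010x}")  # prints: 0x08410010
```

Sparsity, saturation, negation, and the shift field are left at zero in this sketch, matching the defaults described for the struct.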
