
Mojo struct

WGMMADescriptor

@register_passable(trivial) struct WGMMADescriptor[dtype: DType]

Descriptor for shared memory operands used in warp group matrix multiply operations.

This struct represents a descriptor that encodes information about shared memory layout and access patterns for warp group matrix multiply operations. The descriptor contains the following bit fields:

| Bit field | Size (bits) | Description |
|-----------|-------------|-------------|
| 0-13      | 14          | Base address in shared memory |
| 16-29     | 14          | LBO: leading dim byte offset |
| 32-45     | 14          | SBO: stride dim byte offset |
| 49-51     | 3           | Matrix base offset, 0 for canonical layouts |
| 62-63     | 2           | Swizzle mode: 0 = no swizzle, 1 = 128B swizzle, 2 = 64B swizzle, 3 = 32B swizzle |

Note:

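  • Bits not listed above (14-15, 30-31, 46-48, and 52-61) are unused.
  • Base address, LBO, and SBO ignore their 4 least significant bits, so each of these values is effectively a multiple of 16 bytes.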

See: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#asynchronous-warpgroup-level-matrix-shared-memory-layout-matrix-descriptor
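
The layout above can be modeled with plain integer arithmetic. The following is a minimal sketch in Python, not the Mojo API: `encode_14bit` and `pack_descriptor` are hypothetical helper names, and the field encoding `(x & 0x3FFFF) >> 4` is an assumption taken from the PTX reference linked above, consistent with the note that the 4 least significant bits are ignored.

```python
# Plain-Python model of the descriptor bit layout (illustrative only).

def encode_14bit(byte_value: int) -> int:
    """Drop the 4 least significant bits and keep the next 14 bits."""
    return (byte_value & 0x3FFFF) >> 4


def pack_descriptor(base_addr: int, lbo: int, sbo: int,
                    base_offset: int = 0, swizzle: int = 0) -> int:
    """Pack the fields into an unsigned 64-bit pattern."""
    desc = encode_14bit(base_addr)        # bits 0-13: base address
    desc |= encode_14bit(lbo) << 16       # bits 16-29: LBO
    desc |= encode_14bit(sbo) << 32       # bits 32-45: SBO
    desc |= (base_offset & 0x7) << 49     # bits 49-51: matrix base offset
    desc |= (swizzle & 0x3) << 62         # bits 62-63: swizzle mode
    return desc


desc = pack_descriptor(base_addr=0x1000, lbo=16, sbo=1024, swizzle=1)
print(hex(desc))  # 0x4000004000010100
```

In this example the 128B swizzle code (1) lands in bit 62, and the three byte quantities are each stored divided by 16.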

Parameters

  • dtype (DType): The data type of the shared memory operand. This affects memory alignment and access patterns for the descriptor.

Fields

  • desc (Int64): The 64-bit descriptor value that encodes shared memory layout information. This field stores the complete descriptor with all bit fields packed into a single 64-bit integer:

    • Bits 0-13: Base address in shared memory (14 bits)
    • Bits 16-29: Leading dimension byte offset, LBO (14 bits)
    • Bits 32-45: Stride dimension byte offset, SBO (14 bits)
    • Bits 49-51: Base offset (3 bits)
    • Bits 62-63: Swizzle mode for memory access pattern (2 bits)

    The descriptor is used by NVIDIA Hopper architecture's warp group matrix multiply instructions to efficiently access shared memory with the appropriate layout and access patterns.
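
For debugging, it can help to split a raw `desc` value back into its fields. The sketch below is a plain-Python model, not part of the Mojo API; `unpack_descriptor` is a hypothetical helper, and shifting the 14-bit fields left by 4 assumes they are stored without their 4 least significant bits.

```python
# Plain-Python model: decode a 64-bit descriptor value into its fields.

def unpack_descriptor(desc: int) -> dict:
    # Treat a signed Int64 value as its unsigned 64-bit pattern.
    desc &= (1 << 64) - 1
    return {
        "base_addr_bytes": (desc & 0x3FFF) << 4,       # bits 0-13
        "lbo_bytes": ((desc >> 16) & 0x3FFF) << 4,     # bits 16-29
        "sbo_bytes": ((desc >> 32) & 0x3FFF) << 4,     # bits 32-45
        "base_offset": (desc >> 49) & 0x7,             # bits 49-51
        "swizzle": (desc >> 62) & 0x3,                 # bits 62-63
    }


fields = unpack_descriptor(0x4000004000010100)
print(fields)
```

For this input the decoded values are a base address of 4096 bytes, an LBO of 16 bytes, an SBO of 1024 bytes, base offset 0, and swizzle code 1.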

Implemented traits

AnyType, Copyable, ImplicitlyCopyable, MMAOperandDescriptor, Movable, UnknownDestructibility

Aliases

__copyinit__is_trivial

alias __copyinit__is_trivial = True

__del__is_trivial

alias __del__is_trivial = True

__moveinit__is_trivial

alias __moveinit__is_trivial = True

Methods

__init__

__init__(val: Int64) -> Self

Initialize descriptor with raw 64-bit value.

This constructor allows creating a descriptor directly from a 64-bit integer that already contains the properly formatted bit fields for the descriptor.

Because this constructor is marked implicit, an Int64 value converts to WGMMADescriptor automatically.

Args:

  • val (Int64): A 64-bit integer containing the complete descriptor bit layout.
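
One detail worth keeping in mind when building a raw value by hand: `desc` is a signed 64-bit integer, and the swizzle field occupies bits 62-63, so swizzle codes 2 and 3 set the sign bit and the value reads as negative. The sketch below models this in plain Python; `as_int64` and `as_bits` are hypothetical helpers, not Mojo functions.

```python
# Plain-Python model of viewing a 64-bit pattern as a signed Int64.

MASK64 = (1 << 64) - 1


def as_int64(bits: int) -> int:
    """Reinterpret an unsigned 64-bit pattern as a signed value."""
    bits &= MASK64
    return bits - (1 << 64) if bits >= (1 << 63) else bits


def as_bits(value: int) -> int:
    """Reinterpret a signed value as its unsigned 64-bit pattern."""
    return value & MASK64


raw = 2 << 62            # swizzle code 2 (64B swizzle) sets bit 63
val = as_int64(raw)
print(val)               # -9223372036854775808
print(as_bits(val) == raw)  # True
```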

__add__

__add__(self, offset: Int) -> Self

Add offset to descriptor's base address.

Args:

  • offset (Int): Byte offset to add to base address.

Returns:

Self: New descriptor with updated base address.

__iadd__

__iadd__(mut self, offset: Int)

Add offset to descriptor's base address in-place.

Args:

  • offset (Int): Byte offset to add to base address.
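
The effect of `__add__` and `__iadd__` on the packed value can be sketched in plain Python. This is a model under stated assumptions, not the Mojo implementation: `add_offset` is a hypothetical helper, the byte offset is assumed to be encoded like the base address (4 least significant bits dropped), and raising on overflow of the 14-bit field is a choice made for this sketch, since the documentation above does not describe overflow behavior.

```python
# Plain-Python model of adding a byte offset to the base address field.

def add_offset(desc: int, offset_bytes: int) -> int:
    # Assumption: the offset is encoded like the base address field.
    encoded = (offset_bytes & 0x3FFFF) >> 4
    new_base = (desc & 0x3FFF) + encoded
    if new_base > 0x3FFF:
        # Sketch-only guard: keep the sum inside bits 0-13.
        raise ValueError("base address field overflow")
    # Replace bits 0-13 and leave every other field untouched.
    return (desc & ~0x3FFF) | new_base


desc = 0x4000004000010100
moved = add_offset(desc, 256)     # 256 bytes -> +16 in the encoded field
print(hex(moved))                 # 0x4000004000010110
print(add_offset(desc, 8) == desc)  # True: offsets below 16 bytes vanish
```

Under this model, offsets are only meaningful in multiples of 16 bytes, which matches the note that the base address ignores its 4 least significant bits.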

create

static create[stride_byte_offset: Int, leading_byte_offset: Int, swizzle_mode: TensorMapSwizzle = 0](smem_ptr: UnsafePointer[Scalar[dtype], address_space=AddressSpace(3), mut=mut, origin=origin]) -> Self

Create a descriptor for shared memory operand.

Parameters:

  • stride_byte_offset (Int): Stride dimension byte offset (SBO).
  • leading_byte_offset (Int): Leading dimension byte offset (LBO).
  • swizzle_mode (TensorMapSwizzle): Swizzle mode for the shared memory access pattern.

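Args:

  • smem_ptr (UnsafePointer[Scalar[dtype], address_space=AddressSpace(3), mut=mut, origin=origin]): Pointer to the operand in shared memory.
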
Returns:

Self: Initialized descriptor for the shared memory operand.
