Mojo function
create_nested_tma_tile
create_nested_tma_tile[dtype: DType, //, tile_m: Int, tile_n: Int, swizzle_mode: TensorMapSwizzle, *, is_k_major: Bool](ctx: DeviceContext, tensor: LayoutTensor[dtype, layout, origin, address_space=address_space, element_layout=element_layout, layout_int_type=layout_int_type, linear_idx_type=linear_idx_type, masked=masked, alignment=alignment], out res: TMATensorTile[dtype, tile_layout_k_major[dtype, tile_m, tile_n, swizzle_mode]() if is_k_major else tile_layout_mn_major[dtype, tile_n, tile_m, swizzle_mode](), _tma_desc_tile_layout[dtype, 2, IndexList[2, DType.int64](tile_m, tile_n, Tuple[]()), is_k_major, swizzle_mode](), is_k_major])
Creates a rank 2 TMATensorTile with a nested layout, using tile_layout_k_major if is_k_major is true, or tile_layout_mn_major otherwise.
Parameters:
- dtype (DType): The data type of the tensor elements.
- tile_m (Int): The number of rows of a global memory tile.
- tile_n (Int): The number of columns of a global memory tile.
- swizzle_mode (TensorMapSwizzle): The swizzle mode used by the TMA operation.
- is_k_major (Bool): Whether the shared memory layout is k-major or mn-major. If mn-major, it is transposed.
Args:
- ctx (DeviceContext): The CUDA device context used to create the TMA descriptor.
- tensor (LayoutTensor): The source tensor from which data will be transferred. This defines the global memory layout and must match the specified data type.
Returns:
TMATensorTile: The TMATensorTile configured with the specified tile dimensions and swizzle mode, ready for use in asynchronous data transfer operations.
Raises:
An error if the underlying TMA descriptor could not be created.
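As a minimal usage sketch (the import paths and swizzle constant below are assumptions and may differ across Mojo versions; check your local stdlib):

```mojo
# Assumed import paths (not verified against a specific Mojo release):
from gpu.host import DeviceContext
from gpu.host._nvidia_cuda import TensorMapSwizzle
from layout.tma_async import create_nested_tma_tile

# Given a DeviceContext `ctx` and a LayoutTensor `tensor` over global
# device memory, create a descriptor for 64x64 k-major tiles with a
# 128-byte swizzle pattern. `dtype` is inferred from `tensor`.
var tile = create_nested_tma_tile[
    tile_m=64,
    tile_n=64,
    swizzle_mode = TensorMapSwizzle.SWIZZLE_128B,
    is_k_major=True,
](ctx, tensor)
```

The resulting TMATensorTile can then be used for asynchronous global-to-shared-memory transfers; with is_k_major=False the shared-memory tile would instead use the transposed mn-major layout.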