Mojo function
create_nested_tma_tile
create_nested_tma_tile[dtype: DType, //, tile_m: Int, tile_n: Int, swizzle_mode: TensorMapSwizzle, *, is_k_major: Bool](ctx: DeviceContext, tensor: LayoutTensor[dtype, layout, origin, address_space=address_space, element_layout=element_layout, layout_int_type=layout_int_type, linear_idx_type=linear_idx_type, masked=masked, alignment=alignment], out res: TMATensorTile[dtype, tile_layout_k_major[dtype, tile_m, tile_n, swizzle_mode]() if is_k_major else tile_layout_mn_major[dtype, tile_n, tile_m, swizzle_mode](), _tma_desc_tile_layout[dtype, 2, IndexList[2, DType.int64](tile_m, tile_n, Tuple[]()), is_k_major, swizzle_mode](), is_k_major])
Creates a rank 2 TMATensorTile with a nested layout, using tile_layout_k_major if is_k_major is true, or tile_layout_mn_major otherwise.
Parameters:
- dtype (DType): The data type of the tensor elements.
- tile_m (Int): The number of rows of a global memory tile.
- tile_n (Int): The number of columns of a global memory tile.
- swizzle_mode (TensorMapSwizzle): The swizzle mode used by the TMA operation.
- is_k_major (Bool): Whether the shared memory layout is k-major or mn-major. If mn-major, it is transposed.
Args:
- ctx (DeviceContext): The CUDA device context used to create the TMA descriptor.
- tensor (LayoutTensor): The source tensor from which data will be transferred. This defines the global memory layout and must match the specified data type.
Returns:
TMATensorTile: The TMATensorTile configured with the specified tile dimensions and swizzle mode, ready for use in asynchronous data transfer operations.
Raises:
An error if the underlying TMA descriptor could not be created.
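As a minimal usage sketch (the import paths and swizzle constant below are assumptions and may differ across Mojo versions; check your local stdlib):

```mojo
# Assumed import paths (not verified against a specific Mojo release):
from gpu.host import DeviceContext
from gpu.host._nvidia_cuda import TensorMapSwizzle
from layout.tma_async import create_nested_tma_tile

# Given a DeviceContext `ctx` and a LayoutTensor `tensor` over global
# device memory, create a descriptor for 64x64 k-major tiles with a
# 128-byte swizzle pattern. `dtype` is inferred from `tensor`.
var tile = create_nested_tma_tile[
    tile_m=64,
    tile_n=64,
    swizzle_mode = TensorMapSwizzle.SWIZZLE_128B,
    is_k_major=True,
](ctx, tensor)
```

The resulting TMATensorTile can then be used for asynchronous global-to-shared-memory transfers; with is_k_major=False the shared-memory tile would instead use the transposed mn-major layout.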