
Mojo module

layout_tensor

Aliases

  • binary_op_type = fn[DType, Int](lhs: SIMD[$0, $1], rhs: SIMD[$0, $1]) -> SIMD[$0, $1]: Type alias for binary operations on SIMD vectors. This type represents a function that takes two SIMD vectors of the same type and width and returns a SIMD vector of the same type and width.

    Args:

      • type: The data type of the SIMD vector elements.
      • width: The width of the SIMD vector.
      • lhs: Left-hand side SIMD vector operand.
      • rhs: Right-hand side SIMD vector operand.

    Returns: A SIMD vector containing the result of the binary operation.
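For illustration, a function conforming to this alias could look like the following sketch (the function name is ours, not part of the module):

```mojo
# Element-wise addition: signature matches binary_op_type.
fn simd_add[dtype: DType, width: Int](
    lhs: SIMD[dtype, width], rhs: SIMD[dtype, width]
) -> SIMD[dtype, width]:
    return lhs + rhs
```

Any function with this shape can be passed wherever a `binary_op_type` parameter is expected, letting the caller select the reduction or combination operation at compile time.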

Structs

  • LayoutTensor: A high-performance tensor with explicit memory layout and hardware-optimized access patterns.
  • LayoutTensorIter: Iterator for traversing a memory buffer with a specific layout.
  • ThreadScope: Represents the scope of thread operations in GPU programming.
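As a brief orientation, a `LayoutTensor` wraps existing storage with a compile-time layout. The following is a hedged sketch based on the general `LayoutTensor` API; the exact constructor form may differ between versions:

```mojo
from layout import Layout, LayoutTensor

fn example():
    # Compile-time row-major 4x4 layout.
    alias layout = Layout.row_major(4, 4)
    # Backing storage for the tensor's elements.
    var storage = InlineArray[Float32, layout.size()](fill=0)
    var tensor = LayoutTensor[DType.float32, layout](storage)
    # Layout-aware multi-dimensional element access.
    tensor[1, 2] = 42.0
```

The layout is a compile-time parameter, so index arithmetic can be fully resolved at compile time and specialized for the target hardware.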


Functions

  • copy_dram_to_local: Copies data from DRAM to registers on AMD GPUs. It uses the buffer_load intrinsic to load data and can perform bounds checking. In addition to dst and src, it takes src_base as an argument to construct the buffer descriptor of the src tensor; src_base is the original global memory tensor from which src is derived.
  • copy_dram_to_sram: Synchronously copy data from DRAM (global memory) to SRAM (shared memory) in a GPU context.
  • copy_dram_to_sram_async: Asynchronously copy data from DRAM (global memory) to SRAM (shared memory) in a GPU context.
  • copy_local_to_dram: Synchronously copy data from local memory (registers) to DRAM (global memory).
  • copy_local_to_local: Synchronously copy data between local memory (register) tensors with type conversion.
  • copy_local_to_sram: Synchronously copy data from local memory (registers) to SRAM (shared memory).
  • copy_sram_to_dram: Synchronously copy data from SRAM (shared memory) to DRAM (global memory).
  • copy_sram_to_local: Synchronously copy data from SRAM (shared memory) to local memory.
  • cp_async_k_major: Asynchronously copy data from DRAM to SRAM using TMA (Tensor Memory Accelerator) with K-major layout.
  • cp_async_mn_major: Asynchronously copy data from DRAM to SRAM using TMA (Tensor Memory Accelerator) with MN-major layout.
  • stack_allocation_like: Create a stack-allocated tensor with the same layout as an existing tensor.
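These copy helpers are typically combined inside a GPU kernel to stage tiles through shared memory: load a tile from DRAM into SRAM, compute on it, then write the result back. A hedged excerpt of a kernel body (the tile variables, thread-layout shape, and import paths are illustrative assumptions, not taken from this page):

```mojo
from gpu.memory import async_copy_wait_all
from layout import Layout
from layout.layout_tensor import copy_dram_to_sram_async, copy_sram_to_dram

alias thread_layout = Layout.row_major(4, 8)  # 32 threads cover a 4x8 tile

# Inside the kernel body, with `smem_tile` a shared-memory LayoutTensor
# and `dram_tile` / `out_tile` global-memory LayoutTensor tiles:

# Cooperatively stage the global-memory tile into shared memory.
copy_dram_to_sram_async[thread_layout](smem_tile, dram_tile)
# Block until the asynchronous copies issued above have completed.
async_copy_wait_all()
# ... compute on smem_tile ...
# Write the result tile back to global memory.
copy_sram_to_dram[thread_layout](out_tile, smem_tile)
```

The asynchronous variant lets the copy overlap with other work; the synchronous `copy_dram_to_sram` performs the same staging but returns only once the data is in shared memory.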