Mojo module
layout_tensor
Aliases

- binary_op_type = fn[DType, Int](lhs: SIMD[$0, $1], rhs: SIMD[$0, $1]) -> SIMD[$0, $1]:
  Type alias for binary operations on SIMD vectors. This type represents a function that takes two SIMD vectors of the same type and width and returns a SIMD vector of the same type and width.
  Args:
    type: The data type of the SIMD vector elements.
    width: The width of the SIMD vector.
    lhs: Left-hand side SIMD vector operand.
    rhs: Right-hand side SIMD vector operand.
  Returns: A SIMD vector containing the result of the binary operation.
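For illustration, here is a minimal sketch of a function that fits this alias. Only the signature shape comes from the alias above; the name `add_op` and its body are hypothetical.

```mojo
# Hypothetical example of a function matching the binary_op_type shape:
# parameterized on element type and SIMD width, taking two SIMD vectors
# and returning one of the same type and width.
fn add_op[dtype: DType, width: Int](
    lhs: SIMD[dtype, width], rhs: SIMD[dtype, width]
) -> SIMD[dtype, width]:
    # Any element-wise binary operation (add, mul, max, ...) fits this shape.
    return lhs + rhs
```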
Structs

- LayoutTensor: A high-performance tensor with explicit memory layout and hardware-optimized access patterns (see the sketch after this list).
- LayoutTensorIter: Iterator for traversing a memory buffer with a specific layout.
- ThreadScope: Represents the scope of thread operations in GPU programming.
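As a rough illustration of the LayoutTensor idea, the following sketch constructs a small row-major tensor over caller-provided storage. It is a minimal sketch only: it assumes LayoutTensor can be parameterized by an element DType and a Layout and can wrap an InlineArray; exact constructor parameters vary between releases, so consult the LayoutTensor struct page for the authoritative API.

```mojo
from layout import Layout, LayoutTensor

def main():
    # A 4x4 row-major layout (assumed Layout.row_major factory).
    alias layout = Layout.row_major(4, 4)
    # Backing storage for the tensor elements (assumed wrapping pattern).
    var storage = InlineArray[Float32, 4 * 4](fill=0)
    var tensor = LayoutTensor[DType.float32, layout](storage)
    # Indexing follows the (row, column) order implied by the row-major layout.
    tensor[1, 2] = 42.0
    print(tensor[1, 2])
```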
Functions

- copy_dram_to_local: Copy data from DRAM to registers on AMD GPUs. It uses the buffer_load intrinsic to load data and can perform bounds checking. In addition to dst and src, it takes src_base as an argument to construct the buffer descriptor of the src tensor; src_base is the original global-memory tensor from which src is derived.
- copy_dram_to_sram: Synchronously copy data from DRAM (global memory) to SRAM (shared memory) in a GPU context.
- copy_dram_to_sram_async: Asynchronously copy data from DRAM (global memory) to SRAM (shared memory) in a GPU context.
- copy_local_to_dram: Copy data from local memory (registers) to DRAM (global memory).
- copy_local_to_local: Synchronously copy data between local memory (register) tensors with type conversion.
- copy_local_to_sram: Synchronously copy data from local memory (registers) to SRAM (shared memory).
- copy_sram_to_dram: Synchronously copy data from SRAM (shared memory) to DRAM (global memory).
- copy_sram_to_local: Synchronously copy data from SRAM (shared memory) to local memory.
- cp_async_k_major: Asynchronously copy data from DRAM to SRAM using TMA (Tensor Memory Accelerator) with K-major layout.
- cp_async_mn_major: Asynchronously copy data from DRAM to SRAM using TMA (Tensor Memory Accelerator) with MN-major layout.
- stack_allocation_like: Create a stack-allocated tensor with the same layout as an existing tensor.