
Mojo module

layout_tensor

Aliases

  • binary_op_type = fn[DType, Int](lhs: SIMD[$0, $1], rhs: SIMD[$0, $1]) -> SIMD[$0, $1]: Type alias for binary operations on SIMD vectors. This type represents a function that takes two SIMD vectors of the same type and width and returns a SIMD vector of the same type and width.

    Args:

      • type: The data type of the SIMD vector elements.
      • width: The width of the SIMD vector.
      • lhs: Left-hand side SIMD vector operand.
      • rhs: Right-hand side SIMD vector operand.

    Returns: A SIMD vector containing the result of the binary operation.
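For illustration, a function conforming to this alias could look like the following sketch (the function name is ours, not part of the module):

```mojo
# Element-wise addition: signature matches binary_op_type.
fn simd_add[dtype: DType, width: Int](
    lhs: SIMD[dtype, width], rhs: SIMD[dtype, width]
) -> SIMD[dtype, width]:
    return lhs + rhs
```

Any function with this shape can be passed wherever a `binary_op_type` parameter is expected, letting the caller select the reduction or combination operation at compile time.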

Structs

  • LayoutTensor: A high-performance tensor with explicit memory layout and hardware-optimized access patterns.
  • LayoutTensorIter: Iterator for traversing a memory buffer with a specific layout.
  • ThreadScope: Represents the scope of thread operations in GPU programming.
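As a brief orientation, a `LayoutTensor` wraps existing storage with a compile-time layout. The following is a hedged sketch based on the general `LayoutTensor` API; the exact constructor form may differ between versions:

```mojo
from layout import Layout, LayoutTensor

fn example():
    # Compile-time row-major 4x4 layout.
    alias layout = Layout.row_major(4, 4)
    # Backing storage for the tensor's elements.
    var storage = InlineArray[Float32, layout.size()](fill=0)
    var tensor = LayoutTensor[DType.float32, layout](storage)
    # Layout-aware multi-dimensional element access.
    tensor[1, 2] = 42.0
```

The layout is a compile-time parameter, so index arithmetic can be fully resolved at compile time and specialized for the target hardware.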


Functions

  • copy_dram_to_local: Copies data from DRAM to registers on AMD GPUs. It uses the buffer_load intrinsic to load data and can perform bounds checking. In addition to dst and src, it takes src_base as an argument to construct the buffer descriptor of the src tensor; src_base is the original global memory tensor from which src is derived.
  • copy_dram_to_sram: Synchronously copy data from DRAM (global memory) to SRAM (shared memory) in a GPU context.
  • copy_dram_to_sram_async: Asynchronously copy data from DRAM (global memory) to SRAM (shared memory) in a GPU context.
  • copy_local_to_dram: Synchronously copy data from local memory (registers) to DRAM (global memory).
  • copy_local_to_local: Synchronously copy data between local memory (register) tensors with type conversion.
  • copy_local_to_sram: Synchronously copy data from local memory (registers) to SRAM (shared memory).
  • copy_sram_to_dram: Synchronously copy data from SRAM (shared memory) to DRAM (global memory).
  • copy_sram_to_local: Synchronously copy data from SRAM (shared memory) to local memory.
  • cp_async_k_major: Asynchronously copy data from DRAM to SRAM using TMA (Tensor Memory Accelerator) with K-major layout.
  • cp_async_mn_major: Asynchronously copy data from DRAM to SRAM using TMA (Tensor Memory Accelerator) with MN-major layout.
  • stack_allocation_like: Create a stack-allocated tensor with the same layout as an existing tensor.
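These copy helpers are typically combined inside a GPU kernel to stage tiles through shared memory: load a tile from DRAM into SRAM, compute on it, then write the result back. A hedged excerpt of a kernel body (the tile variables, thread-layout shape, and import paths are illustrative assumptions, not taken from this page):

```mojo
from gpu.memory import async_copy_wait_all
from layout import Layout
from layout.layout_tensor import copy_dram_to_sram_async, copy_sram_to_dram

alias thread_layout = Layout.row_major(4, 8)  # 32 threads cover a 4x8 tile

# Inside the kernel body, with `smem_tile` a shared-memory LayoutTensor
# and `dram_tile` / `out_tile` global-memory LayoutTensor tiles:

# Cooperatively stage the global-memory tile into shared memory.
copy_dram_to_sram_async[thread_layout](smem_tile, dram_tile)
# Block until the asynchronous copies issued above have completed.
async_copy_wait_all()
# ... compute on smem_tile ...
# Write the result tile back to global memory.
copy_sram_to_dram[thread_layout](out_tile, smem_tile)
```

The asynchronous variant lets the copy overlap with other work; the synchronous `copy_dram_to_sram` performs the same staging but returns only once the data is in shared memory.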