Mojo struct
ScatterGatherAmd
struct ScatterGatherAmd[thread_layout: Layout, num_threads: Int = thread_layout.size(), thread_scope: ThreadScope = ThreadScope(0), block_dim_count: Int = 1]
Tile-based AMD data movement delegate for scatter-gather operations.
This struct moves tiles of data between global memory (DRAM) and per-thread registers on AMD GPUs. It holds an AMD buffer resource, created from the tensor passed to __init__, for these transfers.
Parameters
- thread_layout (Layout): The layout defining how threads are organized.
- num_threads (Int): Total number of threads (defaults to thread_layout.size()).
- thread_scope (ThreadScope): The scope of thread execution (block or warp).
- block_dim_count (Int): Number of block dimensions.
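To make these parameters concrete, here is a conceptual Python model (not Mojo, and not the library's API) of how a thread might locate itself in thread_layout. It assumes a 2-D row-major thread layout, a 64-lane AMD wavefront for warp scope, and that block scope uses the flat thread index within the block; all helper names are hypothetical.

```python
from math import prod

# Assumption: 64-lane wavefronts, as on AMD CDNA GPUs.
WAVEFRONT_SIZE = 64


def default_num_threads(thread_layout_shape):
    """Mirrors the num_threads default: one thread per layout position."""
    return prod(thread_layout_shape)


def scoped_thread_id(flat_tid, scope):
    """Hypothetical: a thread's index within its execution scope."""
    if scope == "block":
        return flat_tid                   # index within the whole block
    if scope == "warp":
        return flat_tid % WAVEFRONT_SIZE  # lane index within the wavefront
    raise ValueError(f"unknown scope: {scope}")


def thread_coord(tid, thread_layout_shape):
    """Row-major (row, col) position of tid in a 2-D thread layout."""
    rows, cols = thread_layout_shape
    if not 0 <= tid < rows * cols:
        raise ValueError("tid outside the thread layout")
    return divmod(tid, cols)


# A 4 x 16 thread layout covers one 64-lane wavefront.
print(default_num_threads((4, 16)))                          # 64
print(thread_coord(scoped_thread_id(81, "warp"), (4, 16)))   # lane 17 -> (1, 1)
```

In this model, with warp scope, thread 81 of a block is lane 17 of its wavefront, so it lands at row 1, column 1 of a 4 x 16 layout.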
Fields
- buffer (AMDBufferResource): The AMD buffer resource created from the tensor passed to __init__.
Implemented traits
AnyType, UnknownDestructibility
Aliases
__del__is_trivial
alias __del__is_trivial = True
Methods
__init__
__init__(out self, tensor: LayoutTensor[dtype, layout, origin, address_space=address_space, element_layout=element_layout, layout_int_type=layout_int_type, linear_idx_type=linear_idx_type, masked=masked, alignment=alignment])
Initialize the scatter-gather delegate with a tensor.
Args:
- tensor (LayoutTensor): The layout tensor from which to create the AMD buffer resource.
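On AMD GPUs, a buffer resource describes a base address plus a range, and buffer loads and stores are checked against that range in hardware: an out-of-range read returns zero and an out-of-range write is discarded, rather than faulting. The Python class below is only a conceptual model of that behavior. It assumes element-granular offsets (the hardware counts bytes for raw buffers) and uses a flat list in place of DRAM; BufferResource and its methods are hypothetical names, not the Mojo API.

```python
class BufferResource:
    """Hypothetical model of an AMD buffer resource: a base offset into
    memory plus a record count that bounds every access."""

    def __init__(self, memory, base, num_records):
        self.memory = memory            # flat list standing in for DRAM
        self.base = base                # where the tensor starts in memory
        self.num_records = num_records  # elements here; bytes on hardware

    def in_range(self, offset):
        return 0 <= offset < self.num_records

    def load(self, offset):
        # Out-of-range loads return zero instead of faulting.
        return self.memory[self.base + offset] if self.in_range(offset) else 0

    def store(self, offset, value):
        # Out-of-range stores are silently discarded.
        if self.in_range(offset):
            self.memory[self.base + offset] = value


# A 3-element tensor starting at index 2 of a larger allocation.
memory = [10, 20, 30, 40, 50, 60]
buf = BufferResource(memory, base=2, num_records=3)
print(buf.load(0), buf.load(3))  # 30 0
buf.store(5, 99)                 # dropped: past the end of the tensor
print(memory)                    # [10, 20, 30, 40, 50, 60]
```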
copy
copy(self, dst_reg_tile: LayoutTensor[dtype, layout, origin, address_space=AddressSpace(5), element_layout=element_layout, layout_int_type=layout_int_type, linear_idx_type=linear_idx_type, masked=masked, alignment=alignment], src_gmem_tile: LayoutTensor[dtype, layout, origin, address_space=address_space, element_layout=element_layout, layout_int_type=layout_int_type, linear_idx_type=linear_idx_type, masked=masked, alignment=alignment], src_tensor: LayoutTensor[dtype, layout, origin, address_space=address_space, element_layout=element_layout, layout_int_type=layout_int_type, linear_idx_type=linear_idx_type, masked=masked, alignment=alignment], offset: OptionalReg[UInt] = None)
Copy data from DRAM to registers (local memory).
Args:
- dst_reg_tile (LayoutTensor): Destination register tile in the local address space.
- src_gmem_tile (LayoutTensor): Source global memory tile.
- src_tensor (LayoutTensor): Source tensor for the copy operation.
- offset (OptionalReg): Optional offset for the copy operation.
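A conceptual Python model of the gather direction follows. It assumes a row-major source tensor stored flat in memory, a row-major 2-D thread layout, and a strided distribution in which each thread takes every tr-th row and tc-th column of the tile. Addresses are computed relative to the start of the whole source tensor, because a buffer resource addresses memory from its base; that is one plausible reason the method takes src_tensor alongside src_gmem_tile, but this is an inference. Treating offset as an extra element offset added to every address is also an assumption, and gather_fragment is a hypothetical name.

```python
def gather_fragment(memory, num_rows, num_cols, tile_origin, tile_shape,
                    thread_layout_shape, tid, offset=0):
    """Collect the elements thread `tid` owns from one tile of a row-major
    num_rows x num_cols tensor stored flat in `memory`."""
    tr, tc = thread_layout_shape
    my_row, my_col = divmod(tid, tc)   # row-major thread coordinate
    row0, col0 = tile_origin           # tile's top-left element in the tensor
    tile_rows, tile_cols = tile_shape
    num_records = num_rows * num_cols  # bound covers the whole tensor

    fragment = []
    for r in range(my_row, tile_rows, tr):
        for c in range(my_col, tile_cols, tc):
            # Element offset from the tensor's base, plus the assumed extra offset.
            elem = (row0 + r) * num_cols + (col0 + c) + offset
            # Bounds-checked load: out-of-range reads yield 0.
            fragment.append(memory[elem] if 0 <= elem < num_records else 0)
    return fragment


# 3 x 4 tensor holding 0..11; a 2 x 4 tile starting at row 2 overhangs by one row.
memory = list(range(12))
print(gather_fragment(memory, 3, 4, (2, 0), (2, 4), (1, 2), tid=0))  # [8, 10, 0, 0]
print(gather_fragment(memory, 3, 4, (2, 0), (2, 4), (1, 2), tid=1))  # [9, 11, 0, 0]
```

Because the bound in this model covers the whole tensor rather than each row, a tile overhanging the right edge would read into the next row instead of getting zeros; only accesses past the end of the tensor are masked.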
copy(self, dst_gmem_tile: LayoutTensor[dtype, layout, origin, address_space=address_space, element_layout=element_layout, layout_int_type=layout_int_type, linear_idx_type=linear_idx_type, masked=masked, alignment=alignment], src_reg_tile: LayoutTensor[dtype, layout, origin, address_space=AddressSpace(5), element_layout=element_layout, layout_int_type=layout_int_type, linear_idx_type=linear_idx_type, masked=masked, alignment=alignment])
Copy data from registers (local memory) to DRAM.
Args:
- dst_gmem_tile (LayoutTensor): Destination global memory tile.
- src_reg_tile (LayoutTensor): Source register tile in the local address space.