Mojo function
copy_sram_to_local
```mojo
copy_sram_to_local[
    src_warp_layout: Layout,
    axis: OptionalReg[Int] = None,
](
    dst: LayoutTensor[dtype, layout, origin, address_space=address_space, element_layout=element_layout, layout_bitwidth=layout_bitwidth, masked=masked, alignment=alignment],
    src: LayoutTensor[dtype, layout, origin, address_space=address_space, element_layout=element_layout, layout_bitwidth=layout_bitwidth, masked=masked, alignment=alignment],
)
```
Synchronously copy data from SRAM (shared memory) to local memory.
This function performs a synchronous memory transfer from SRAM (shared memory) to local memory (registers) using the specified thread layout for workload distribution.
Example:
```mojo
from gpu.memory import AddressSpace
from layout import Layout, LayoutTensor
from layout.layout_tensor import copy_sram_to_local

# Inside a GPU kernel: a 32x32 tile in shared memory and a 4x4
# per-thread fragment in registers.
var shared_data = LayoutTensor[DType.float32, Layout.row_major(32, 32),
    MutableAnyOrigin, address_space = AddressSpace.SHARED].stack_allocation()
var local_data = LayoutTensor[DType.float32, Layout.row_major(4, 4),
    MutableAnyOrigin, address_space = AddressSpace.LOCAL].stack_allocation()

# Copy data using a thread layout with 8 threads.
copy_sram_to_local[Layout(8)](local_data, shared_data)
```
Performance:
- Distributes the copy workload across multiple threads for parallel execution.
- Optimized for transferring data from shared memory to registers.
- Supports optional axis-specific distribution for specialized access patterns.
Constraints:
- The source tensor must be in the SHARED address space (SRAM).
- The destination tensor must be in the LOCAL address space (registers).
- Both tensors must have the same data type.
Parameters:
- `src_warp_layout` (`Layout`): Layout defining how threads are organized for the source tensor. This determines how the workload is distributed among threads.
- `axis` (`OptionalReg[Int]`): Optional parameter specifying which axis to distribute along. When provided, distribution happens along the specified axis. When `None` (the default), distribution uses the standard layout pattern. See the sketch after this list.
Args:
- `dst` (`LayoutTensor[dtype, layout, origin, address_space=address_space, element_layout=element_layout, layout_bitwidth=layout_bitwidth, masked=masked, alignment=alignment]`): The destination tensor, which must be in local memory (registers).
- `src` (`LayoutTensor[dtype, layout, origin, address_space=address_space, element_layout=element_layout, layout_bitwidth=layout_bitwidth, masked=masked, alignment=alignment]`): The source tensor, which must be in shared memory (SRAM).
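Because the copy is distributed across the threads of a warp, `copy_sram_to_local` is meant to be called from device code. Below is a minimal launch sketch assuming the standard `DeviceContext` API from `gpu.host`; the kernel name, tile sizes, fill step, and launch configuration are illustrative assumptions rather than part of this function's contract:

```mojo
from gpu.host import DeviceContext
from gpu.memory import AddressSpace
from gpu.sync import barrier
from layout import Layout, LayoutTensor
from layout.layout_tensor import copy_sram_to_local

fn copy_kernel():
    # Shared-memory tile cooperatively produced by the thread block.
    var smem_tile = LayoutTensor[DType.float32, Layout.row_major(32, 32),
        MutableAnyOrigin, address_space = AddressSpace.SHARED].stack_allocation()
    # ... threads cooperatively fill smem_tile here ...
    barrier()  # make the shared tile visible to every thread

    # Per-thread register fragment receiving this thread's share.
    var reg_frag = LayoutTensor[DType.float32, Layout.row_major(4, 4),
        MutableAnyOrigin, address_space = AddressSpace.LOCAL].stack_allocation()
    copy_sram_to_local[Layout(8)](reg_frag, smem_tile)

def main():
    var ctx = DeviceContext()
    # One block of 8 threads matches the Layout(8) warp layout above.
    ctx.enqueue_function[copy_kernel](grid_dim=1, block_dim=8)
    ctx.synchronize()
```

Synchronizing with `barrier()` before the copy ensures every thread reads a fully populated tile.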