Mojo function

copy_sram_to_local

copy_sram_to_local[src_warp_layout: Layout, axis: OptionalReg[Int] = None](dst: LayoutTensor[dtype, layout, origin, address_space=address_space, element_layout=element_layout, layout_bitwidth=layout_bitwidth, masked=masked, alignment=alignment], src: LayoutTensor[dtype, layout, origin, address_space=address_space, element_layout=element_layout, layout_bitwidth=layout_bitwidth, masked=masked, alignment=alignment])

Synchronously copy data from SRAM (shared memory) to local memory.

This function performs a synchronous memory transfer from SRAM (shared memory) to local memory (registers) using the specified thread layout for workload distribution.

Example:

```mojo
from gpu.memory import AddressSpace
from layout import Layout, LayoutTensor
from layout.layout_tensor import copy_sram_to_local

var shared_data = LayoutTensor[DType.float32, Layout((32, 32)),
    address_space=AddressSpace.SHARED]()
var local_data = LayoutTensor[DType.float32, Layout((4, 4)),
    address_space=AddressSpace.LOCAL]()

# Copy data using a thread layout with 8 threads
copy_sram_to_local[Layout(8)](local_data, shared_data)
```

Performance:

- Distributes the copy workload across multiple threads for parallel execution.
- Optimized for transferring data from shared memory to registers.
- Supports optional axis-specific distribution for specialized access patterns.

Constraints:

  • The source tensor must be in SHARED address space (SRAM).
  • The destination tensor must be in LOCAL address space (registers).
  • Both tensors must have the same data type.

Parameters:

  • src_warp_layout (Layout): Layout defining how threads are organized for the source tensor. This determines how the workload is distributed among threads.
  • axis (OptionalReg[Int]): Optional parameter specifying which axis to distribute along. When provided, distribution happens along the specified axis. When None (default), distribution uses the standard layout pattern.

Args:

  • dst (LayoutTensor[dtype, layout, origin, address_space=address_space, element_layout=element_layout, layout_bitwidth=layout_bitwidth, masked=masked, alignment=alignment]): The destination tensor, which must be in local memory (registers).
  • src (LayoutTensor[dtype, layout, origin, address_space=address_space, element_layout=element_layout, layout_bitwidth=layout_bitwidth, masked=masked, alignment=alignment]): The source tensor, which must be in shared memory (SRAM).