Mojo function
async_copy
async_copy[type: DType, //, size: Int, *, fill: OptionalReg[SIMD[type, 1]] = OptionalReg[SIMD[type, 1]]({:i1 0, 1}), bypass_L1_16B: Bool = True, l2_prefetch: OptionalReg[Int] = OptionalReg[Int]({:i1 0, 1}), eviction_policy: CacheEviction = CacheEviction(0)](src: UnsafePointer[SIMD[type, 1], address_space=AddressSpace(1)], dst: UnsafePointer[SIMD[type, 1], address_space=AddressSpace(3)], src_size: SIMD[int32, 1] = __init__[__mlir_type.!pop.int_literal](0), predicate: Bool = False)
Asynchronously copies data from global memory to shared memory.
This function provides a high-performance asynchronous memory copy operation with configurable caching behavior, prefetching, and fill values. It maps directly to the PTX cp.async instruction on NVIDIA GPUs.
Constraints:
- Fill value only supported for types <= 32 bits. - Size must be 4, 8, or 16 bytes. - Cannot enable both L2 prefetch and L1 bypass. - L2 prefetch size must be 64, 128, or 256 bytes.
Parameters:
- type (
DType
): The data type to copy (e.g. float32, int32). - size (
Int
): Number of bytes to copy (must be 4, 8, or 16). - fill (
OptionalReg[SIMD[type, 1]]
): Optional fill value for uncopied bytes when src_size < size. - bypass_L1_16B (
Bool
): If True, bypasses L1 cache for 16-byte copies. - l2_prefetch (
OptionalReg[Int]
): Optional L2 prefetch size (64, 128, or 256 bytes). - eviction_policy (
CacheEviction
): Cache eviction policy for the copy operation.
Args:
- src (
UnsafePointer[SIMD[type, 1], address_space=AddressSpace(1)]
): Source pointer in global memory. - dst (
UnsafePointer[SIMD[type, 1], address_space=AddressSpace(3)]
): Destination pointer in shared memory. - src_size (
SIMD[int32, 1]
): Actual bytes to copy from src (remaining bytes use fill value). - predicate (
Bool
): Optional predicate to conditionally execute the copy.
Was this page helpful?
Thank you! We'll create more content like this.
Thank you for helping us improve!