Skip to main content
Log in

Mojo function

async_copy

async_copy[type: DType, //, size: Int, *, fill: OptionalReg[SIMD[type, 1]] = OptionalReg[SIMD[type, 1]]({:i1 0, 1}), bypass_L1_16B: Bool = True, l2_prefetch: OptionalReg[Int] = OptionalReg[Int]({:i1 0, 1}), eviction_policy: CacheEviction = CacheEviction(0)](src: UnsafePointer[SIMD[type, 1], address_space=AddressSpace(1)], dst: UnsafePointer[SIMD[type, 1], address_space=AddressSpace(3)], src_size: SIMD[int32, 1] = __init__[__mlir_type.!pop.int_literal](0), predicate: Bool = False)

Asynchronously copies data from global memory to shared memory.

This function provides a high-performance asynchronous memory copy operation with configurable caching behavior, prefetching, and fill values. It maps directly to the PTX cp.async instruction on NVIDIA GPUs.

Constraints:

  • Fill value only supported for types <= 32 bits. - Size must be 4, 8, or 16 bytes. - Cannot enable both L2 prefetch and L1 bypass. - L2 prefetch size must be 64, 128, or 256 bytes.

Parameters:

  • type (DType): The data type to copy (e.g. float32, int32).
  • size (Int): Number of bytes to copy (must be 4, 8, or 16).
  • fill (OptionalReg[SIMD[type, 1]]): Optional fill value for uncopied bytes when src_size < size.
  • bypass_L1_16B (Bool): If True, bypasses L1 cache for 16-byte copies.
  • l2_prefetch (OptionalReg[Int]): Optional L2 prefetch size (64, 128, or 256 bytes).
  • eviction_policy (CacheEviction): Cache eviction policy for the copy operation.

Args:

  • src (UnsafePointer[SIMD[type, 1], address_space=AddressSpace(1)]): Source pointer in global memory.
  • dst (UnsafePointer[SIMD[type, 1], address_space=AddressSpace(3)]): Destination pointer in shared memory.
  • src_size (SIMD[int32, 1]): Actual bytes to copy from src (remaining bytes use fill value).
  • predicate (Bool): Optional predicate to conditionally execute the copy.