
Mojo function

cp_async_bulk_tensor_shared_cluster_global_multicast

cp_async_bulk_tensor_shared_cluster_global_multicast[dst_type: AnyType, mbr_type: AnyType, rank: Int](dst_mem: UnsafePointer[dst_type, address_space=3], tma_descriptor: UnsafePointer[NoneType], mem_bar: UnsafePointer[mbr_type, address_space=3], coords: IndexList[rank], multicast_mask: SIMD[uint16, 1])

Initiates an asynchronous multicast load of tensor data from global memory into the shared memory of multiple CTAs in a cluster.

Args:

  • dst_mem (UnsafePointer[dst_type, address_space=3]): Pointer to destination shared memory.
  • tma_descriptor (UnsafePointer[NoneType]): Pointer to the tensor map descriptor.
  • mem_bar (UnsafePointer[mbr_type, address_space=3]): Pointer to the shared memory barrier.
  • coords (IndexList[rank]): Tile coordinates.
  • multicast_mask (SIMD[uint16, 1]): A uint16 bitmask specifying which CTAs in the cluster participate in the TMA multicast load.
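
The multicast_mask argument is easiest to reason about as plain bit arithmetic. The sketch below is written in Python purely to illustrate how such a mask could be assembled; it is not part of the Mojo API. It assumes the convention of the underlying PTX multicast instruction, where bit i of the 16-bit mask selects the CTA whose rank within the cluster is i. The helper name `multicast_mask` is hypothetical.

```python
def multicast_mask(cta_ranks):
    """Build a 16-bit mask with bit r set for each CTA rank r in cta_ranks.

    Assumption: bit i of the mask selects the CTA with cluster rank i.
    """
    mask = 0
    for r in cta_ranks:
        # A uint16 mask can only address CTA ranks 0 through 15.
        if not 0 <= r < 16:
            raise ValueError(f"CTA rank {r} does not fit in a 16-bit mask")
        mask |= 1 << r
    return mask


# All four CTAs of a hypothetical 4-CTA cluster receive the tile.
print(hex(multicast_mask(range(4))))  # 0xf

# Only the CTAs with cluster ranks 0 and 2 receive the tile.
print(bin(multicast_mask([0, 2])))    # 0b101
```

The resulting integer would then be passed as the multicast_mask value when calling the function from a kernel.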