Mojo function
cp_async_bulk_tensor_shared_cluster_global_multicast
cp_async_bulk_tensor_shared_cluster_global_multicast[dst_type: AnyType, mbr_type: AnyType, rank: Int](dst_mem: UnsafePointer[dst_type, address_space=3], tma_descriptor: UnsafePointer[NoneType], mem_bar: UnsafePointer[mbr_type, address_space=3], coords: IndexList[rank], multicast_mask: SIMD[uint16, 1])
Initiates an asynchronous multicast load of tensor data from global memory into the shared memory of multiple CTAs in a cluster.
Args:
- dst_mem (UnsafePointer[dst_type, address_space=3]): Pointer to the destination in shared memory.
- tma_descriptor (UnsafePointer[NoneType]): Pointer to the tensor map descriptor.
- mem_bar (UnsafePointer[mbr_type, address_space=3]): Pointer to the memory barrier in shared memory.
- coords (IndexList[rank]): Tile coordinates.
- multicast_mask (SIMD[uint16, 1]): A uint16 bitmask specifying which CTAs in the cluster participate in the TMA multicast load.
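
As a minimal sketch of how a value for multicast_mask might be built, the helper below sets one bit per participating CTA. The name `multicast_mask` used for the helper is hypothetical and not part of the Mojo standard library. The sketch assumes that bit i of the mask selects the CTA with rank i in the cluster, which matches the ctaMask operand of the underlying PTX multicast instruction; confirm this against your target before relying on it.

```mojo
# Hypothetical helper (not a standard library function): builds a 16-bit
# mask with `count` consecutive bits set, starting at bit `first_rank`.
# Assumption: bit i selects the CTA whose rank within the cluster is i.
fn multicast_mask(first_rank: Int, count: Int) -> UInt16:
    var mask: UInt16 = 0
    for rank in range(first_rank, first_rank + count):
        mask |= UInt16(1) << UInt16(rank)
    return mask


def main():
    # Bits 0-3 set (0b1111): CTAs with ranks 0 through 3 receive the data.
    print(multicast_mask(0, 4))
    # Bits 2-3 set (0b1100): only CTAs with ranks 2 and 3 receive the data.
    print(multicast_mask(2, 2))
```

The resulting value would then be passed as the multicast_mask argument. A cluster holds at most 16 CTAs, which is why a 16-bit mask is enough.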