Mojo function
cp_async_bulk_tensor_reduce
cp_async_bulk_tensor_reduce[src_type: AnyType, rank: Int, /, *, reduction_kind: StringLiteral, eviction_policy: CacheEviction = 0](src_mem: UnsafePointer[src_type, address_space=3], tma_descriptor: UnsafePointer[NoneType], coords: IndexList[rank])
Initiates an asynchronous reduction of tensor data in global memory with tensor data in shared (shared::cta) memory, using tile mode.
Args:
- src_mem (UnsafePointer[src_type, address_space=3]): Pointer to source shared memory.
- tma_descriptor (UnsafePointer[NoneType]): Pointer to tensor map descriptor.
- coords (IndexList[rank]): Tile coordinates.
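A call site might look roughly like the sketch below. It is built only from the signature shown above and is not taken from the library's own examples. The following are assumptions made for illustration: the import paths, the `AddressSpace.SHARED` spelling for address space 3, the `Float32` element type, the rank of 2, the `"add"` reduction kind, and the wrapper name `reduce_tile_into_global`. Creating the tensor map descriptor and waiting for the asynchronous operation to complete are outside the scope of this sketch.

```mojo
# Hypothetical sketch: import paths are assumed and may differ between Mojo releases.
from gpu.memory import AddressSpace, cp_async_bulk_tensor_reduce
from memory import UnsafePointer
from utils.index import IndexList


# Hypothetical device-side helper; the name and argument layout are illustrative only.
fn reduce_tile_into_global(
    smem_tile: UnsafePointer[Float32, address_space = AddressSpace.SHARED],
    tma_descriptor: UnsafePointer[NoneType],
    coord0: Int,
    coord1: Int,
):
    # src_type and rank are expected to be inferred from the arguments.
    # reduction_kind is keyword-only in the signature; "add" is an assumed value.
    cp_async_bulk_tensor_reduce[reduction_kind="add"](
        smem_tile,
        tma_descriptor,
        IndexList[2](coord0, coord1),
    )
```

Because the operation is asynchronous, the call only initiates the reduction; the caller is responsible for synchronizing before relying on the result in global memory.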