Mojo function
all_reduce_p2p
all_reduce_p2p[type: DType, rank: Int, ngpus: Int, //](ctxs: List[DeviceContext], list_of_in_bufs: StaticTuple[NDBuffer[type, rank], ngpus], list_of_out_bufs: StaticTuple[NDBuffer[type, rank], ngpus], rank_sigs: StaticTuple[UnsafePointer[Signal], 8])
Performs all-reduce using peer-to-peer access between GPUs.
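Arguments:
- ctxs (List[DeviceContext]): List of device contexts for participating GPUs.
- list_of_in_bufs (StaticTuple[NDBuffer[type, rank], ngpus]): Input buffers from each GPU.
- list_of_out_bufs (StaticTuple[NDBuffer[type, rank], ngpus]): Output buffers for each GPU.
- rank_sigs (StaticTuple[UnsafePointer[Signal], 8]): Signal pointers for synchronization.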
Launches a P2P reduction kernel on each GPU, so that each GPU performs the reduction directly on its peers' input buffers.
Parameters:
- type (DType): Data type of tensor elements.
- rank (Int): Number of dimensions in tensors.
- ngpus (Int): Number of GPUs participating.
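The result an all-reduce is expected to produce can be modeled on the host. The following is a minimal sketch in plain Python, not Mojo and not the library's implementation. It assumes the reduction operator is an element-wise sum, which this page does not state, and it ignores device contexts and signal-based synchronization. The name `all_reduce_reference` is hypothetical.

```python
from typing import List


def all_reduce_reference(in_bufs: List[List[float]]) -> List[List[float]]:
    """Host-side model of an all-reduce (assumed operator: element-wise sum).

    in_bufs[i] stands in for the input buffer owned by GPU i. The return
    value has one output buffer per GPU, and all of them hold the same data.
    """
    ngpus = len(in_bufs)
    if ngpus == 0:
        return []

    length = len(in_bufs[0])
    if any(len(buf) != length for buf in in_bufs):
        raise ValueError("all input buffers must have the same length")

    out_bufs = []
    for _rank in range(ngpus):
        # Each rank reads every peer's input buffer directly, which is the
        # role peer-to-peer access plays, and reduces into its own output.
        out = [0.0] * length
        for peer in range(ngpus):
            for i in range(length):
                out[i] += in_bufs[peer][i]
        out_bufs.append(out)
    return out_bufs


if __name__ == "__main__":
    # Three simulated GPUs, two elements each.
    print(all_reduce_reference([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]))
```

A model like this can serve as a reference when checking the contents of `list_of_out_bufs` after a real call, provided the assumed operator matches the one the kernel applies.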