Mojo module
distributed_matmul
Aliases
elementwise_epilogue_type
alias elementwise_epilogue_type = fn[Int, DType, Int, Int, Int](IndexList[$2], SIMD[$1, $3]) capturing -> None
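Reading the signature, the callback receives an `IndexList` whose size is the third parameter and a `SIMD` vector whose dtype and width are the second and fourth parameters, and it returns nothing, so any result has to be written through captured state. The following is a minimal Python sketch of that calling pattern, not the Mojo API: `run_with_epilogue` and `make_bias_epilogue` are hypothetical names, the bias-add is just an illustrative epilogue, and how the real kernel chunks its output is an assumption.

```python
def run_with_epilogue(c, width, epilogue):
    """Hand each row of c to the epilogue in chunks of at most `width` values.

    Stands in for a kernel that passes (coordinates, vector of values)
    to the callback instead of storing the result itself.
    """
    for i, row in enumerate(c):
        for j in range(0, len(row), width):
            epilogue((i, j), row[j:j + width])


def make_bias_epilogue(out, bias):
    """Build a callback that captures `out` and `bias` (a closure, loosely
    analogous to a `capturing` function in the alias above)."""
    def epilogue(coords, values):
        i, j = coords  # row and starting column of this chunk
        for k, v in enumerate(values):
            out[i][j + k] = v + bias[j + k]
    return epilogue


c = [[1, 2, 3, 4], [5, 6, 7, 8]]
out = [[0] * 4 for _ in range(2)]
run_with_epilogue(c, 2, make_bias_epilogue(out, [10, 20, 30, 40]))
```

Because the callback returns `None`, the only observable effect here is the write into the captured `out` buffer.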
Functions
- matmul_allreduce: Performs C = matmul(A, B^T) followed by Out = allreduce(C) across multiple GPUs. Splits the A or B and C matrices into num_partitions submatrices along dimension partition_dim. This way we can perform num_partitions independent matmul + allreduce kernels and overlap some of the computation.
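The partitioning described above can be sketched as a small pure-Python reference that simulates the devices with lists of lists. It is a sketch under stated assumptions, not the Mojo implementation: `matmul_allreduce_reference` and its helpers are hypothetical names, the allreduce is assumed to be an element-wise sum, and `partition_dim == 0` is assumed to split A and C by rows while `partition_dim == 1` splits B (by rows) and C (by columns).

```python
def matmul_bt(a, b):
    """Return a @ b^T for lists of lists (a is M x K, b is N x K)."""
    return [[sum(x * y for x, y in zip(ra, rb)) for rb in b] for ra in a]


def allreduce_sum(per_device):
    """Element-wise sum of same-shaped matrices, one per simulated device."""
    rows, cols = len(per_device[0]), len(per_device[0][0])
    return [[sum(m[i][j] for m in per_device) for j in range(cols)]
            for i in range(rows)]


def split_bounds(extent, num_partitions):
    """Half-open [start, end) ranges; assumes extent divides evenly."""
    assert extent % num_partitions == 0
    step = extent // num_partitions
    return [(p * step, (p + 1) * step) for p in range(num_partitions)]


def matmul_allreduce_reference(a_per_dev, b_per_dev, num_partitions, partition_dim):
    m, n = len(a_per_dev[0]), len(b_per_dev[0])
    out = [[0] * n for _ in range(m)]
    extent = m if partition_dim == 0 else n
    for start, end in split_bounds(extent, num_partitions):
        # One independent matmul per device for this partition.
        if partition_dim == 0:  # assumed: slice rows of A
            partials = [matmul_bt(a[start:end], b)
                        for a, b in zip(a_per_dev, b_per_dev)]
        else:                   # assumed: slice rows of B (columns of C)
            partials = [matmul_bt(a, b[start:end])
                        for a, b in zip(a_per_dev, b_per_dev)]
        # One allreduce per partition; on real hardware this step could
        # run while the next partition's matmul is still computing.
        reduced = allreduce_sum(partials)
        for i, row in enumerate(reduced):
            if partition_dim == 0:
                out[start + i] = row
            else:
                out[i][start:end] = row
    return out


# Two simulated devices, each holding its own 2 x 2 A and B.
a_per_dev = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
b_per_dev = [[[1, 0], [0, 1]], [[1, 1], [2, 0]]]
unsplit = matmul_allreduce_reference(a_per_dev, b_per_dev, 1, 0)
by_rows = matmul_allreduce_reference(a_per_dev, b_per_dev, 2, 0)
by_cols = matmul_allreduce_reference(a_per_dev, b_per_dev, 2, 1)
```

All three calls produce the same matrix, which is the point of the scheme: partitioning changes how the work is scheduled, not the result. This sequential simulation shows no actual overlap; it only makes the independent per-partition units visible.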