
Mojo module

distributed_matmul

Aliases

elementwise_epilogue_type

alias elementwise_epilogue_type = fn[Int, DType, Int, Int, Int](IndexList[$2], SIMD[$1, $3]) capturing -> None

Functions

  • matmul_allreduce: Performs C = matmul(A, B^T) followed by Out = allreduce(C) across multiple GPUs. Either A or B, together with C, is split into num_partitions submatrices along dimension partition_dim. This yields num_partitions independent matmul + allreduce kernels, which allows some of the computation to be overlapped.
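The partitioning idea behind matmul_allreduce can be illustrated with a small single-process simulation. The sketch below is plain Python, not Mojo, and does not call the real API. The helper names (`matmul_bt`, `allreduce_sum`, `matmul_allreduce_sim`) are hypothetical, and it assumes that partition_dim = 0 splits A and C along their rows while partition_dim = 1 splits B (and therefore the columns of C). Each "device" holds its own A and B shard, and the allreduce is modeled as an elementwise sum.

```python
def matmul_bt(a, b):
    """Return a @ b^T for lists of lists: a is M x K, b is N x K."""
    return [[sum(x * y for x, y in zip(ra, rb)) for rb in b] for ra in a]


def allreduce_sum(per_device):
    """Elementwise sum of one same-shaped matrix per simulated device."""
    rows, cols = len(per_device[0]), len(per_device[0][0])
    return [[sum(m[i][j] for m in per_device) for j in range(cols)]
            for i in range(rows)]


def matmul_allreduce_sim(a_per_dev, b_per_dev, num_partitions, partition_dim):
    """Simulate partitioned matmul + allreduce (assumed semantics, see lead-in)."""
    m, n = len(a_per_dev[0]), len(b_per_dev[0])
    extent = m if partition_dim == 0 else n
    assert extent % num_partitions == 0, "extent must divide evenly"
    step = extent // num_partitions

    out_parts = []
    for p in range(num_partitions):
        lo, hi = p * step, (p + 1) * step
        partials = []
        for a, b in zip(a_per_dev, b_per_dev):
            if partition_dim == 0:
                partials.append(matmul_bt(a[lo:hi], b))  # row block of C
            else:
                partials.append(matmul_bt(a, b[lo:hi]))  # column block of C
        # Each partition is reduced independently; on real hardware this is
        # the step that could overlap with the next partition's matmul.
        out_parts.append(allreduce_sum(partials))

    if partition_dim == 0:
        return [row for part in out_parts for row in part]
    return [sum((part[i] for part in out_parts), []) for i in range(m)]


# Two simulated devices, each holding a 2 x 2 A and a 2 x 2 B.
a_shards = [[[1, 2], [3, 4]], [[1, 1], [2, 2]]]
b_shards = [[[1, 0], [0, 1]], [[1, 1], [1, -1]]]

reference = allreduce_sum([matmul_bt(a, b) for a, b in zip(a_shards, b_shards)])
by_rows = matmul_allreduce_sim(a_shards, b_shards, 2, 0)
by_cols = matmul_allreduce_sim(a_shards, b_shards, 2, 1)
print(reference, by_rows, by_cols)
```

Both partitioned runs reproduce the unpartitioned result, which is the property that makes the split safe: each partition's matmul and reduction depends only on its own slice of A or B.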
