Mojo module

distributed_matmul

Aliases

`elementwise_epilogue_type`

alias elementwise_epilogue_type = fn[input_index: Int, dtype: DType, rank: Int, width: Int, *, alignment: Int](IndexList[rank], SIMD[dtype, width]) capturing -> None

Functions

`matmul_allreduce`: Performs C = matmul(A, B^T) followed by Out = allreduce(C) across multiple GPUs. The A or B matrix and the C matrix are split into `num_partitions` submatrices along dimension `partition_dim`. This yields `num_partitions` independent matmul + allreduce kernels, so some of the computation can be overlapped.
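The arithmetic behind this partitioning can be checked with a small CPU-only simulation. The sketch below is plain Python, not the Mojo API: the helper names (`matmul_bt`, `allreduce_sum`, `split_rows`, `matmul_allreduce_sim`) are hypothetical, and it assumes that each device holds its own A and B shard, that `partition_dim == 0` splits the rows of A and C, and that any other value splits the rows of B (the columns of C). None of these details are confirmed by the description above.

```python
def matmul_bt(a, b):
    """Return a @ b^T for nested lists; a is M x K, b is N x K."""
    return [[sum(x * y for x, y in zip(ra, rb)) for rb in b] for ra in a]


def allreduce_sum(per_device):
    """Element-wise sum of same-shaped matrices, one per simulated device."""
    rows, cols = len(per_device[0]), len(per_device[0][0])
    return [[sum(c[i][j] for c in per_device) for j in range(cols)]
            for i in range(rows)]


def split_rows(m, parts):
    """Split a matrix into contiguous row blocks (assumes even division)."""
    step = len(m) // parts
    return [m[p * step:(p + 1) * step] for p in range(parts)]


def matmul_allreduce_sim(a_per_dev, b_per_dev, num_partitions, partition_dim):
    """Simulate num_partitions independent matmul + allreduce steps.

    Assumed semantics: partition_dim == 0 splits A and C by rows,
    otherwise B is split by rows, which splits C by columns.
    """
    m, n = len(a_per_dev[0]), len(b_per_dev[0])
    out = [[0] * n for _ in range(m)]
    for p in range(num_partitions):
        # Each simulated device computes only its slice of C for partition p.
        partials = []
        for a, b in zip(a_per_dev, b_per_dev):
            if partition_dim == 0:
                partials.append(matmul_bt(split_rows(a, num_partitions)[p], b))
            else:
                partials.append(matmul_bt(a, split_rows(b, num_partitions)[p]))
        # Reduce this partition on its own; on real hardware this step could
        # run while the next partition's matmul is already in flight.
        block = allreduce_sum(partials)
        if partition_dim == 0:
            r0 = p * (m // num_partitions)
            for i, row in enumerate(block):
                out[r0 + i] = row
        else:
            c0 = p * (n // num_partitions)
            for i, row in enumerate(block):
                out[i][c0:c0 + len(row)] = row
    return out


# Two simulated devices, each with its own 2 x 2 shard of A and B.
a_shards = [[[1, 2], [3, 4]], [[1, 1], [1, 1]]]
b_shards = [[[1, 0], [0, 1]], [[1, 2], [3, 4]]]

# Unpartitioned reference: one full matmul per device, then one allreduce.
reference = allreduce_sum([matmul_bt(a, b) for a, b in zip(a_shards, b_shards)])
out_rows = matmul_allreduce_sim(a_shards, b_shards, 2, 0)
out_cols = matmul_allreduce_sim(a_shards, b_shards, 2, 1)
```

Both partitioned runs reproduce the unpartitioned result, which is why the split is safe. The loop here is sequential, so it shows only the decomposition, not the overlap itself.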
