Mojo function

warp_sum

warp_sum[val_type: DType, simd_width: Int, //](val: SIMD[val_type, simd_width]) -> SIMD[val_type, simd_width]

warp_sum[intermediate_type: DType, *, reduction_method: ReductionMethod, output_type: DType](x: SIMD[type, size]) -> SIMD[output_type, 1]

Performs a warp-level sum reduction using either warp shuffle or a tensor core operation, as selected by `reduction_method`. If the tensor core method is chosen, the input value is first cast to `intermediate_type` so that the tensor core op can consume it.
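To illustrate what a shuffle-based warp reduction computes, here is a minimal CPU-side sketch in Python. It is not the Mojo implementation and does not call any Mojo API. It assumes a 32-lane warp and a butterfly (XOR) exchange pattern, which is one common way to build such a reduction. The names `WARP_SIZE`, `shuffle_xor`, and `simulated_warp_sum` are hypothetical and exist only for this example.

```python
# Assumption: 32 lanes per warp. The lane count must be a power of two
# for the butterfly pattern below to work.
WARP_SIZE = 32


def shuffle_xor(lanes, offset):
    """Simulate an XOR shuffle: lane i reads the value held by lane i ^ offset."""
    return [lanes[i ^ offset] for i in range(len(lanes))]


def simulated_warp_sum(lanes):
    """Sum all lane values with log2(len(lanes)) exchange steps.

    After the last step every lane holds the total, mirroring how a
    butterfly reduction leaves the result in all lanes of the warp.
    """
    values = list(lanes)
    offset = len(values) // 2
    while offset > 0:
        partner = shuffle_xor(values, offset)
        # Each lane adds its partner's value to its own.
        values = [a + b for a, b in zip(values, partner)]
        offset //= 2
    return values


# Each simulated lane contributes its own lane index.
lane_values = list(range(WARP_SIZE))
result = simulated_warp_sum(lane_values)
```

With 32 lanes the loop runs five exchange steps (offsets 16, 8, 4, 2, 1). A tensor-core-based reduction would reach the same sum by a different route, which is why the input may first need a cast to a type the tensor core op accepts.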