Mojo function
rms_norm_gpu_warp_tiling
rms_norm_gpu_warp_tiling[
    type: DType, //,
    simd_width: Int,
    max_warps_per_block: Int,
    input_fn: fn[Int](row: Int, col: Int) capturing -> SIMD[type, $0],
    output_fn: fn[Int](row: Int, col: Int, val: SIMD[type, $0]) capturing -> None,
    multiply_before_cast: Bool
](
    gamma: NDBuffer[type, 1, MutableAnyOrigin],
    epsilon: SIMD[type, 1],
    weight_offset: SIMD[type, 1],
    num_cols: Int
)
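The signature alone does not spell out the arithmetic, so the following is a minimal CPU reference sketch in plain Python, not the Mojo kernel itself. It assumes the conventional RMS normalization, in which each row is scaled by 1 / sqrt(mean(x^2) + epsilon), and it assumes that `weight_offset` is added to `gamma` before the multiply. The helper names `rms_norm_row` and `rms_norm_reference`, and the explicit `num_rows` argument, are hypothetical and exist only for illustration. The sketch works in a single precision, so `multiply_before_cast`, `simd_width`, and `max_warps_per_block` are not modeled.

```python
import math


def rms_norm_row(row, gamma, epsilon, weight_offset):
    """Normalize one row, assuming out = x / sqrt(mean(x^2) + eps) * (gamma + offset)."""
    num_cols = len(row)
    mean_square = sum(x * x for x in row) / num_cols
    norm_factor = 1.0 / math.sqrt(mean_square + epsilon)
    # Assumption: weight_offset shifts gamma before scaling.
    return [x * norm_factor * (g + weight_offset) for x, g in zip(row, gamma)]


def rms_norm_reference(input_fn, output_fn, gamma, epsilon, weight_offset,
                       num_rows, num_cols):
    """Mimic the callback style of the signature: read through input_fn(row, col)
    and write through output_fn(row, col, val), one scalar at a time."""
    for row in range(num_rows):
        values = [input_fn(row, col) for col in range(num_cols)]
        normed = rms_norm_row(values, gamma, epsilon, weight_offset)
        for col, val in enumerate(normed):
            output_fn(row, col, val)


if __name__ == "__main__":
    data = [[3.0, 4.0], [1.0, 1.0]]
    result = {}

    def store(row, col, val):
        result[(row, col)] = val

    rms_norm_reference(lambda r, c: data[r][c], store,
                       gamma=[1.0, 1.0], epsilon=1e-6, weight_offset=0.0,
                       num_rows=2, num_cols=2)
    for key in sorted(result):
        print(key, result[key])
```

A reference like this can serve as a rough check of GPU output on small inputs, provided the assumptions above match the kernel's actual behavior.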