Mojo function
rms_norm_fused_residual_add
rms_norm_fused_residual_add[
    dtype: DType,
    rank: Int,
    //,
    input_0_fn: fn[Int, Int](IndexList[$1]) capturing -> SIMD[dtype, $0],
    input_1_fn: fn[Int, Int](IndexList[$1]) capturing -> SIMD[dtype, $0],
    output_0_fn: fn[Int, Int, Int](idx: IndexList[$1], val: SIMD[dtype, $0]) capturing -> None,
    output_residual_fn: fn[Int, Int, Int](IndexList[$1], SIMD[dtype, $0]) capturing -> None,
    /,
    target: StringSlice[StaticConstantOrigin] = "cpu",
    multiply_before_cast: Bool = True
](
    shape: IndexList[rank],
    gamma1: LayoutTensor[dtype, layout, origin, address_space=address_space, element_layout=element_layout, layout_int_type=layout_int_type, linear_idx_type=linear_idx_type, masked=masked, alignment=alignment],
    epsilon1: Scalar[dtype],
    weight_offset1: Scalar[dtype],
    gamma2: LayoutTensor[dtype, layout, origin, address_space=address_space, element_layout=element_layout, layout_int_type=layout_int_type, linear_idx_type=linear_idx_type, masked=masked, alignment=alignment],
    epsilon2: Scalar[dtype],
    weight_offset2: Scalar[dtype],
    ctx: DeviceContextPtr
)
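The signature does not spell out the computation, so the following is only a reference sketch in plain Python, not the kernel's implementation. It assumes, from the parameter names alone, that the function applies RMSNorm to the first input (gamma1, epsilon1, weight_offset1), adds the second input as a residual, emits that sum as the residual output, and then applies a second RMSNorm (gamma2, epsilon2, weight_offset2) to the sum. The formula x / sqrt(mean(x^2) + epsilon) * (gamma + weight_offset) and the helper names rms_norm_ref and rms_norm_fused_residual_add_ref are assumptions made for illustration. The multiply_before_cast and target parameters are not modeled.

```python
import math


def rms_norm_ref(row, gamma, epsilon, weight_offset):
    # Assumed RMSNorm over one row:
    #   x / sqrt(mean(x^2) + epsilon) * (gamma + weight_offset)
    mean_sq = sum(x * x for x in row) / len(row)
    inv_rms = 1.0 / math.sqrt(mean_sq + epsilon)
    return [x * inv_rms * (g + weight_offset) for x, g in zip(row, gamma)]


def rms_norm_fused_residual_add_ref(
    x, residual,
    gamma1, epsilon1, weight_offset1,
    gamma2, epsilon2, weight_offset2,
):
    # Assumed order of operations (hypothetical, inferred from names only):
    # 1. normalize the first input with the first set of weights
    normed = rms_norm_ref(x, gamma1, epsilon1, weight_offset1)
    # 2. add the second input as a residual
    summed = [a + b for a, b in zip(normed, residual)]
    # 3. normalize the sum with the second set of weights
    out = rms_norm_ref(summed, gamma2, epsilon2, weight_offset2)
    # Return both values, mirroring output_0_fn and output_residual_fn
    return out, summed


# Example on a single row of length 2
out, summed = rms_norm_fused_residual_add_ref(
    [3.0, 4.0], [0.5, -0.5],
    [1.0, 1.0], 1e-6, 0.0,
    [1.0, 1.0], 1e-6, 0.0,
)
print(out, summed)
```

In the real function the inputs and outputs are supplied as capturing functions (input_0_fn, input_1_fn, output_0_fn, output_residual_fn) rather than as lists, so that the loads and stores can presumably be fused with neighboring operations. The sketch uses lists only to keep the arithmetic visible.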