Mojo function
ld_matrix
ld_matrix[type: DType, //, simd_width: Int, *, transpose: Bool = False](ptr: UnsafePointer[SIMD[type, 1], address_space=3]) -> SIMD[type, simd_width]
Performs a warp-synchronous copy from shared memory to registers. The data is loaded in a register layout that tensor core MMA instructions can consume directly.
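A minimal sketch of a call site, based only on the signature above. The import paths (`gpu.mma`, `gpu.memory`, `memory`), the use of `AddressSpace.SHARED` for address space 3, and the helper name `load_fragment` are assumptions for illustration and may differ between Mojo releases; the choice of `Float32` with `simd_width=4` is likewise only an example.

```mojo
# Assumed import locations; verify against your Mojo version.
from gpu.mma import ld_matrix
from gpu.memory import AddressSpace
from memory import UnsafePointer


# Hypothetical device-side helper, intended to be called by every
# thread of a warp, since the load is warp-synchronous.
fn load_fragment(
    tile_ptr: UnsafePointer[Float32, address_space = AddressSpace.SHARED]
) -> SIMD[DType.float32, 4]:
    # `type` is inferred from the pointer; `simd_width` sets how many
    # elements each thread receives in registers.
    return ld_matrix[simd_width=4](tile_ptr)
```

Passing `transpose=True` as an additional keyword parameter requests the transposed load declared in the signature. The returned `SIMD` value is meant to be passed on as an MMA operand fragment.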