Mojo function
load_matrix_a
load_matrix_a[m: Int, n: Int, k: Int](a_ptr: UnsafePointer[SIMD[float32, 1]], tile_row: Int, tile_col: Int, ldm: Int) -> SIMD[float32, 4]
For shape m16n8k8 and type tf32, loads a tile of matrix A from memory into registers, in the element order that tensor cores expect for a warp-synchronous MMA operation.
load_matrix_a[m: Int, n: Int, k: Int](a_ptr: UnsafePointer[SIMD[float16, 1]], tile_row: Int, tile_col: Int, ldm: Int) -> SIMD[float16, 4]
For shape m16n8k8 and type fp16, loads a tile of matrix A from memory into registers, in the element order that tensor cores expect for a warp-synchronous MMA operation.
load_matrix_a[m: Int, n: Int, k: Int](a_ptr: UnsafePointer[SIMD[bfloat16, 1]], tile_row: Int, tile_col: Int, ldm: Int) -> SIMD[bfloat16, k.__floordiv__(2)]
For type bf16, loads a tile of matrix A from memory into registers, in the element order that tensor cores expect for a warp-synchronous MMA operation. The returned vector holds k / 2 elements per thread.
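The descriptions above do not spell out the "specific order". As a rough illustration, the Python sketch below computes which (row, column) positions of a 16x8 A tile each lane of a 32-thread warp would hold for the m16n8k8 tf32 case. It assumes that load_matrix_a follows the A-fragment layout that the NVIDIA PTX documentation gives for mma.m16n8k8 with tf32 operands; this page does not confirm that. The helper name a_fragment_coords_tf32 is hypothetical and is not part of the Mojo API.

```python
def a_fragment_coords_tf32(lane):
    """Return the (row, col) tile positions of the 4 A elements held by one lane.

    Assumption: the PTX m16n8k8 tf32 A-fragment layout, where
    group_id = lane / 4 and thread_in_group = lane % 4.
    """
    if not 0 <= lane < 32:
        raise ValueError("lane must be in [0, 32)")

    group_id = lane >> 2          # 0..7, selects the row within each 8-row half
    thread_in_group = lane % 4    # 0..3, selects the column within each 4-column half

    coords = []
    for i in range(4):
        # Elements a0 and a2 sit in the top 8 rows, a1 and a3 in the bottom 8 rows.
        row = group_id if i in (0, 2) else group_id + 8
        # Elements a0 and a1 sit in the left 4 columns, a2 and a3 in the right 4.
        col = thread_in_group if i in (0, 1) else thread_in_group + 4
        coords.append((row, col))
    return coords


if __name__ == "__main__":
    for lane in (0, 1, 4, 31):
        print(lane, a_fragment_coords_tf32(lane))
```

Under this assumed layout, the 32 lanes together cover each of the 16 x 8 = 128 tile elements exactly once, which matches the 4-element return width of the tf32 overload.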