Mojo function

topk_fused_sampling_gpu

topk_fused_sampling_gpu[type: DType, rank: Int, out_idx_type: DType, //](ctx: DeviceContext, K: Int, input: NDBuffer[type, rank, origin], out_idxs: NDBuffer[out_idx_type, rank, origin], block_size: OptionalReg[Int] = OptionalReg[Int](None), num_blocks_per_input: OptionalReg[Int] = OptionalReg[Int](None), temperature: SIMD[type, 1] = 1)

Top-K algorithm with fused sampling. For each row (subvolume) of the input tensor, selects the K largest elements along the innermost dimension and samples one of them. The sampled indices are written to out_idxs.
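The following is a minimal CPU reference sketch of what "Top-K with fused sampling" means per row, useful for checking results on small inputs. It is not the GPU kernel and makes several assumptions not stated above: each row holds unnormalized logits, the sample is drawn from a softmax over the K largest values scaled by 1/temperature, and the output keeps the input's rank with an innermost dimension of size 1. The function name topk_fused_sampling_reference is hypothetical, and the GPU version uses its own random source, so individual samples will not match.

```python
import math
import random


def topk_fused_sampling_reference(rows, k, temperature=1.0, rng=None):
    """Hypothetical CPU reference, not the Mojo API.

    For each non-empty row: keep the k largest values, weight them with a
    temperature-scaled softmax (an assumption), and sample one index into
    the original row from those weights.
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    rng = rng or random.Random()
    sampled = []
    for row in rows:
        # Indices of the k largest values; ties go to the lower index.
        top = sorted(range(len(row)), key=lambda i: (-row[i], i))[:k]
        # Subtract the row maximum before exponentiating for stability.
        m = row[top[0]]
        weights = [math.exp((row[i] - m) / temperature) for i in top]
        # random.choices normalizes the weights internally.
        choice = rng.choices(top, weights=weights, k=1)[0]
        # Assumed output layout: innermost dimension of size 1.
        sampled.append([choice])
    return sampled
```

With k=1 the sampling step is deterministic (greedy decoding), which makes it a convenient sanity check:

```python
logits = [[0.1, 2.0, -1.0, 3.5], [5.0, 4.0, 3.0, 2.0]]
print(topk_fused_sampling_reference(logits, k=1))  # [[3], [0]]
```

Lower temperatures sharpen the distribution toward the largest value; higher temperatures flatten it across the K candidates.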