Mojo function
fused_token_sampling_gpu
fused_token_sampling_gpu[dtype: DType, out_idx_type: DType, //](
    ctx: DeviceContext,
    max_k: Int,
    input: LayoutTensor[dtype, layout, origin, address_space=address_space, element_layout=element_layout, layout_int_type=layout_int_type, linear_idx_type=linear_idx_type, masked=masked, alignment=alignment],
    out_idxs: LayoutTensor[out_idx_type, layout, origin, address_space=address_space, element_layout=element_layout, layout_int_type=layout_int_type, linear_idx_type=linear_idx_type, masked=masked, alignment=alignment],
    block_size: OptionalReg[Int] = OptionalReg[Int]({:i1 0, 1}),
    num_blocks_per_input: OptionalReg[Int] = OptionalReg[Int]({:i1 0, 1}),
    k: OptionalReg[LayoutTensor[DType.int64, Layout.row_major(-1), MutableAnyOrigin]] = OptionalReg[LayoutTensor[DType.int64, Layout.row_major(-1), MutableAnyOrigin]]({:i1 0, 1}),
    temperature: OptionalReg[LayoutTensor[DType.float32, Layout.row_major(-1), MutableAnyOrigin]] = OptionalReg[LayoutTensor[DType.float32, Layout.row_major(-1), MutableAnyOrigin]]({:i1 0, 1}),
    top_p: OptionalReg[LayoutTensor[DType.float32, Layout.row_major(-1), MutableAnyOrigin]] = OptionalReg[LayoutTensor[DType.float32, Layout.row_major(-1), MutableAnyOrigin]]({:i1 0, 1}),
    seed: OptionalReg[LayoutTensor[DType.uint64, Layout.row_major(-1), MutableAnyOrigin]] = OptionalReg[LayoutTensor[DType.uint64, Layout.row_major(-1), MutableAnyOrigin]]({:i1 0, 1})
)
Top-K algorithm with fused sampling. For each row (or subvolume) of the input tensor, samples from the top-K values along the innermost dimension and returns the sampled indices.
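To make the description concrete, here is a minimal CPU reference sketch in Python of what "sampling from the Top-K of each row" can look like. It is not the GPU kernel and does not call the Mojo API. The helper names top_k_sample_row and top_k_sample are hypothetical, and the temperature-scaled softmax and seeded random generator are assumptions about the semantics that this page does not confirm. The top_p parameter is left out.

```python
import math
import random


def top_k_sample_row(logits, k, temperature=1.0, rng=None):
    """Sample one index from the k largest entries of a single row.

    Hypothetical helper: assumes temperature > 0 and that the kept
    logits are turned into probabilities with a softmax.
    """
    if rng is None:
        rng = random.Random()
    # Clamp k so it always refers to at least one and at most all entries.
    k = max(1, min(k, len(logits)))
    # Indices of the k largest logits, largest first.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature-scaled softmax weights (max-subtracted for stability).
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    # Draw one of the kept indices with probability proportional to its weight.
    return rng.choices(top, weights=weights, k=1)[0]


def top_k_sample(batch, k, temperature=1.0, seed=0):
    """Apply top_k_sample_row to every row and return one index per row."""
    rng = random.Random(seed)
    return [top_k_sample_row(row, k, temperature, rng) for row in batch]
```

With k set to 1 the sketch reduces to an argmax per row. With a larger k, each returned index always falls within that row's k largest entries, and reusing the same seed reproduces the same draws.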