Mojo module
block
GPU block-level operations and utilities.
This module provides block-level operations for NVIDIA and AMD GPUs, including:
- Block-wide reductions:
  - sum: Compute the sum of values across the block
  - max: Find the maximum value across the block
  - min: Find the minimum value across the block
- broadcast: Broadcast a value from one thread to all threads in the block
The module builds on the warp-level operations in the warp module and extends them to a full thread block, which may span multiple warps. It handles both NVIDIA and AMD GPU architectures and supports multiple data types with SIMD vectorization.
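To illustrate how a block-level reduction can be layered on warp-level ones, here is a minimal CPU-side sketch in Python. It is not the Mojo API and not necessarily how this module is implemented. The names `warp_reduce` and `block_reduce` and the constant `WARP_SIZE` are hypothetical, and the warp width of 32 is an assumption (AMD wavefronts are commonly 64 wide).

```python
import operator

WARP_SIZE = 32  # assumption: 32 lanes per warp; some AMD GPUs use 64


def warp_reduce(lanes, op):
    """Model a warp-level reduction: fold the values held by one warp's lanes."""
    result = lanes[0]
    for value in lanes[1:]:
        result = op(result, value)
    return result


def block_reduce(values, op, warp_size=WARP_SIZE):
    """Model a block-level reduction built from warp-level reductions.

    `values[i]` stands for the value contributed by thread i of the block.
    """
    # Stage 1: every warp reduces its own lanes independently.
    partials = [
        warp_reduce(values[start:start + warp_size], op)
        for start in range(0, len(values), warp_size)
    ]
    # Stage 2: the per-warp partial results (held in shared memory on a
    # real GPU) are reduced once more to produce the block-wide result.
    return warp_reduce(partials, op)


# A block of 256 threads (8 warps of 32), where thread i contributes i.
block_values = list(range(256))
block_sum = block_reduce(block_values, operator.add)  # 32640
block_max = block_reduce(block_values, max)           # 255
block_min = block_reduce(block_values, min)           # 0
```

The two-stage shape is the point of the sketch: the number of warps in a block is small, so the second stage is itself a single warp-sized reduction.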
Functions
- broadcast: Broadcasts a value from a source thread to all threads in a block.
- max: Computes the maximum value across all threads in a block.
- min: Computes the minimum value across all threads in a block.
- prefix_sum: Performs a prefix sum (scan) operation across all threads in a block.
- sum: Computes the sum of values across all threads in a block.
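As a rough illustration of what a block-wide prefix sum computes, the following Python sketch models one common way to build it from per-warp scans. It is a CPU simulation under stated assumptions, not the Mojo function. The name `block_prefix_sum`, its `exclusive` flag, and the warp width of 32 are all hypothetical choices made for this example.

```python
from itertools import accumulate

WARP_SIZE = 32  # assumption: 32 lanes per warp; some AMD GPUs use 64


def block_prefix_sum(values, warp_size=WARP_SIZE, exclusive=False):
    """Model a block-wide prefix sum; values[i] belongs to thread i."""
    # Stage 1: inclusive scan inside each warp.
    scanned = [
        list(accumulate(values[start:start + warp_size]))
        for start in range(0, len(values), warp_size)
    ]
    # Stage 2: an exclusive scan over the warp totals gives the offset
    # that each warp must add to its local results.
    totals = [warp[-1] for warp in scanned]
    offsets = [0] + list(accumulate(totals))[:-1]
    # Stage 3: every thread adds its warp's offset to its local scan value.
    inclusive = [x + off for warp, off in zip(scanned, offsets) for x in warp]
    if exclusive:
        # An exclusive scan omits each thread's own contribution.
        return [inc - v for inc, v in zip(inclusive, values)]
    return inclusive


# Five threads with a warp width of 2, to keep the example readable.
inclusive_scan = block_prefix_sum([3, 1, 4, 1, 5], warp_size=2)
# [3, 4, 8, 9, 14]
exclusive_scan = block_prefix_sum([3, 1, 4, 1, 5], warp_size=2, exclusive=True)
# [0, 3, 4, 8, 9]
```

In an inclusive scan thread i receives the sum of the values from threads 0 through i. In an exclusive scan it receives the sum from threads 0 through i - 1, so thread 0 receives 0.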