Mojo module

warp

GPU warp-level operations and utilities.

This module provides warp-level operations for NVIDIA and AMD GPUs, including:

Shuffle operations to exchange values between threads in a warp:
- shuffle_idx: Copy a value from a source lane to other lanes
- shuffle_up: Copy values from lanes with lower lane IDs
- shuffle_down: Copy values from lanes with higher lane IDs
- shuffle_xor: Exchange values in a butterfly pattern

Warp-wide reductions and broadcast:
- sum: Compute the sum across a warp
- max: Find the maximum value across a warp
- min: Find the minimum value across a warp
- broadcast: Broadcast a value from lane 0 to all lanes
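
The shuffle and reduction semantics above can be modeled on the CPU. The Python sketch below is not the Mojo API: it treats a warp as a list whose index is the lane ID, and it assumes the CUDA convention that a lane whose source lane falls outside the warp keeps its own value. The helper `butterfly_reduce` is hypothetical; it shows one common way to build a warp-wide sum, max, or min from `shuffle_xor` so that every lane ends up holding the result.

```python
# CPU-side model of warp shuffle semantics (illustration only, not the Mojo API).
# A "warp" is a Python list; index i holds the value owned by lane i.
# Assumption (CUDA-style convention): if the source lane is out of range,
# the lane keeps its own value.

WARP_SIZE = 32  # NVIDIA warp size; AMD wavefronts are commonly 64 lanes


def shuffle_idx(vals, src_lane):
    """Every lane reads the value held by src_lane."""
    return [vals[src_lane] for _ in vals]


def shuffle_up(vals, offset):
    """Lane i reads lane i - offset; the lowest `offset` lanes keep their value."""
    return [vals[i - offset] if i >= offset else vals[i] for i in range(len(vals))]


def shuffle_down(vals, offset):
    """Lane i reads lane i + offset; the highest `offset` lanes keep their value."""
    n = len(vals)
    return [vals[i + offset] if i + offset < n else vals[i] for i in range(n)]


def shuffle_xor(vals, mask):
    """Lane i exchanges with lane i ^ mask (butterfly pattern).
    Requires a power-of-two warp size and mask < warp size."""
    return [vals[i ^ mask] for i in range(len(vals))]


def butterfly_reduce(vals, op):
    """Hypothetical helper: combine partners at xor offsets n/2, n/4, ..., 1.
    After log2(n) steps every lane holds the full reduction."""
    offset = len(vals) // 2
    while offset > 0:
        partner = shuffle_xor(vals, offset)
        vals = [op(a, b) for a, b in zip(vals, partner)]
        offset //= 2
    return vals


lanes = list(range(WARP_SIZE))                          # lane i holds the value i
print(shuffle_down(lanes, 1)[:4])                       # [1, 2, 3, 4]
print(shuffle_idx(lanes, 0)[7])                         # 0 (broadcast from lane 0)
print(butterfly_reduce(lanes, lambda a, b: a + b)[0])   # 496
print(butterfly_reduce(lanes, max)[5])                  # 31
```

Using `shuffle_xor` rather than `shuffle_down` means no final broadcast step is needed, because every lane participates in every exchange.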

The module handles both NVIDIA and AMD GPU architectures through architecture-specific implementations of the core operations. It supports various data types, including integers, floats, and half-precision floats, with SIMD vectorization.
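
Two details from the paragraph above can be sketched in the same CPU model: the warp size differs by vendor (32 threads per NVIDIA warp; AMD wavefronts are commonly 64 lanes, consistent with the 32- or 64-bit `vote` mask), and each lane may hold a SIMD vector rather than a scalar. As a labeled assumption, the sketch treats a shuffle as moving a lane's whole vector, so a warp reduction combines vectors element-wise and keeps the SIMD width. The helper `warp_sum_simd` is hypothetical.

```python
# CPU model: each lane holds a SIMD vector (a tuple of equal width).
# Assumption: a shuffle moves the whole vector, so a warp-wide sum is
# computed element-wise and the result has the same width as the input.


def warp_sum_simd(lane_vectors):
    """Hypothetical helper: butterfly sum over lanes of equal-width tuples."""
    vals = list(lane_vectors)
    n = len(vals)
    assert (n & (n - 1)) == 0, "warp size must be a power of two"
    offset = n // 2
    while offset > 0:
        partner = [vals[i ^ offset] for i in range(n)]
        vals = [tuple(a + b for a, b in zip(x, y)) for x, y in zip(vals, partner)]
        offset //= 2
    return vals


for warp_size in (32, 64):  # NVIDIA warp vs. a 64-lane AMD wavefront
    vecs = [(i, 1.0) for i in range(warp_size)]  # SIMD width 2 per lane
    result = warp_sum_simd(vecs)
    print(warp_size, result[0])
# 32 (496, 32.0)
# 64 (2016, 64.0)
```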

Structs

ReductionMethod: Enumerates the supported reduction methods.

Functions

broadcast: Broadcasts a SIMD value from lane 0 to all lanes in the warp.
lane_group_max: Reduces a SIMD value to its maximum within a lane group using warp-level operations.
lane_group_max_and_broadcast: Reduces and broadcasts the maximum value within a lane group using warp-level operations.
lane_group_min: Reduces a SIMD value to its minimum within a lane group using warp-level operations.
lane_group_reduce: Performs a generic warp-level reduction operation using shuffle operations.
lane_group_sum: Computes the sum of values across a group of lanes using warp-level operations.
lane_group_sum_and_broadcast: Computes the sum across a lane group and broadcasts the result to all lanes.
max: Computes the maximum value across all lanes in a warp.
min: Computes the minimum value across all lanes in a warp.
prefix_sum: Computes a warp-level prefix sum (scan) operation.
reduce: Performs a generic warp-wide reduction operation using shuffle operations.
shuffle_down: Copies values from threads with higher lane IDs in the warp.
shuffle_idx: Copies a value from a source lane to other lanes in a warp.
shuffle_up: Copies values from threads with lower lane IDs in the warp.
shuffle_xor: Exchanges values between threads in a warp using a butterfly pattern.
sum: Computes the sum of values across all lanes in a warp.
vote: Creates a 32- or 64-bit mask among all threads in the warp, where each bit is set to 1 if the corresponding thread voted True and 0 otherwise.
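
Three of the functions listed above, `lane_group_reduce`, `prefix_sum`, and `vote`, can be illustrated with the same CPU model. The Python below uses hypothetical helpers that share those names but not the Mojo signatures. It assumes a lane group is an aligned run of `num_lanes` consecutive lanes (a power of two), models `prefix_sum` as an inclusive Hillis-Steele scan built from `shuffle_up` steps (this page does not say whether the library scan is inclusive or exclusive), and represents the `vote` mask as an integer whose bit i corresponds to lane i.

```python
# CPU model of lane-group reduction, prefix sum, and vote.
# Hypothetical helpers: same names as the listed functions, not the Mojo API.


def lane_group_reduce(vals, num_lanes, op):
    """Reduce independently within aligned groups of num_lanes consecutive
    lanes (num_lanes a power of two); every lane gets its group's result."""
    n = len(vals)
    offset = num_lanes // 2
    while offset > 0:
        # i ^ offset stays in the same aligned group because offset < num_lanes
        partner = [vals[i ^ offset] for i in range(n)]
        vals = [op(a, b) for a, b in zip(vals, partner)]
        offset //= 2
    return vals


def prefix_sum(vals):
    """Inclusive Hillis-Steele scan: at each step, lane i adds the value that
    shuffle_up would bring from lane i - offset. Lanes with i < offset add 0,
    which models the usual `if lane >= offset` guard."""
    n = len(vals)
    out = list(vals)
    offset = 1
    while offset < n:
        shifted = [out[i - offset] if i >= offset else 0 for i in range(n)]
        out = [a + b for a, b in zip(out, shifted)]
        offset *= 2
    return out


def vote(predicates):
    """Bit i of the returned mask is 1 if lane i voted True, else 0."""
    mask = 0
    for lane, voted in enumerate(predicates):
        if voted:
            mask |= 1 << lane
    return mask


print(lane_group_reduce(list(range(8)), 4, lambda a, b: a + b))
# [6, 6, 6, 6, 22, 22, 22, 22]
print(prefix_sum([1, 2, 3, 4]))                     # [1, 3, 6, 10]
print(hex(vote([i % 2 == 0 for i in range(32)])))   # 0x55555555
```

With a 32-lane warp the mask fits in 32 bits; with a 64-lane wavefront the same loop produces up to 64 bits, which is why the mask width depends on the architecture.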
