Mojo struct
Semaphore
@register_passable
struct Semaphore
A device-wide semaphore implementation for GPUs.
This struct provides atomic operations and memory barriers for inter-CTA synchronization. It uses a single thread per CTA to perform atomic operations on a shared lock variable.
Implemented traits
AnyType, UnknownDestructibility
Methods
__init__
__init__(lock: UnsafePointer[SIMD[int32, 1]], thread_id: Int) -> Self
Initialize a new Semaphore instance.
Args:
- lock (UnsafePointer[SIMD[int32, 1]]): Pointer to the shared lock variable in global memory.
- thread_id (Int): Thread ID within the CTA, used to determine whether this thread should perform atomic operations.
fetch
fetch(mut self)
Fetch the current state of the semaphore from global memory.
Only the designated wait thread (thread 0) performs the actual load, using an acquire memory ordering to ensure proper synchronization.
state
state(self) -> SIMD[int32, 1]
Get the current state of the semaphore.
Returns:
The current state value of the semaphore.
wait
wait(mut self, status: Int = 0)
Wait until the semaphore reaches the specified state.
Uses a barrier-based spin loop in which all threads of the CTA participate in checking the state. Execution proceeds only once the state matches the expected status.
Args:
- status (Int): The state value to wait for (defaults to 0).
release
release(mut self, status: SIMD[int32, 1] = 0)
Release the semaphore by setting it to the specified state.
Ensures all threads have reached this point via a barrier before the designated thread updates the semaphore state.
Args:
- status (SIMD[int32, 1]): The new state value to set (defaults to 0).
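To illustrate how wait and release can hand work from one CTA to the next, here is a minimal host-side sketch in Python. It is a model of the protocol described above, not Mojo GPU code and not the library's implementation. The names `SemaphoreModel` and `run_ctas` are hypothetical. The sketch assumes that each CTA is represented only by its designated thread, so the intra-CTA barriers are omitted; a one-element list stands in for the lock in global memory; and the initial cached state of -1 is an assumption.

```python
import threading
import time


class SemaphoreModel:
    """Hypothetical host-side model of the fetch/wait/release protocol."""

    def __init__(self, lock):
        self.lock = lock   # one-element list standing in for the global-memory lock
        self._state = -1   # last fetched value (initial value is an assumption)

    def fetch(self):
        # Stands in for the acquire load done by the designated thread.
        self._state = self.lock[0]

    def state(self):
        return self._state

    def wait(self, status=0):
        # Spin until the fetched state equals the expected status.
        while True:
            self.fetch()
            if self._state == status:
                return
            time.sleep(0)  # yield so other "CTAs" can make progress

    def release(self, status=0):
        # Stands in for the store done by the designated thread.
        self.lock[0] = status


def run_ctas(num_ctas):
    """Run num_ctas workers that enter a critical section in CTA-index order."""
    lock = [0]
    order = []

    def cta(cta_id):
        sem = SemaphoreModel(lock)
        sem.wait(cta_id)                 # block until it is this CTA's turn
        order.append(cta_id)             # serialized work goes here
        last = cta_id == num_ctas - 1
        sem.release(0 if last else cta_id + 1)  # hand off, or reset to 0

    # Start in reverse order so the ordering comes from the semaphore alone.
    threads = [threading.Thread(target=cta, args=(i,))
               for i in reversed(range(num_ctas))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order, lock[0]


if __name__ == "__main__":
    print(run_ctas(4))
```

Starting the workers in reverse order shows that the completion order is set by the status values passed to wait and release, not by launch order. The last worker releases with status 0 so the lock can be reused.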