Mojo struct

Semaphore

@register_passable("trivial") struct Semaphore

A device-wide semaphore implementation for GPUs.

This struct provides atomic operations and memory barriers for synchronizing between cooperative thread arrays (CTAs, i.e. thread blocks). A single thread per CTA performs the atomic operations on a lock variable that all CTAs share.

Implemented traits

AnyType, Copyable, Movable, UnknownDestructibility

Methods

`__init__`

__init__(lock: UnsafePointer[SIMD[int32, 1]], thread_id: Int) -> Self

Initialize a new Semaphore instance.

Args:

lock (UnsafePointer[SIMD[int32, 1]]): Pointer to shared lock variable in global memory.
thread_id (Int): Thread ID within the CTA, used to determine if this thread should perform atomic operations.

`fetch`

fetch(mut self)

Fetch the current state of the semaphore from global memory.

Only the designated wait thread (thread 0) performs the actual load. The load uses acquire memory ordering, so writes published by a releasing CTA are visible after it.

`state`

state(self) -> SIMD[int32, 1]

Get the current state of the semaphore.

Returns:

The current state value of the semaphore.
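
As a rough illustration of how `fetch` and `state` fit together, the sketch below takes a non-blocking snapshot of the lock. The kernel name `peek_lock` and the `seen` buffer are hypothetical, and the import path `gpu.semaphore` and the `UnsafePointer[Int32]` spelling are assumptions that may differ between toolchain versions. Because only thread 0 performs the load, only thread 0 reads the cached value here.

```mojo
from gpu import block_idx, thread_idx
from gpu.semaphore import Semaphore  # assumed module path
from memory import UnsafePointer


fn peek_lock(
    lock: UnsafePointer[Int32],  # shared lock variable in global memory
    seen: UnsafePointer[Int32],  # hypothetical: one output slot per CTA
):
    var sema = Semaphore(lock, Int(thread_idx.x))

    # Thread 0 loads the lock with acquire ordering; other threads do not load.
    sema.fetch()

    # Record the snapshot without blocking. Only thread 0 holds a fetched value.
    if thread_idx.x == 0:
        seen[Int(block_idx.x)] = sema.state()
```

Unlike `wait`, this pattern never spins, so it suits checks where a CTA can do other work if the lock is not yet in the expected state.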

`wait`

wait(mut self, status: Int = 0)

Wait until the semaphore reaches the specified state.

Uses a barrier-based spin loop in which all threads of the CTA participate in checking the state. Execution proceeds only once the state matches `status`.

Args:

status (Int): The state value to wait for (defaults to 0).

`release`

release(mut self, status: SIMD[int32, 1] = 0)

Release the semaphore by setting it to the specified state.

Ensures all threads have reached this point via a barrier before the designated thread updates the semaphore state.

Args:

status (SIMD[int32, 1]): The new state value to set (defaults to 0).
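
One common way to combine `wait` and `release` is to pass a turn from CTA to CTA so that a critical section runs in block-index order. The following is a minimal sketch of that pattern, not a definitive implementation: the kernel name `ordered_accumulate` and the `partials` and `total` buffers are hypothetical, the lock is assumed to be zero-initialized by the host, and the import path `gpu.semaphore` and the `UnsafePointer[Int32]` spelling are assumptions that may differ between toolchain versions.

```mojo
from gpu import block_idx, thread_idx
from gpu.semaphore import Semaphore  # assumed module path
from memory import UnsafePointer


fn ordered_accumulate(
    lock: UnsafePointer[Int32],  # assumed to be zeroed by the host before launch
    partials: UnsafePointer[Float32],  # hypothetical: one input value per CTA
    total: UnsafePointer[Float32],  # hypothetical: single running sum
):
    var cta = Int(block_idx.x)
    var sema = Semaphore(lock, Int(thread_idx.x))

    # Spin until the lock holds this CTA's index. CTA 0 passes immediately
    # because the lock starts at 0.
    sema.fetch()
    sema.wait(cta)

    # Critical section: reached by one CTA at a time, in index order.
    # Thread 0 does the update, since thread 0 also performs the release.
    if thread_idx.x == 0:
        total[0] += partials[cta]

    # Barrier across the CTA, then thread 0 publishes the next CTA's index.
    sema.release(Int32(cta + 1))
```

After the last CTA releases, the lock holds the number of CTAs in the grid, so it would need to be reset to 0 before the kernel is launched again. The pattern also relies on earlier CTAs making progress while later ones spin, which is an assumption about how the grid is scheduled.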
