Mojo struct

Semaphore

@register_passable struct Semaphore

A device-wide semaphore implementation for GPUs.

This struct provides atomic operations and memory barriers for synchronization between CTAs (cooperative thread arrays, i.e. thread blocks). A single thread per CTA performs the atomic operations on a shared lock variable.

Implemented traits

AnyType, UnknownDestructibility

Methods

__init__

__init__(lock: UnsafePointer[SIMD[int32, 1]], thread_id: Int) -> Self

Initialize a new Semaphore instance.

Args:

  • lock (UnsafePointer[SIMD[int32, 1]]): Pointer to shared lock variable in global memory.
  • thread_id (Int): Thread ID within the CTA, used to determine if this thread should perform atomic operations.

fetch

fetch(mut self)

Fetch the current state of the semaphore from global memory.

Only the designated wait thread (thread 0) performs the actual load, using an acquire memory ordering to ensure proper synchronization.

state

state(self) -> SIMD[int32, 1]

Get the semaphore state held by this instance, as last loaded by fetch.

Returns:

The current state value of the semaphore.

wait

wait(mut self, status: Int = 0)

Wait until the semaphore reaches the specified state.

Spins in a barrier-based loop in which all threads of the CTA participate in checking the state, and returns only once the state matches the requested status.

Args:

  • status (Int): The state value to wait for (defaults to 0).
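
The barrier-based spin loop described above can be modeled on the CPU to make the control flow concrete. The following is a minimal Python sketch, not Mojo GPU code and not the library's implementation: `VoteBarrier`, `cta_thread`, and `run_wait_demo` are hypothetical names, a one-element list stands in for the lock in global memory, and the `-1` initial state is an assumed sentinel. It assumes the loop is built on a barrier that also ANDs a per-thread predicate, so only the designated thread's fetched value can end the loop.

```python
import threading


class VoteBarrier:
    """Barrier that also returns the AND of every thread's predicate."""

    def __init__(self, n_threads):
        self.votes = [False] * n_threads
        self.barrier = threading.Barrier(n_threads)

    def all(self, tid, predicate):
        self.votes[tid] = predicate
        self.barrier.wait()          # every vote for this round is written
        result = all(self.votes)
        self.barrier.wait()          # every thread has read before the next round
        return result


def cta_thread(tid, lock, status, vote, finished):
    state = -1                       # per-thread copy of the state (assumed sentinel)
    # wait(): keep spinning while every thread still sees state != status.
    while vote.all(tid, state != status):
        if tid == 0:                 # fetch(): only the designated thread loads
            state = lock[0]
    finished.append(tid)


def run_wait_demo(n_threads=4, status=1):
    lock = [0]                       # stands in for the lock word in global memory
    vote = VoteBarrier(n_threads)
    finished = []
    threads = [
        threading.Thread(target=cta_thread, args=(tid, lock, status, vote, finished))
        for tid in range(n_threads)
    ]
    for t in threads:
        t.start()
    lock[0] = status                 # another CTA's release() publishes the state
    for t in threads:
        t.join()
    return sorted(finished)
```

In this model the non-designated threads never update their copy of the state, yet they all leave the loop in the same round as thread 0, because the vote is shared through the barrier.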

release

release(mut self, status: SIMD[int32, 1] = 0)

Release the semaphore by setting it to the specified state.

Ensures all threads have reached this point via a barrier before the designated thread updates the semaphore state.

Args:

  • status (SIMD[int32, 1]): The new state value to set (defaults to 0).
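
To illustrate why the barrier comes before the state update, here is a hedged Python model of several CTAs taking turns on one lock. It is a CPU-side sketch under stated assumptions, not Mojo GPU code: `cta_body` and `run_release_demo` are hypothetical names, Python threads stand in for GPU threads, and the wait step is simplified to thread 0 polling followed by a barrier. Each CTA waits for the lock to equal its own index, updates a shared buffer, and then releases with the next index.

```python
import threading
import time


def cta_body(cta_id, tid, lock, barrier, out, order):
    # wait(cta_id), simplified: thread 0 polls the lock, then the CTA syncs.
    if tid == 0:
        while lock[0] != cta_id:
            time.sleep(0)            # yield so other threads can make progress
    barrier.wait()

    # Critical section: every thread of the CTA contributes to the result.
    out[tid] += cta_id + 1

    # release(cta_id + 1): the barrier guarantees all threads of this CTA
    # have finished writing before the designated thread publishes the state.
    barrier.wait()
    if tid == 0:
        order.append(cta_id)
        lock[0] = cta_id + 1


def run_release_demo(n_ctas=3, n_threads=4):
    lock = [0]                       # stands in for the lock word in global memory
    out = [0] * n_threads
    order = []
    threads = []
    # Start CTAs in reverse so any ordering comes from the semaphore alone.
    for cta_id in reversed(range(n_ctas)):
        barrier = threading.Barrier(n_threads)   # one barrier per CTA
        for tid in range(n_threads):
            t = threading.Thread(
                target=cta_body, args=(cta_id, tid, lock, barrier, out, order)
            )
            threads.append(t)
            t.start()
    for t in threads:
        t.join()
    return out, order, lock[0]
```

If thread 0 stored the new state without the preceding barrier, the next CTA could enter its critical section while slower threads of the current CTA were still writing.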