
Mojo struct

NamedBarrierSemaphore

@register_passable(trivial) struct NamedBarrierSemaphore[thread_count: Int32, id_offset: Int32, max_num_barriers: Int32]

A device-wide semaphore implementation for NVIDIA GPUs with named barriers.

It uses acquire-release logic on a shared lock variable, rather than atomic instructions, for inter-CTA synchronization. Note that the memory barrier only synchronizes warp groups within a CTA. CUTLASS reference implementation: https://github.com/NVIDIA/cutlass/blob/a1aaf2300a8fc3a8106a05436e1a2abad0930443/include/cutlass/arch/barrier.h.
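The inter-CTA handoff described above can be pictured with a small host-side model. This is a sketch in Python, not Mojo GPU code: each Python thread stands in for one CTA, the `shared["lock"]` entry stands in for the lock variable in global memory, and a `threading.Condition` replaces the acquire-load polling loop. The names `NUM_CTAS`, `shared`, `order`, and `cta` are hypothetical and exist only for this illustration.

```python
import threading

NUM_CTAS = 4                      # hypothetical number of cooperating CTAs
shared = {"lock": 0}              # stands in for the Int32 lock in global memory
cond = threading.Condition()      # replaces the GPU-side polling loop in this model
order = []                        # order in which the CTAs ran their serialized step


def cta(cta_id: int) -> None:
    with cond:
        # "Acquire": block until the lock value names this CTA's turn.
        cond.wait_for(lambda: shared["lock"] == cta_id)
        order.append(cta_id)      # work that must happen in CTA order
        # "Release": publish the next turn so the following CTA may proceed.
        shared["lock"] = cta_id + 1
        cond.notify_all()


# Start the CTAs in reverse order to show that the lock value,
# not the launch order, decides who runs first.
threads = [threading.Thread(target=cta, args=(i,)) for i in reversed(range(NUM_CTAS))]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(order)  # [0, 1, 2, 3]
```

The model only shows the ordering guarantee; on the GPU the same effect would come from memory-ordering semantics on the lock rather than from a host mutex.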

Implemented traits

AnyType, ExplicitlyCopyable, ImplicitlyCopyable, Movable, UnknownDestructibility

Aliases

__copyinit__is_trivial

alias __copyinit__is_trivial = True

__del__is_trivial

alias __del__is_trivial = True

__moveinit__is_trivial

alias __moveinit__is_trivial = True

Methods

__init__

__init__(lock: UnsafePointer[Int32], thread_id: Int) -> Self

Initialize a new NamedBarrierSemaphore instance.

Args:

  • lock (UnsafePointer): Pointer to shared lock variable in global memory.
  • thread_id (Int): Thread ID within the CTA, used to determine whether this thread performs the operations on the lock variable.

state

state(self) -> Int32

Get the current state of the semaphore.

Returns:

Int32: The current state value of the semaphore.

wait_eq

wait_eq(mut self, id: Int32, status: Int32 = 0)

wait_lt

wait_lt(mut self, id: Int32, count: Int32 = 0)

arrive_set

arrive_set(self, id: Int32, status: Int32 = 0)
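The three methods above are listed without descriptions, so the following host-side Python model is only a guess at how they might fit together, based on the method names and on the CUTLASS-style barrier this struct references. It assumes that one thread per CTA (thread 0 here) touches the lock while the others meet it at a barrier, that `wait_eq` blocks until the lock equals `status`, that `wait_lt` keeps blocking while the lock is less than `count`, and that `arrive_set` stores `status` after the CTA's threads have synchronized. `SharedLock`, `ModelSemaphore`, and `run_cta` are hypothetical names; the model uses a single `threading.Barrier` and drops the `id` argument.

```python
import threading


class SharedLock:
    """Stands in for the Int32 lock in global memory (model only)."""

    def __init__(self) -> None:
        self.value = 0
        self.cond = threading.Condition()


class ModelSemaphore:
    """Host-side model of one CTA's semaphore; not the Mojo struct."""

    def __init__(self, lock: SharedLock, thread_count: int) -> None:
        self.lock = lock
        self.barrier = threading.Barrier(thread_count)  # stands in for a named barrier

    def wait_eq(self, thread_id: int, status: int = 0) -> None:
        if thread_id == 0:  # assumed: a single thread watches the lock
            with self.lock.cond:
                self.lock.cond.wait_for(lambda: self.lock.value == status)
        self.barrier.wait()  # the rest of the CTA is held here until thread 0 arrives

    def wait_lt(self, thread_id: int, count: int = 0) -> None:
        if thread_id == 0:
            with self.lock.cond:
                # Assumed meaning: keep waiting while value < count.
                self.lock.cond.wait_for(lambda: self.lock.value >= count)
        self.barrier.wait()

    def arrive_set(self, thread_id: int, status: int = 0) -> None:
        self.barrier.wait()  # every thread in the CTA has finished its work
        if thread_id == 0:
            with self.lock.cond:
                self.lock.value = status
                self.lock.cond.notify_all()


THREADS_PER_CTA = 3
lock = SharedLock()
log = []  # CTA id recorded by every thread once it is allowed to proceed


def run_cta(cta_id: int, sem: ModelSemaphore, thread_id: int) -> None:
    if cta_id < 2:
        sem.wait_eq(thread_id, status=cta_id)
        log.append(cta_id)
        sem.arrive_set(thread_id, status=cta_id + 1)
    else:
        sem.wait_lt(thread_id, count=2)  # proceeds once the lock has reached 2
        log.append(cta_id)


workers = []
for cta_id in reversed(range(3)):
    sem = ModelSemaphore(lock, THREADS_PER_CTA)
    for tid in range(THREADS_PER_CTA):
        workers.append(threading.Thread(target=run_cta, args=(cta_id, sem, tid)))
for w in workers:
    w.start()
for w in workers:
    w.join()

print(log)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

In this model the barrier before the store in `arrive_set` is what keeps a CTA from signalling the next one before all of its own threads have finished; whether the Mojo implementation orders these steps the same way is not stated on this page.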
