Mojo struct

NamedBarrierSemaphore

@register_passable(trivial) struct NamedBarrierSemaphore[thread_count: Int32, id_offset: Int32, max_num_barriers: Int32]

A device-wide semaphore implementation for NVIDIA GPUs with named barriers.

It uses acquire-release logic on a shared lock variable, rather than atomic instructions, for inter-CTA synchronization. Note that the memory barrier is used to synchronize warp groups within a CTA. See the CUTLASS reference implementation: https://github.com/NVIDIA/cutlass/blob/a1aaf2300a8fc3a8106a05436e1a2abad0930443/include/cutlass/arch/barrier.h

Implemented traits

AnyType, Copyable, ImplicitlyCopyable, Movable, UnknownDestructibility

Aliases

`__copyinit__is_trivial`

alias __copyinit__is_trivial = True

`__del__is_trivial`

alias __del__is_trivial = True

`__moveinit__is_trivial`

alias __moveinit__is_trivial = True

Methods

`__init__`

__init__(lock: UnsafePointer[Int32], thread_id: Int) -> Self

Initialize a new NamedBarrierSemaphore instance.

Args:

lock (UnsafePointer): Pointer to shared lock variable in global memory.
thread_id (Int): Thread ID within the CTA, used to determine if this thread should perform atomic operations.

`state`

state(self) -> Int32

Get the current state of the semaphore.

Returns:

Int32: The current state value of the semaphore.

`wait_eq`

wait_eq(mut self, id: Int32, status: Int32 = 0)

`wait_lt`

wait_lt(mut self, id: Int32, count: Int32 = 0)

`arrive_set`

arrive_set(self, id: Int32, status: Int32 = 0)
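
The reference above gives no descriptions for `wait_eq`, `wait_lt`, or `arrive_set`, so the following is only a device-side sketch of how the methods might be combined to run a section of a kernel in CTA order. It rests on several assumptions that are not confirmed by this page: that the struct is importable from `gpu.semaphore`; that `wait_eq` blocks until the lock value equals `status`; that `arrive_set` publishes `status` to the lock; and that, by analogy with the CUTLASS reference, `wait_lt` waits while the lock value is below `count`. The kernel name `ordered_section_kernel` and the parameter values `128, 0, 8` are hypothetical.

```mojo
# Import paths are assumptions; adjust them to your Mojo version.
from gpu import block_idx, thread_idx
from gpu.semaphore import NamedBarrierSemaphore
from memory import UnsafePointer


fn ordered_section_kernel(lock: UnsafePointer[Int32]):
    # Hypothetical parameters: 128 threads take part in each named barrier,
    # barrier IDs start at offset 0, and at most 8 barriers are used.
    var sema = NamedBarrierSemaphore[128, 0, 8](lock, Int(thread_idx.x))

    # Each CTA takes its turn according to its block index.
    var turn = Int32(Int(block_idx.x))

    # Assumed semantics: return once the lock value equals `turn`.
    sema.wait_eq(0, turn)

    # ... work that must happen in CTA order goes here ...

    # Assumed semantics: store `turn + 1` so the next CTA can proceed.
    sema.arrive_set(0, turn + 1)
```

For this pattern to make progress, the host would need to zero-initialize the `Int32` that `lock` points to in global memory before launching the kernel, and the launch would need a block size that matches the chosen `thread_count`.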
