Mojo struct
NamedBarrierSemaphore
@register_passable(trivial)
struct NamedBarrierSemaphore[thread_count: Int32, id_offset: Int32, max_num_barriers: Int32]
A device-wide semaphore implementation for NVIDIA GPUs with named barriers.
It uses acquire-release logic on a shared lock variable, rather than atomic instructions, for inter-CTA synchronization. Note that the memory barrier synchronizes warp groups within a single CTA. CUTLASS reference implementation: https://github.com/NVIDIA/cutlass/blob/a1aaf2300a8fc3a8106a05436e1a2abad0930443/include/cutlass/arch/barrier.h
Implemented traits
AnyType, ExplicitlyCopyable, ImplicitlyCopyable, Movable, UnknownDestructibility
Aliases
__copyinit__is_trivial
alias __copyinit__is_trivial = True
__del__is_trivial
alias __del__is_trivial = True
__moveinit__is_trivial
alias __moveinit__is_trivial = True
Methods
__init__
__init__(lock: UnsafePointer[Int32], thread_id: Int) -> Self
Initialize a new NamedBarrierSemaphore instance.
Args:
- lock (UnsafePointer): Pointer to the shared lock variable in global memory.
- thread_id (Int): Thread ID within the CTA, used to determine whether this thread should perform the operations on the lock.
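As a minimal sketch of construction: each thread in a CTA builds its own instance from the shared lock pointer and its thread index. The import path `gpu.semaphore`, the parameter values, the alias `WarpGroupSemaphore`, and the kernel name `init_semaphore_kernel` are all assumptions made for illustration, not taken from this page.

```mojo
from gpu import thread_idx
from gpu.semaphore import NamedBarrierSemaphore  # assumed module path
from memory import UnsafePointer

# Hypothetical parameter choices:
#   thread_count     = 128  threads that take part in each named barrier
#   id_offset        = 1    assumed to shift barrier ids away from barrier 0
#   max_num_barriers = 4    assumed upper bound on the `id` values passed later
alias WarpGroupSemaphore = NamedBarrierSemaphore[128, 1, 4]


fn init_semaphore_kernel(lock: UnsafePointer[Int32]):
    # `lock` is assumed to point to an Int32 in global memory that the host
    # allocated and initialized before launching the kernel.
    var sem = WarpGroupSemaphore(lock, Int(thread_idx.x))

    # Read back the locally cached state; its initial value is not
    # documented on this page, so nothing is assumed about it here.
    _ = sem.state()
```

This sketch assumes a one-dimensional block of 128 threads, so that `thread_count` matches the number of threads reaching each barrier.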
state
state(self) -> Int32
Get the current state of the semaphore.
Returns:
Int32: The current state value of the semaphore.
wait_eq
wait_eq(mut self, id: Int32, status: Int32 = 0)
wait_lt
wait_lt(mut self, id: Int32, count: Int32 = 0)
arrive_set
arrive_set(self, id: Int32, status: Int32 = 0)
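The wait and arrive methods carry no descriptions on this page. Going by their names and the CUTLASS reference, a plausible reading is that `wait_eq` blocks until the lock equals `status`, `wait_lt` blocks while the lock is less than `count`, and `arrive_set` stores `status` to the lock, each combined with named barrier `id`. Under that assumption, the sketch below serializes a per-CTA update across CTAs. The import path, the kernel name `serialized_accumulate`, its buffers, and the parameter values are hypothetical.

```mojo
from gpu import block_idx, thread_idx
from gpu.semaphore import NamedBarrierSemaphore  # assumed module path
from memory import UnsafePointer

alias NUM_THREADS = 128  # assumed to equal the launch block size


fn serialized_accumulate(
    lock: UnsafePointer[Int32],  # assumed to be zero-initialized by the host
    total: UnsafePointer[Float32],  # single accumulator in global memory
    partials: UnsafePointer[Float32],  # one partial result per CTA
):
    var sem = NamedBarrierSemaphore[NUM_THREADS, 1, 4](
        lock, Int(thread_idx.x)
    )
    var turn = Int32(Int(block_idx.x))

    # Assumed semantics: wait until the lock holds this CTA's index,
    # then synchronize the participating threads on named barrier 0.
    sem.wait_eq(0, turn)

    # Only one CTA at a time reaches this point, in block-index order.
    if thread_idx.x == 0:
        total[0] += partials[Int(block_idx.x)]

    # Assumed semantics: synchronize on named barrier 0, then publish
    # the next CTA's turn through the lock.
    sem.arrive_set(0, turn + 1)
```

The ordering relies on CTA `i - 1` making progress while CTA `i` spins, so a pattern like this is only safe when the waiting CTAs cannot starve the ones they depend on.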