
Python module

attention_without_mask

An opaque, KV-cache-optimized vanilla attention mechanism. No explicit attention mask is passed in; instead, the mask variant is selected at construction time and applied inside the kernel.

AttentionWithoutMask

class max.pipelines.nn.attention.attention_without_mask.AttentionWithoutMask(n_heads: int, kv_params: max.pipelines.kv_cache.cache_params.KVCacheParams, layer_idx: max.graph.value.TensorValue, wqkv: max.graph.value.TensorValue, wo: max.pipelines.nn.linear.Linear, mask_variant: max.pipelines.nn.kernels.MHAMaskVariant)
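Based only on the signature above, constructing the layer might look like the following sketch. The concrete values (the head count, the causal mask variant, and the assumption that `Linear` wraps a single weight tensor) are illustrative guesses, not values documented on this page; the weight and index tensors are assumed to be `TensorValue`s created inside an enclosing `max.graph` graph.

```python
# Sketch only: arguments follow the constructor signature documented above.
# All concrete values are assumptions, not taken from this page.
from max.pipelines.kv_cache.cache_params import KVCacheParams
from max.pipelines.nn.attention.attention_without_mask import AttentionWithoutMask
from max.pipelines.nn.kernels import MHAMaskVariant
from max.pipelines.nn.linear import Linear


def build_attention(layer_idx, wqkv, wo_weight, kv_params: KVCacheParams):
    """Assemble an attention layer for one transformer block.

    `layer_idx`, `wqkv`, and `wo_weight` are assumed to be TensorValues
    built inside an enclosing max.graph.Graph.
    """
    return AttentionWithoutMask(
        n_heads=8,                                # assumed head count
        kv_params=kv_params,                      # KV cache configuration
        layer_idx=layer_idx,                      # this layer's index in the cache
        wqkv=wqkv,                                # fused Q/K/V projection weights
        wo=Linear(wo_weight),                     # output projection (assumed API)
        mask_variant=MHAMaskVariant.CAUSAL_MASK,  # mask applied inside the kernel
    )
```

Because the mask is baked into the kernel via `mask_variant`, callers do not build or pass a mask tensor when invoking the layer.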

mask_variant

mask_variant: MHAMaskVariant