Python module
attention_without_mask
An opaque, KV-cache-optimized vanilla attention mechanism, with mask variants provided inside the kernel.
AttentionWithoutMask
class max.pipelines.nn.attention.attention_without_mask.AttentionWithoutMask(n_heads: int, kv_params: max.pipelines.kv_cache.cache_params.KVCacheParams, layer_idx: max.graph.value.TensorValue, wqkv: max.graph.value.TensorValue, wo: max.pipelines.nn.linear.Linear, mask_variant: max.pipelines.nn.kernels.MHAMaskVariant)
mask_variant
mask_variant: MHAMaskVariant
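The idea of selecting the mask inside the kernel, rather than having callers build and pass a mask tensor, can be illustrated with a minimal sketch. This is not the MAX implementation; the `MaskVariant` enum and `attention` function below are hypothetical stand-ins for `MHAMaskVariant` and the fused kernel, written in plain NumPy:

```python
# Conceptual sketch (not the MAX kernel): attention where the mask is chosen
# by an enum and constructed inside the function, so callers never supply one.
from enum import Enum

import numpy as np


class MaskVariant(Enum):  # hypothetical stand-in for MHAMaskVariant
    NULL_MASK = 0
    CAUSAL_MASK = 1


def attention(q, k, v, mask_variant=MaskVariant.CAUSAL_MASK):
    """Scaled dot-product attention; the mask is built internally."""
    seq_len, head_dim = q.shape[-2], q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(head_dim)
    if mask_variant is MaskVariant.CAUSAL_MASK:
        # Forbid attending to future positions: -inf above the diagonal.
        causal = np.triu(np.full((seq_len, seq_len), -np.inf), k=1)
        scores = scores + causal
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

With `CAUSAL_MASK`, the first query position can only attend to the first key, so its output row equals the first value row; with `NULL_MASK`, every position attends to the full sequence.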