Python module

attention.multihead_attention

MultiheadAttention

class max.nn.attention.multihead_attention.MultiheadAttention(num_attention_heads, hidden_size, devices=None, dtype=float32, scale=None, qkv_has_bias=False, o_proj_has_bias=False, stacked_qkv=False)

Multihead attention layer that supports both single-device and distributed (multi-device) computation.

Initializes the attention layer.

Parameters:

  • num_attention_heads (int) – The number of attention heads.
  • hidden_size (int) – The dimension of the hidden states (embed_dim).
  • devices (Sequence[DeviceRef] | None) – Device(s) on which to place the weights and run the computation. If multiple devices are provided, distributed computation is used.
  • dtype (DType) – DType of the QKV and output projection weights.
  • scale (float | None) – Value used to scale the attention output.
  • qkv_has_bias (bool) – Whether the Q, K, and V projections use a bias.
  • o_proj_has_bias (bool) – Whether the output projection uses a bias.
  • stacked_qkv (bool) – Whether to use a single stacked QKV weight matrix.

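A minimal construction sketch follows. It assumes MultiheadAttention is importable from this module and that DeviceRef and DType come from max.graph and max.dtype; the import paths, the DeviceRef.GPU constructor, and the example hyperparameters are assumptions rather than values taken from this page.

# Sketch: build a single-device MultiheadAttention layer.
# Import paths are assumptions based on the MAX package layout.
from max.dtype import DType
from max.graph import DeviceRef
from max.nn.attention.multihead_attention import MultiheadAttention

attn = MultiheadAttention(
    num_attention_heads=8,
    hidden_size=512,              # embed_dim; typically a multiple of num_attention_heads
    devices=[DeviceRef.GPU(0)],   # one device, so computation stays single-device
    dtype=DType.bfloat16,
    scale=None,                   # None defers to the layer's default scaling
    qkv_has_bias=True,            # add a bias to the Q/K/V projections
    o_proj_has_bias=False,
    stacked_qkv=False,            # keep separate Q, K, and V weight matrices
)

Passing more than one entry in devices switches the layer to distributed computation, per the devices parameter above.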
wqkv

property wqkv: TensorValue

The concatenation of q, k, and v weight vectors.

wqkv_bias

property wqkv_bias: TensorValue | None

The concatenation of the q, k, and v bias vectors.
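For intuition about the stacked weight, the following NumPy sketch (not the MAX API; the shapes and concatenation axis are illustrative assumptions) shows why a single matmul against a concatenated QKV weight is equivalent to three separate Q, K, and V projections:

# Conceptual sketch: a fused QKV weight is the column-wise concatenation of
# the separate Q, K, and V projection weights.
import numpy as np

hidden_size, seq_len = 64, 10
rng = np.random.default_rng(0)

wq = rng.standard_normal((hidden_size, hidden_size))
wk = rng.standard_normal((hidden_size, hidden_size))
wv = rng.standard_normal((hidden_size, hidden_size))
x = rng.standard_normal((seq_len, hidden_size))

wqkv = np.concatenate([wq, wk, wv], axis=1)   # analogous to the wqkv property

fused = x @ wqkv                       # one projection of width 3 * hidden_size
q, k, v = np.split(fused, 3, axis=1)   # recover the individual projections

assert np.allclose(q, x @ wq)
assert np.allclose(k, x @ wk)
assert np.allclose(v, x @ wv)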
