Python module

rotary_embedding

The rotary position embedding (RoPE) used within the model.

OptimizedRotaryEmbedding

class max.pipelines.nn.rotary_embedding.OptimizedRotaryEmbedding(dim: int | str | Dim | integer, n_heads: int, theta: float, max_seq_len: int, rope_scaling: ndarray | None = None, _freqs_cis: Value | BufferValue | TensorValue | Shape | Dim | int | float | integer | floating | ndarray | None = None, interleaved: bool = True)

Optimized version of RotaryEmbedding using 2D frequency tensor representation.

freqs_cis

property freqs_cis

RotaryEmbedding

class max.pipelines.nn.rotary_embedding.RotaryEmbedding(dim: int | str | Dim | integer, n_heads: int, theta: float, max_seq_len: int, rope_scaling: ndarray | None = None, _freqs_cis: Value | BufferValue | TensorValue | Shape | Dim | int | float | integer | floating | ndarray | None = None, interleaved: bool = True)

RotaryEmbedding layer to calculate and apply the frequency tensor for complex exponentials.

dim

dim: int | str | Dim | integer

freqs_cis

property freqs_cis: TensorValue

freqs_cis_base()

freqs_cis_base() → TensorValue

Computes the frequency tensor of complex exponentials (cis) for each position in the sequence. The frequencies are scaled by the theta parameter. This tensor is required to apply Rotary Position Embedding (RoPE) to a tensor. See ‘RoFormer: Enhanced Transformer with Rotary Position Embedding’ (arxiv.org/pdf/2104.09864).

  • Returns:

    The frequency tensor for complex exponentials, with shape (max_seq_len * 2, dim // (2 * n_heads), 2).
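
The shape above can be illustrated with a minimal NumPy sketch. This is not the library's implementation: the function name `freqs_cis_sketch` is hypothetical, and the code assumes the standard RoFormer frequency schedule and that the last axis holds the (cos, sin) pair, i.e. the real and imaginary parts of each complex exponential.

```python
import numpy as np

def freqs_cis_sketch(dim: int, n_heads: int, theta: float, max_seq_len: int) -> np.ndarray:
    """Hypothetical helper; mirrors the documented output shape only."""
    head_dim = dim // n_heads
    # One inverse frequency per pair of channels: theta^(-2i / head_dim).
    inv_freq = 1.0 / (theta ** (np.arange(0, head_dim, 2) / head_dim))
    # The documented shape covers max_seq_len * 2 positions.
    positions = np.arange(max_seq_len * 2)
    # angles[p, i] = p * inv_freq[i]
    angles = np.outer(positions, inv_freq)
    # Assumption: last axis is (real, imaginary) = (cos, sin).
    return np.stack([np.cos(angles), np.sin(angles)], axis=-1)

freqs = freqs_cis_sketch(dim=64, n_heads=4, theta=10000.0, max_seq_len=8)
print(freqs.shape)
```

With `dim=64` and `n_heads=4`, each head has 16 channels, so there are 8 frequency pairs per position, matching `dim // (2 * n_heads)`.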

interleaved

interleaved: bool = True
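
This page does not describe what the flag changes. As a hedged illustration only, the sketch below assumes the common RoPE convention: with interleaved pairing, adjacent channels (0, 1), (2, 3), ... form each rotated pair, and otherwise channel i is paired with channel i + head_dim / 2. The function `apply_rope_sketch` and its arguments are hypothetical and are not part of this module's API.

```python
import numpy as np

def apply_rope_sketch(x: np.ndarray, cos: np.ndarray, sin: np.ndarray,
                      interleaved: bool = True) -> np.ndarray:
    """Rotate channel pairs of x (seq_len, head_dim) by per-position angles.

    cos and sin have shape (seq_len, head_dim // 2).
    """
    if interleaved:
        # Assumed layout: pairs are (x0, x1), (x2, x3), ...
        x_re, x_im = x[..., 0::2], x[..., 1::2]
    else:
        # Assumed layout: pairs are (x_i, x_{i + head_dim / 2}).
        half = x.shape[-1] // 2
        x_re, x_im = x[..., :half], x[..., half:]
    # Complex multiplication (x_re + i x_im) * (cos + i sin).
    out_re = x_re * cos - x_im * sin
    out_im = x_re * sin + x_im * cos
    if interleaved:
        out = np.empty_like(x)
        out[..., 0::2] = out_re
        out[..., 1::2] = out_im
        return out
    return np.concatenate([out_re, out_im], axis=-1)

seq_len, head_dim, theta = 4, 8, 10000.0
angles = np.outer(np.arange(seq_len), 1.0 / (theta ** (np.arange(0, head_dim, 2) / head_dim)))
x = np.random.default_rng(0).standard_normal((seq_len, head_dim))
y_inter = apply_rope_sketch(x, np.cos(angles), np.sin(angles), interleaved=True)
y_split = apply_rope_sketch(x, np.cos(angles), np.sin(angles), interleaved=False)
```

Under either pairing the operation is a pure rotation, so each position's vector norm is preserved and position 0 is left unchanged.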

max_seq_len

max_seq_len: int

The maximum sequence length for the model’s input.

n_heads

n_heads: int

rope_scaling

rope_scaling: ndarray | None = None

Scaling factor for the positional frequencies.

theta

theta: float

Hyperparameter that controls the frequency scaling of the sinusoidal components of the embeddings.