Python module

rotary_embedding

The rotary position embedding (RoPE) layers used within the model.

`OptimizedRotaryEmbedding`

class max.pipelines.nn.rotary_embedding.OptimizedRotaryEmbedding(dim: int | str | Dim | integer, n_heads: int, theta: float, max_seq_len: int, rope_scaling: ndarray | None = None, _freqs_cis: Value | TensorValue | Shape | Dim | int | float | integer | floating | ndarray | None = None, interleaved: bool = True)

Optimized version of `RotaryEmbedding` that uses a 2D frequency tensor representation.

`freqs_cis`

property freqs_cis

`RotaryEmbedding`

class max.pipelines.nn.rotary_embedding.RotaryEmbedding(dim: int | str | Dim | integer, n_heads: int, theta: float, max_seq_len: int, rope_scaling: ndarray | None = None, _freqs_cis: Value | TensorValue | Shape | Dim | int | float | integer | floating | ndarray | None = None, interleaved: bool = True)

Layer that computes the frequency tensor of complex exponentials and applies it as a rotary position embedding.
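To illustrate what applying a frequency tensor of complex exponentials means, here is a minimal NumPy sketch. It is not the MAX implementation: `apply_rope_sketch` is a hypothetical helper, it handles a single head of shape `(seq_len, head_dim)`, and it assumes adjacent channels are paired as (real, imaginary) components, which is only one possible layout.

```python
import numpy as np

def apply_rope_sketch(x: np.ndarray, theta: float = 10000.0) -> np.ndarray:
    """Hypothetical helper: rotate channel pairs of x, shape (seq_len, head_dim)."""
    seq_len, head_dim = x.shape
    # One inverse frequency per channel pair: theta^(-2i / head_dim).
    inv_freq = 1.0 / (theta ** (np.arange(0, head_dim, 2) / head_dim))
    # Rotation angle for every (position, channel pair).
    angles = np.outer(np.arange(seq_len), inv_freq)
    cos, sin = np.cos(angles), np.sin(angles)
    # Assumption: adjacent channels form the (real, imag) parts of one complex number.
    pairs = x.reshape(seq_len, head_dim // 2, 2)
    re, im = pairs[..., 0], pairs[..., 1]
    # Complex multiplication (re + i*im) * (cos + i*sin).
    rotated = np.stack([re * cos - im * sin, re * sin + im * cos], axis=-1)
    return rotated.reshape(seq_len, head_dim)

x = np.arange(32, dtype=np.float64).reshape(4, 8)
out = apply_rope_sketch(x)
```

Because each pair is only rotated, the norm of every row is preserved, and position 0 (angle zero) is left unchanged.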

`dim`

dim*: int | str | Dim | integer*

`freqs_cis`

property freqs_cis*: TensorValue*

`freqs_cis_base()`

freqs_cis_base() → TensorValue

Computes the frequency tensor of complex exponentials (cis) for positions up to the configured sequence length. The frequencies are scaled by the `theta` parameter. This tensor is required to apply Rotary Position Embedding (RoPE) to a tensor. See "RoFormer: Enhanced Transformer with Rotary Position Embedding" (arxiv.org/pdf/2104.09864).

Returns:

The frequency tensor of complex exponentials, with shape `(max_seq_len * 2, dim // (2 * n_heads), 2)`.
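The documented shape can be reproduced with a short NumPy sketch. This assumes the standard RoFormer formulation, with the last axis holding the (cosine, sine) parts of each complex exponential; `freqs_cis_sketch` is a hypothetical stand-in, not the library's code, and it ignores `rope_scaling`.

```python
import numpy as np

def freqs_cis_sketch(dim: int, n_heads: int, theta: float, max_seq_len: int) -> np.ndarray:
    """Hypothetical NumPy stand-in for the frequency tensor computation."""
    head_dim = dim // n_heads
    # One inverse frequency per channel pair: theta^(-2i / head_dim).
    inv_freq = 1.0 / (theta ** (np.arange(0, head_dim, 2, dtype=np.float64) / head_dim))
    # Positions 0 .. 2 * max_seq_len - 1, matching the documented first dimension.
    positions = np.arange(max_seq_len * 2, dtype=np.float64)
    angles = np.outer(positions, inv_freq)  # (max_seq_len * 2, head_dim // 2)
    # Store cis(angle) = cos(angle) + i*sin(angle) as a trailing (real, imag) pair.
    return np.stack([np.cos(angles), np.sin(angles)], axis=-1)

freqs = freqs_cis_sketch(dim=64, n_heads=4, theta=10000.0, max_seq_len=8)
print(freqs.shape)  # (16, 8, 2)
```

With `dim=64` and `n_heads=4`, the middle dimension is `64 // (2 * 4) = 8`, in line with the shape stated above.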

`interleaved`

interleaved*: bool* = True

`max_seq_len`

max_seq_len*: int*

The maximum sequence length of the model's input.

`n_heads`

n_heads*: int*

`rope_scaling`

rope_scaling*: ndarray | None* = None

Scaling factor for the positional frequencies.

`theta`

theta*: float*

Hyperparameter used to control the frequency scaling of the sinusoidal components of the embeddings.
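As a rough illustration of how `theta` affects the embedding, the sketch below computes the wavelength (in positions) of each channel pair under the standard RoPE formula, where pair `i` rotates at frequency `theta ** (-2 * i / head_dim)`. The helper `rope_wavelengths` and the two example values of `theta` are assumptions chosen for illustration, not values taken from this module.

```python
import math

def rope_wavelengths(head_dim: int, theta: float) -> list[float]:
    """Hypothetical helper: wavelength of each channel pair, in positions."""
    # Pair i has angular frequency theta^(-2i / head_dim), so its
    # wavelength is 2*pi divided by that frequency.
    return [2 * math.pi * theta ** (2 * i / head_dim) for i in range(head_dim // 2)]

short = rope_wavelengths(head_dim=8, theta=10_000.0)
long = rope_wavelengths(head_dim=8, theta=500_000.0)

for i, (a, b) in enumerate(zip(short, long)):
    print(f"pair {i}: theta=1e4 -> {a:.1f}, theta=5e5 -> {b:.1f}")
```

The first pair has the same wavelength for any `theta`, while later pairs rotate more slowly as `theta` grows, so a larger `theta` stretches the slowest components over more positions.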
