Python module
rotary_embedding
The rotary position embedding (RoPE) used within the model.
OptimizedRotaryEmbedding
class max.pipelines.nn.rotary_embedding.OptimizedRotaryEmbedding(dim: int | str | Dim | integer, n_heads: int, theta: float, max_seq_len: int, rope_scaling: ndarray | None = None, _freqs_cis: Value | TensorValue | Shape | Dim | int | float | integer | floating | ndarray | None = None, interleaved: bool = True)
Optimized version of RotaryEmbedding using 2D frequency tensor representation.
freqs_cis
property freqs_cis
RotaryEmbedding
class max.pipelines.nn.rotary_embedding.RotaryEmbedding(dim: int | str | Dim | integer, n_heads: int, theta: float, max_seq_len: int, rope_scaling: ndarray | None = None, _freqs_cis: Value | TensorValue | Shape | Dim | int | float | integer | floating | ndarray | None = None, interleaved: bool = True)
RotaryEmbedding layer to calculate and apply the frequency tensor for complex exponentials.
dim
freqs_cis
property freqs_cis: TensorValue
freqs_cis_base()
freqs_cis_base() → TensorValue
Computes the frequency tensor of complex exponentials (cis) for the configured maximum sequence length. The frequencies are scaled by the theta parameter. This tensor is required to apply Rotary Position Embedding (RoPE) to a tensor. See ‘RoFormer: Enhanced Transformer with Rotary Position Embedding’ (arxiv.org/pdf/2104.09864).
Returns:
The frequency tensor for complex exponentials, with shape (max_seq_len * 2, dim // (2 * n_heads), 2).
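To make the returned shape concrete, here is a minimal NumPy sketch of how such a tensor could be built. It assumes the standard RoFormer formulation (inverse frequencies theta^(-2i/head_dim), with cosine and sine stored in the last axis) and an even per-head dimension. The function name freqs_cis_sketch is hypothetical and is not part of max.pipelines.nn; the library's actual computation may differ.

```python
import numpy as np

def freqs_cis_sketch(dim: int, n_heads: int, theta: float, max_seq_len: int) -> np.ndarray:
    """Hypothetical helper: builds a (max_seq_len * 2, dim // (2 * n_heads), 2) table."""
    head_dim = dim // n_heads  # assumed even
    # One inverse frequency per pair of channels: theta^(-2i / head_dim).
    exponents = np.arange(0, head_dim, 2, dtype=np.float64) / head_dim
    inv_freq = 1.0 / (theta ** exponents)
    # Positions cover twice the maximum sequence length, matching the documented shape.
    positions = np.arange(max_seq_len * 2, dtype=np.float64)
    angles = np.outer(positions, inv_freq)  # (max_seq_len * 2, head_dim // 2)
    # Last axis holds (cos, sin), i.e. the real and imaginary parts of e^{i * angle}.
    return np.stack([np.cos(angles), np.sin(angles)], axis=-1)

table = freqs_cis_sketch(dim=64, n_heads=4, theta=10000.0, max_seq_len=8)
print(table.shape)
```

With dim=64, n_heads=4, and max_seq_len=8, the first axis has 16 entries and the second has 64 // (2 * 4) = 8, which agrees with the shape formula above.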
interleaved
interleaved: bool = True
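This page does not describe what interleaved controls. In many RoPE implementations the term distinguishes rotating adjacent channel pairs (x0, x1), (x2, x3), ... from rotating the first half of the channels against the second half. The sketch below illustrates that common convention only, as an assumption about the flag's meaning; apply_rope_sketch is a hypothetical name and not a function in this module.

```python
import numpy as np

def apply_rope_sketch(x: np.ndarray, freqs_cis: np.ndarray, interleaved: bool = True) -> np.ndarray:
    """Hypothetical helper: rotates x of shape (seq_len, head_dim) by a
    (seq_len, head_dim // 2, 2) table whose last axis is (cos, sin)."""
    cos, sin = freqs_cis[..., 0], freqs_cis[..., 1]
    if interleaved:
        # Assumed meaning: even channels are real parts, odd channels imaginary parts.
        x_re, x_im = x[..., 0::2], x[..., 1::2]
    else:
        # Alternative layout: first half real parts, second half imaginary parts.
        half = x.shape[-1] // 2
        x_re, x_im = x[..., :half], x[..., half:]
    # Complex multiplication (x_re + i x_im) * (cos + i sin).
    out_re = x_re * cos - x_im * sin
    out_im = x_re * sin + x_im * cos
    if interleaved:
        out = np.empty_like(x)
        out[..., 0::2] = out_re
        out[..., 1::2] = out_im
        return out
    return np.concatenate([out_re, out_im], axis=-1)
```

Under either layout the operation is a pure rotation of each channel pair, so it leaves the norm of every position's vector unchanged and leaves position 0 untouched.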
max_seq_len
max_seq_len: int
The maximum sequence length for the model’s input.
n_heads
n_heads: int
rope_scaling
Scaling factor for the positional frequencies.
theta
theta: float
Hyperparameter used to control the frequency scaling of the sinusoidal components of the embeddings.
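As a rough illustration of how theta affects the frequencies, the sketch below computes the wavelength (in positions) of each sinusoidal component, assuming the standard RoFormer inverse frequencies theta^(-2i/head_dim). The helper rope_wavelengths is hypothetical and is not part of this module.

```python
import numpy as np

def rope_wavelengths(head_dim: int, theta: float) -> np.ndarray:
    """Hypothetical helper: wavelength of each rotary component, in positions."""
    exponents = np.arange(0, head_dim, 2, dtype=np.float64) / head_dim
    inv_freq = theta ** (-exponents)
    return 2.0 * np.pi / inv_freq

short = rope_wavelengths(head_dim=16, theta=10_000.0)
long = rope_wavelengths(head_dim=16, theta=1_000_000.0)
# The fastest component is independent of theta; the slowest grows with it.
print(short[0], short[-1])
print(long[0], long[-1])
```

Under this assumption, raising theta leaves the shortest wavelength at 2π and stretches the longest ones, so the slow components take more positions to complete a cycle.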