Python module
embedding
Embedding
class max.nn.embedding.Embedding(weights: 'TensorValueLike')
weights
weights: Value | BufferValue | TensorValue | Shape | Dim | int | float | integer | floating | ndarray
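Since weights accepts any TensorValueLike, including a NumPy array, a minimal sketch of constructing the layer from a pre-built weight matrix could look like the following. The lookup call at the end assumes Embedding is invoked on integer indices the same way EmbeddingV2 is; that calling convention is not stated explicitly on this page.

import numpy as np

from max.nn.embedding import Embedding

# Hypothetical pre-trained weight matrix: one 256-dim row per vocabulary entry.
weights = np.random.randn(1000, 256).astype(np.float32)

embedding_layer = Embedding(weights=weights)

# Assumption: like EmbeddingV2, calling the layer on integer indices
# returns the corresponding rows of the weight matrix.
# embeddings = embedding_layer(token_indices)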
EmbeddingV2
class max.nn.embedding.EmbeddingV2(vocab_size: int, hidden_dim: int, dtype: DType, device: DeviceRef | None = None, quantization_encoding: QuantizationEncoding | None = None, name: str | None = None)
A lookup table for embedding integer indices into dense vectors.
This layer maps each integer index to a dense vector of fixed size. Embedding weights are stored on the CPU but are moved to the specified device during the model init phase.
Example:
embedding_layer = EmbeddingV2(
vocab_size=1000,
hidden_dim=256,
dtype=DType.float32,
device=DeviceRef.GPU(),
name="embeddings",
)
token_indices: TensorValueLike
embeddings = embedding_layer(token_indices)
Initializes the embedding layer with the given arguments.
Parameters:
- vocab_size – The number of unique items in the vocabulary. Indices must be in the range [0, vocab_size).
- hidden_dim – The dimensionality of each embedding vector.
- dtype – The data type of the embedding weights.
- device – The device where embedding lookups are executed. Model init transfers the initially CPU-resident weights to this device.
- name – The name identifier for the embedding weight matrix.
device
device: DeviceRef | None
The device on which embedding lookup is performed.
weight
weight: Weight
The embedding weight matrix stored on the CPU.
Model init moves the weights to the device specified in device.
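As a small follow-up to the example above, the sketch below constructs the layer and inspects the two attributes documented here. The import paths for DType and DeviceRef are assumptions inferred from the signature shown above and may differ between MAX versions.

from max.dtype import DType
from max.graph.type import DeviceRef
from max.nn.embedding import EmbeddingV2

embedding_layer = EmbeddingV2(
    vocab_size=1000,
    hidden_dim=256,
    dtype=DType.float32,
    device=DeviceRef.GPU(),
    name="embeddings",
)

# `weight` is the CPU-resident Weight that model init later moves to `device`.
print(embedding_layer.weight)   # Weight named "embeddings", shape [vocab_size, hidden_dim]
print(embedding_layer.device)   # DeviceRef for the GPU lookup device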
VocabParallelEmbedding
class max.nn.embedding.VocabParallelEmbedding(vocab_size: int, hidden_dim: int, dtype: DType, devices: list[max.graph.type.DeviceRef], quantization_encoding: QuantizationEncoding | None = None, name: str | None = None)
A lookup table for embedding integer indices into dense vectors.
This layer works like nn.Embedding except the embedding table is sharded on the vocabulary dimension across all devices.
Example:
embedding_layer = VocabParallelEmbedding(
vocab_size=1000,
hidden_dim=256,
dtype=DType.float32,
devices=[DeviceRef.GPU(0), DeviceRef.GPU(1)],
name="embeddings",
)
# Token indices of shape: [batch, ..., num_indices].
token_indices: TensorValueLike
embeddings = embedding_layer(token_indices)
Parameters:
- vocab_size – The number of unique items in the vocabulary. Indices must be in the range [0, vocab_size).
- hidden_dim – The dimensionality of each embedding vector.
- dtype – The data type of the embedding weights.
- devices – The devices where embedding lookups are executed. Model init transfers the initially CPU-resident weights to these devices.
- name – The name identifier for the embedding weight matrix.
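To make the vocabulary-dimension sharding concrete, the sketch below computes the row range each device would own under a simple contiguous, evenly sized split. The actual partitioning scheme used by VocabParallelEmbedding is not documented on this page, so treat this purely as an illustration.

vocab_size = 1000
num_devices = 2  # e.g. [DeviceRef.GPU(0), DeviceRef.GPU(1)]

# Assumed contiguous split: device i owns rows [i * shard, (i + 1) * shard).
shard = (vocab_size + num_devices - 1) // num_devices
for i in range(num_devices):
    start = i * shard
    end = min(start + shard, vocab_size)
    print(f"device {i}: vocabulary rows [{start}, {end})")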