Python module: registry
Model registry, for tracking various model variants.
PipelineRegistry
class max.pipelines.lib.registry.PipelineRegistry(architectures)
Parameters:
- architectures (list[SupportedArchitecture])
get_active_huggingface_config()
get_active_huggingface_config(huggingface_repo)
Retrieves or creates a cached Hugging Face AutoConfig for the given model configuration.
This method maintains a cache of Hugging Face configurations to avoid unnecessary reloads, each of which incurs a Hugging Face Hub API call. If a config for the given model hasn't been loaded before, a new one is created with AutoConfig.from_pretrained() using the model's settings.
Parameters:
- huggingface_repo (HuggingFaceRepo) – The HuggingFaceRepo containing the model.

Returns:
- The Hugging Face configuration object for the model.

Return type:
- AutoConfig
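The caching behavior described above can be sketched in plain Python. The `ConfigCache` class and `fake_loader` below are illustrative stand-ins, not part of the MAX API; in the real registry the loader role is played by `AutoConfig.from_pretrained()`, which hits the Hugging Face Hub.

```python
# Illustrative stand-in for the registry's config cache (not the real API):
# the first lookup for a repo calls the loader; later lookups reuse the
# cached object and skip the expensive Hub API call.
class ConfigCache:
    def __init__(self, loader):
        self._configs = {}
        self._loader = loader

    def get(self, repo_id):
        if repo_id not in self._configs:
            self._configs[repo_id] = self._loader(repo_id)
        return self._configs[repo_id]


load_calls = []

def fake_loader(repo_id):
    # Stands in for the expensive Hub API call.
    load_calls.append(repo_id)
    return {"repo": repo_id}

cache = ConfigCache(fake_loader)
first = cache.get("org/model")
second = cache.get("org/model")
```

Both lookups return the same cached object, and the loader runs only once.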
get_active_tokenizer()
get_active_tokenizer(huggingface_repo)
Retrieves or creates a cached Hugging Face AutoTokenizer for the given model configuration.
This method maintains a cache of Hugging Face tokenizers to avoid unnecessary reloads, each of which incurs a Hugging Face Hub API call. If a tokenizer for the given model hasn't been loaded before, a new one is created with AutoTokenizer.from_pretrained() using the model's settings.
Parameters:
- huggingface_repo (HuggingFaceRepo) – The HuggingFaceRepo containing the model.

Returns:
- The Hugging Face tokenizer for the model.

Return type:
- PreTrainedTokenizer | PreTrainedTokenizerFast
register()
register(architecture, *, allow_override=False)
Add a new architecture to the registry.
Parameters:
- architecture (SupportedArchitecture)
- allow_override (bool)

Return type:
- None
reset()
reset()
Return type:
- None
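The register()/reset() semantics can be pictured with a minimal stand-in registry. The dict-based `MiniRegistry` below is hypothetical; the real PipelineRegistry stores SupportedArchitecture instances, and this sketch only assumes that architectures are keyed by name and that duplicates are rejected unless `allow_override=True`.

```python
# Minimal sketch of register()/reset() semantics (hypothetical, not the
# real PipelineRegistry): architectures are keyed by name, and registering
# a duplicate name fails unless allow_override=True.
class MiniRegistry:
    def __init__(self):
        self.architectures = {}

    def register(self, arch, *, allow_override=False):
        if arch["name"] in self.architectures and not allow_override:
            raise ValueError(f"architecture {arch['name']!r} is already registered")
        self.architectures[arch["name"]] = arch

    def reset(self):
        self.architectures.clear()


registry = MiniRegistry()
registry.register({"name": "MyModelForCausalLM"})
try:
    # Duplicate without allow_override: rejected.
    registry.register({"name": "MyModelForCausalLM"})
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

# With allow_override=True the existing entry is replaced.
registry.register({"name": "MyModelForCausalLM", "version": 2}, allow_override=True)
registry.reset()  # empties the registry
```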
retrieve()
retrieve(pipeline_config, task=PipelineTask.TEXT_GENERATION, override_architecture=None)
Parameters:
- pipeline_config (PipelineConfig)
- task (PipelineTask)
- override_architecture (str | None)

Return type:
- tuple[PipelineTokenizer[Any, Any, Any], PipelineTypes]
retrieve_architecture()
retrieve_architecture(huggingface_repo)
Parameters:
- huggingface_repo (HuggingFaceRepo)

Return type:
- SupportedArchitecture | None
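Conceptually, the lookup matches the architecture names advertised by a repo's Hugging Face config against the registered names, returning None when nothing matches. The function below is a sketch of that idea, not the real implementation:

```python
# Hypothetical sketch: a Hugging Face config lists model class names under
# "architectures"; the first registered match wins, otherwise None.
def retrieve_architecture(registered, hf_architectures):
    for name in hf_architectures:
        if name in registered:
            return registered[name]
    return None  # no supported architecture for this repo


registered = {"MyModelForCausalLM": {"name": "MyModelForCausalLM"}}
match = retrieve_architecture(registered, ["MyModelForCausalLM"])
miss = retrieve_architecture(registered, ["UnknownModel"])
```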
retrieve_factory()
retrieve_factory(pipeline_config, task=PipelineTask.TEXT_GENERATION, override_architecture=None)
Parameters:
- pipeline_config (PipelineConfig)
- task (PipelineTask)
- override_architecture (str | None)

Return type:
- tuple[PipelineTokenizer[Any, Any, Any], Callable[[], PipelineTypes]]
retrieve_pipeline_task()
retrieve_pipeline_task(pipeline_config)
Retrieve the pipeline task associated with the architecture for the given pipeline configuration.
Parameters:
- pipeline_config (PipelineConfig) – The configuration for the pipeline.

Returns:
- The task associated with the architecture.

Return type:
- PipelineTask

Raises:
- ValueError – If no supported architecture is found for the given model repository.
retrieve_tokenizer()
retrieve_tokenizer(pipeline_config, override_architecture=None)
Retrieves a tokenizer for the given pipeline configuration.
Parameters:
- pipeline_config (PipelineConfig) – Configuration for the pipeline.
- override_architecture (str | None) – Optional architecture override string.

Returns:
- The configured tokenizer.

Return type:
- PipelineTokenizer[Any, Any, Any]

Raises:
- ValueError – If no architecture is found.
SupportedArchitecture
class max.pipelines.lib.registry.SupportedArchitecture(name, example_repo_ids, default_encoding, supported_encodings, pipeline_model, task, tokenizer, default_weights_format, rope_type=RopeType.none, weight_adapters=<factory>, multi_gpu_supported=False, required_arguments=<factory>, context_validators=<factory>)
Represents a model architecture configuration for MAX pipelines.
This class defines all the necessary components and settings required to support a specific model architecture within the MAX pipeline system. Each SupportedArchitecture instance encapsulates the model implementation, tokenizer, supported encodings, and other architecture-specific configuration.
New architectures should be registered into the PipelineRegistry using the register() method.
Example:

```python
my_architecture = SupportedArchitecture(
    name="MyModelForCausalLM",  # Must match your Hugging Face model class name
    example_repo_ids=[
        "your-org/your-model-name",  # Add example model repository IDs
    ],
    default_encoding=SupportedEncoding.q4_k,
    supported_encodings={
        SupportedEncoding.q4_k: [KVCacheStrategy.PAGED],
        SupportedEncoding.bfloat16: [KVCacheStrategy.PAGED],
        # Add other encodings your model supports
    },
    pipeline_model=MyModel,
    tokenizer=TextTokenizer,
    default_weights_format=WeightsFormat.safetensors,
    rope_type=RopeType.none,
    weight_adapters={
        WeightsFormat.safetensors: weight_adapters.convert_safetensor_state_dict,
        # Add other weight formats if needed
    },
    multi_gpu_supported=True,  # Set based on your implementation capabilities
    required_arguments={"some_arg": True},
    task=PipelineTask.TEXT_GENERATION,
)
```
Parameters:
- name (str)
- example_repo_ids (list[str])
- default_encoding (SupportedEncoding)
- supported_encodings (dict[SupportedEncoding, list[KVCacheStrategy]])
- pipeline_model (type[PipelineModel[Any]])
- task (PipelineTask)
- tokenizer (Callable[[...], PipelineTokenizer[Any, Any, Any]])
- default_weights_format (WeightsFormat)
- rope_type (RopeType)
- weight_adapters (dict[WeightsFormat, Callable[[...], dict[str, WeightData]]])
- multi_gpu_supported (bool)
- required_arguments (dict[str, bool | int | float])
- context_validators (list[Callable[[TextContext | TextAndVisionContext], None]])
context_validators
context_validators: list[Callable[[TextContext | TextAndVisionContext], None]]
A list of callable context validators for the architecture.
default_encoding
default_encoding: SupportedEncoding
The default quantization encoding to use when no specific encoding is requested.
default_weights_format
default_weights_format: WeightsFormat
The weights format expected by the pipeline_model.
example_repo_ids
example_repo_ids: list[str]
A list of Hugging Face repository IDs that use this architecture, for testing and validation purposes.
multi_gpu_supported
multi_gpu_supported: bool = False
Whether the architecture supports multi-GPU execution.
name
name: str
The name of the model architecture that must match the Hugging Face model class name.
pipeline_model
pipeline_model: type[PipelineModel[Any]]
The PipelineModel class that defines the model graph structure and execution logic.
required_arguments
required_arguments: dict[str, bool | int | float]
A dictionary specifying required values for PipelineConfig options.
rope_type
rope_type: RopeType = 'none'
The type of RoPE (Rotary Position Embedding) used by the model.
supported_encodings
supported_encodings: dict[SupportedEncoding, list[KVCacheStrategy]]
A dictionary mapping supported quantization encodings to their compatible KV cache strategies.
task
task: PipelineTask
The pipeline task type that this architecture supports.
tokenizer
tokenizer: Callable[[...], PipelineTokenizer[Any, Any, Any]]
A callable that returns a PipelineTokenizer instance for preprocessing model inputs.
tokenizer_cls
property tokenizer_cls: type[PipelineTokenizer[Any, Any, Any]]
weight_adapters
weight_adapters: dict[WeightsFormat, Callable[[...], dict[str, WeightData]]]
A dictionary of weight format adapters for converting checkpoints from different formats to the default format.
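In other words, a weight adapter is a callable that rewrites a checkpoint's state dict into the naming and layout the pipeline_model expects. The adapter below is a hypothetical example (the key names are made up, and plain values stand in for WeightData):

```python
# Hypothetical weight adapter: strips a "model." prefix that one checkpoint
# format uses but the pipeline_model does not expect. Real MAX adapters
# return dict[str, WeightData]; plain ints stand in for WeightData here.
def strip_model_prefix(state_dict):
    return {
        key.removeprefix("model."): value
        for key, value in state_dict.items()
    }


converted = strip_model_prefix({"model.embed.weight": 1, "lm_head.weight": 2})
```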
get_pipeline_for_task()
max.pipelines.lib.registry.get_pipeline_for_task(task, pipeline_config)
Parameters:
- task (PipelineTask)
- pipeline_config (PipelineConfig)

Return type:
- type[TextGenerationPipeline[TextContext]] | type[EmbeddingsPipeline] | type[SpeculativeDecodingTextGenerationPipeline] | type[AudioGeneratorPipeline] | type[SpeechTokenGenerationPipeline]
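The mapping from task to pipeline class can be pictured as a simple dispatch table. The enum members and pipeline classes below are placeholders for the real MAX types, sketching only the dispatch pattern:

```python
from enum import Enum, auto


class PipelineTask(Enum):  # placeholder for the real MAX PipelineTask
    TEXT_GENERATION = auto()
    EMBEDDINGS_GENERATION = auto()


class TextGenerationPipeline: ...  # placeholder pipeline classes
class EmbeddingsPipeline: ...


_PIPELINES = {
    PipelineTask.TEXT_GENERATION: TextGenerationPipeline,
    PipelineTask.EMBEDDINGS_GENERATION: EmbeddingsPipeline,
}


def get_pipeline_for_task(task):
    # Returns the pipeline *class*; the caller instantiates it with its config.
    return _PIPELINES[task]
```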