Python module

registry

Model registry, for tracking various model variants.

PipelineRegistry

class max.pipelines.registry.PipelineRegistry(architectures: list[max.pipelines.registry.SupportedArchitecture])

get_active_huggingface_config()

get_active_huggingface_config(huggingface_repo: HuggingFaceRepo) → AutoConfig

Retrieves or creates a cached HuggingFace AutoConfig for the given model configuration.

This method maintains a cache of HuggingFace configurations to avoid reloading them unnecessarily, which would incur a Hugging Face Hub API call. If a config for the given model hasn't been loaded before, a new one is created using AutoConfig.from_pretrained() with the model's settings.

  • Parameters:

    huggingface_repo – The HuggingFaceRepo containing the model.

  • Returns:

    The HuggingFace configuration object for the model.

  • Return type:

    AutoConfig
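
A minimal usage sketch; the empty registry, the HuggingFaceRepo import path, the repo_id field name, and the repo ID itself are assumptions for illustration:

from max.pipelines.hf_utils import HuggingFaceRepo  # assumed import path
from max.pipelines.registry import PipelineRegistry

registry = PipelineRegistry([])  # hypothetical registry with no architectures yet
repo = HuggingFaceRepo(repo_id="meta-llama/Llama-3.1-8B-Instruct")  # assumed field name

# The first call fetches the config through the Hugging Face Hub API;
# later calls for the same repo return the cached AutoConfig.
hf_config = registry.get_active_huggingface_config(repo)
print(hf_config.model_type)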

get_active_tokenizer()

get_active_tokenizer(huggingface_repo: HuggingFaceRepo) → PreTrainedTokenizer | PreTrainedTokenizerFast

Retrieves or creates a cached HuggingFace AutoTokenizer for the given model configuration.

This method maintains a cache of HuggingFace tokenizers to avoid reloading them unnecessarily, which would incur a Hugging Face Hub API call. If a tokenizer for the given model hasn't been loaded before, a new one is created using AutoTokenizer.from_pretrained() with the model's settings.

  • Parameters:

    huggingface_repo – The HuggingFaceRepo containing the model.

  • Returns:

    The HuggingFace tokenizer for the model.

  • Return type:

    PreTrainedTokenizer | PreTrainedTokenizerFast
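
An analogous sketch for tokenizers, reusing the hypothetical registry and repo from the example above:

# The first call loads the tokenizer via AutoTokenizer.from_pretrained();
# later calls for the same repo return the cached instance.
tokenizer = registry.get_active_tokenizer(repo)
input_ids = tokenizer("Hello, world!")["input_ids"]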

register()

register(architecture: SupportedArchitecture)

Adds a new architecture to the registry.

reset()

reset() → None

retrieve()

retrieve(pipeline_config: PipelineConfig, task: PipelineTask = PipelineTask.TEXT_GENERATION) → tuple[PipelineTokenizer, TokenGenerator | EmbeddingsGenerator]

retrieve_architecture()

retrieve_architecture(huggingface_repo: HuggingFaceRepo) → SupportedArchitecture | None

retrieve_factory()

retrieve_factory(pipeline_config: PipelineConfig, task: PipelineTask = PipelineTask.TEXT_GENERATION) → tuple[PipelineTokenizer, Callable[[], TokenGenerator | EmbeddingsGenerator]]
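
A sketch contrasting retrieve() and retrieve_factory(); the PipelineConfig import path and keyword argument are assumptions for illustration:

from max.pipelines import PipelineConfig  # assumed import path

pipeline_config = PipelineConfig(model_path="meta-llama/Llama-3.1-8B-Instruct")  # assumed kwarg

# retrieve_factory() returns the tokenizer plus a zero-argument factory,
# deferring model construction until the factory is invoked.
tokenizer, factory = registry.retrieve_factory(pipeline_config)
generator = factory()

# retrieve() constructs and returns the generator eagerly.
tokenizer, generator = registry.retrieve(pipeline_config)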

SupportedArchitecture

class max.pipelines.registry.SupportedArchitecture(name: str, example_repo_ids: list[str], default_encoding: SupportedEncoding, supported_encodings: dict[max.pipelines.config_enums.SupportedEncoding, list[max.nn.kv_cache.cache_params.KVCacheStrategy]], pipeline_model: type[max.pipelines.pipeline.PipelineModel], task: PipelineTask, tokenizer: type[Union[max.pipelines.tokenizer.TextTokenizer, max.pipelines.tokenizer.TextAndVisionTokenizer]], default_weights_format: WeightsFormat, multi_gpu_supported: bool = False, rope_type: RopeType = RopeType.none, weight_adapters: dict[max.graph.weights.format.WeightsFormat, Callable[..., dict[str, max.graph.weights.weights.WeightData]]] | None = None)

Initializes a model architecture supported by MAX pipelines.

New architectures should be registered with the PipelineRegistry.

  • Parameters:

    • name – Architecture name.
    • example_repo_ids – Hugging Face repo IDs of example models that run this architecture.
    • default_encoding – Default encoding for the model.
    • supported_encodings – Mapping from each supported encoding to its compatible KV cache strategies.
    • pipeline_model – PipelineModel class that defines the model graph and execution.
    • task – The pipeline task the model runs with.
    • tokenizer – Tokenizer used to preprocess model inputs.
    • default_weights_format – The weights format used in pipeline_model.
    • multi_gpu_supported – Whether the architecture supports multi-GPU execution.
    • rope_type – The RoPE variant the model uses.
    • weight_adapters – A dictionary of weight adapters to use if the input checkpoint has a different weights format than the default.

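A hedged construction sketch. MyPipelineModel stands in for a real PipelineModel subclass; the import paths and enum members are assumptions read off the signature above:

from max.graph.weights import WeightsFormat  # assumed import paths below
from max.nn.kv_cache import KVCacheStrategy
from max.pipelines.config_enums import PipelineTask, SupportedEncoding
from max.pipelines.registry import SupportedArchitecture
from max.pipelines.tokenizer import TextTokenizer

my_arch = SupportedArchitecture(
    name="MyModelForCausalLM",  # hypothetical architecture name
    example_repo_ids=["my-org/my-model-8b"],  # hypothetical repo
    default_encoding=SupportedEncoding.bfloat16,  # assumed enum member
    supported_encodings={
        SupportedEncoding.bfloat16: [KVCacheStrategy.CONTINUOUS],  # assumed members
    },
    pipeline_model=MyPipelineModel,  # hypothetical PipelineModel subclass
    task=PipelineTask.TEXT_GENERATION,
    tokenizer=TextTokenizer,
    default_weights_format=WeightsFormat.safetensors,  # assumed enum member
)

registry.register(my_arch)
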
get_pipeline_for_task()

max.pipelines.registry.get_pipeline_for_task(task: PipelineTask, pipeline_config: PipelineConfig) → type[TextGenerationPipeline] | type[EmbeddingsPipeline] | type[SpeculativeDecodingTextGenerationPipeline]
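
A short dispatch sketch; the PipelineTask import path is an assumption, and pipeline_config is the hypothetical config from the retrieve() example above:

from max.pipelines.config_enums import PipelineTask  # assumed import path
from max.pipelines.registry import get_pipeline_for_task

# Resolve the concrete pipeline class for a text-generation task.
pipeline_cls = get_pipeline_for_task(PipelineTask.TEXT_GENERATION, pipeline_config)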