Python module: max.entrypoints.llm

LLM

class max.entrypoints.llm.LLM(pipeline_config)

A high-level interface for interacting with large language models (LLMs).

Parameters:

pipeline_config (PipelineConfig)

generate()

generate(prompts, max_new_tokens=100, use_tqdm=True)

Generates text completions for the given prompts.

Parameters:

  • prompts (str | Sequence[str]) – The input string, or list of strings, to generate completions for.
  • max_new_tokens (int | None) – The maximum number of tokens to generate in each response. Defaults to 100.
  • use_tqdm (bool) – Whether to display a progress bar during generation. Defaults to True.

Returns:

A list of generated text completions corresponding to each input prompt.

Raises:

  • ValueError – If prompts is empty or contains invalid data.
  • RuntimeError – If the model fails to generate completions.

Return type:

list[str]
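A minimal usage sketch of the API above. It assumes the `max` package is installed, that `PipelineConfig` can be imported from `max.pipelines` and accepts a `model_path` argument, and that the model name shown is available; all three are assumptions, not part of this reference. The `try`/`except` guard is only so the sketch degrades gracefully where `max` is absent.

```python
try:
    from max.entrypoints.llm import LLM
    from max.pipelines import PipelineConfig

    # Build the pipeline configuration; the model path is illustrative.
    config = PipelineConfig(model_path="modularai/Llama-3.1-8B-Instruct-GGUF")
    llm = LLM(config)

    # generate() accepts a single string or a sequence of strings and
    # returns a list[str] with one completion per input prompt.
    completions = llm.generate(
        ["What is the capital of France?", "Name a prime number."],
        max_new_tokens=32,
        use_tqdm=False,  # suppress the progress bar
    )
    for i, text in enumerate(completions):
        print(f"[{i}] {text}")
except ImportError:
    # The max package is not installed in this environment.
    completions = []
```

Per the Raises section above, wrap the `generate()` call in error handling if prompts may be empty (ValueError) or generation may fail at runtime (RuntimeError).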