Serving
Our high-performance serving library provides an OpenAI-compatible REST endpoint, enabling a smooth transition from OpenAI services or from other serving libraries such as vLLM and SGLang. MAX handles the complete request lifecycle, with built-in support for function calling, structured output, and more, plus a Python API for offline inference.
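Because the endpoint speaks the OpenAI protocol, an existing OpenAI client can target a locally served model by changing only the base URL. The sketch below is a minimal illustration, assuming a server listening at http://localhost:8000/v1 and a placeholder model name; substitute the address and model your deployment actually uses.

```python
# Minimal sketch: send a chat request to an OpenAI-compatible endpoint
# served locally. The base URL, port, and model name are assumptions
# for illustration, not fixed values.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local serving address
    api_key="EMPTY",                      # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="my-served-model",  # hypothetical model name; use the one your server reports
    messages=[
        {"role": "user", "content": "In one sentence, what is an OpenAI-compatible endpoint?"},
    ],
)

print(response.choices[0].message.content)
```

The same pattern applies to clients migrating from a hosted OpenAI service or from another OpenAI-compatible server: only the base URL and model name change, while request and response handling stay the same.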
Guides
Tutorials