What is MAX

The Modular Accelerated eXecution (MAX) platform is a unified set of APIs and tools that simplify the process of deploying your own AI endpoint with state-of-the-art performance. MAX provides complete flexibility, so you can use your own data and your own AI model on the hardware of your choice, with the best performance-to-cost tradeoff.

Our next-generation model compiler and runtime deliver unparalleled speed for your PyTorch and GenAI models. But MAX is much more than just a fast runtime. It also includes a quick-to-deploy serving layer that orchestrates inference inputs and outputs between your model and client application. Additionally, MAX provides a highly programmable interface for model optimization and GPU programming.

We built MAX because programming across the entire AI stack—from the application layer all the way down to the GPU kernels—was way too complicated. We wanted a programming model that could deliver state-of-the-art performance throughout the entire AI software stack.

What MAX offers

  • Framework optionality: With just a few lines of code, MAX accelerates AI models from PyTorch and ONNX on a wide range of hardware.

  • Unparalleled GenAI performance: MAX includes a Python API to build GenAI pipelines such as the latest large language models (LLMs) with state-of-the-art performance on CPUs and GPUs.

  • Hardware portability: We built MAX from the ground up with next-generation compiler technologies that enable it to scale in any direction for state-of-the-art performance on any hardware, from CPUs to GPUs and beyond.

  • Model extensibility: MAX allows you to customize your models with custom ops and write high-performance GPU kernels. When you do, the MAX graph compiler can optimize the entire AI pipeline for different hardware.

  • Seamless deployment: MAX integrates with existing tools and infrastructure to minimize migration effort. You can deploy into production on a cloud platform you know and trust, and our optional high-performance serving interface (MAX Serve) provides familiar endpoint APIs.

MAX enables all of this with a rich set of Python APIs, backed by custom graph ops and GPU kernels written in Mojo. Mojo is a new language that looks and feels like Python and integrates with Python code, but it provides the performance, control, and safety of languages like C++, Rust, and Swift.

How to use MAX

You can install MAX into an existing project as a conda package, or you can use our Magic command-line tool to create a project virtual environment with MAX and other conda/PyPI packages.
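As a minimal sketch of the Magic workflow (the project name is a placeholder, and exact commands may vary by Magic version, so check `magic --help` for your installation):

```sh
# Create a new project directory with its own virtual environment config.
magic init my-max-project
cd my-max-project

# Add MAX (and any other conda/PyPI packages) to the project environment.
magic add max

# Open a shell inside the environment, with the MAX tools on your PATH.
magic shell
```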

With our Python API, you need only a few lines of code to run inference with MAX instead of the PyTorch or ONNX runtime. This simple change executes your models up to 5x faster, reducing your latency and compute costs. (The same API is also available in C and Mojo.)
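For example, a rough sketch using the `max.engine` Python API (the model path and input name below are placeholders, and signatures may differ across MAX versions):

```python
import numpy as np
from max import engine

# Create an inference session and compile/load a trained model.
session = engine.InferenceSession()
model = session.load("my_model.onnx")  # placeholder path to your model file

# Run inference. Keyword names must match the model's declared input names;
# "input" and the shape here are placeholders for your model's real inputs.
outputs = model.execute(input=np.random.rand(1, 3, 224, 224).astype(np.float32))
print(outputs)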

Next, you can write custom ops that MAX can analyze, optimize, and fuse into the graph. Or, you can build your model with the MAX Graph Python API to unlock even more performance for generative AI pipelines.
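As an illustration, here is a minimal graph that adds two tensors, based on the published `max.graph` examples (names such as `Graph`, `TensorType`, and `ops.add` follow those examples but may vary by version):

```python
import numpy as np
from max import engine
from max.dtype import DType
from max.graph import Graph, TensorType, ops

# Describe the graph's input signature: two float32 vectors of length 4.
input_type = TensorType(dtype=DType.float32, shape=(4,))

# Build a one-op graph that adds its two inputs.
with Graph("simple_add", input_types=(input_type, input_type)) as graph:
    lhs, rhs = graph.inputs
    graph.output(ops.add(lhs, rhs))

# Compile the graph and execute it like any other MAX model.
session = engine.InferenceSession()
model = session.load(graph)
a = np.ones(4, dtype=np.float32)
b = np.full(4, 2.0, dtype=np.float32)
print(model.execute(a, b))  # positional inputs, per the graph examples
```

Because the graph is described declaratively, the MAX graph compiler can fuse and optimize ops across the whole pipeline before execution.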

When you're ready for production, you can deploy MAX with the tools you already use, or create a serving endpoint with MAX Serve, which exposes familiar OpenAI-compatible REST APIs.
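Because the endpoint is OpenAI-compatible, any standard OpenAI client can talk to it. For example (the base URL, API key, and model name below are placeholders for your own deployment):

```python
from openai import OpenAI

# Point the standard OpenAI client at your MAX Serve endpoint.
client = OpenAI(
    base_url="http://localhost:8000/v1",  # placeholder: your MAX Serve URL
    api_key="EMPTY",  # placeholder: a local endpoint may not need a real key
)

response = client.chat.completions.create(
    model="modularai/llama-3.1",  # placeholder model name
    messages=[{"role": "user", "content": "What is MAX?"}],
)
print(response.choices[0].message.content)
```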
