Intro to MAX Engine
MAX Engine is a next-generation graph compiler and runtime that supercharges your AI inference on a wide variety of hardware. You can bring an existing model from PyTorch, ONNX, or TensorFlow¹ and run inference with MAX Engine using our Python, C, or Mojo API libraries.
On this page, we'll explain a bit more about the features included with MAX Engine.
If you want to try MAX Engine now, see the guide to get started with MAX, or one of the API guides for Python, C, and Mojo.
What MAX Engine offers
MAX Engine supercharges your AI inference workloads and gives your developer team superpowers.
- Framework optionality: MAX Engine accelerates inference speed for your existing AI models. There's no model conversion step. MAX Engine can compile most models and run them on a wide range of hardware for immediate performance gains.
- Hardware portability: MAX Engine is built from the ground up using cutting-edge compiler technologies that enable it to scale in any direction and deliver state-of-the-art performance on any hardware.
- Model extensibility: MAX Engine is built with the same compiler infrastructure as Mojo, which makes MAX Engine fully extensible with Mojo. You can extend your existing models with custom ops that fuse into the compiled graph, or write your entire inference graph in Mojo.
- Seamless integration: MAX Engine integrates with industry-standard infrastructure and open-source tools to minimize migration cost. We offer simple solutions to deploy into production on a cloud platform you know and trust.
Using our Python or C APIs, you can seamlessly upgrade your existing pipeline to run inference with MAX Engine—see how it looks with Python. From that point, you can incrementally adopt other MAX features to optimize your model and improve performance.
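For illustration, here's a minimal sketch of what that upgrade can look like with the Python API. The model path and the input tensor name are placeholders, and the input shape assumes a typical image model:

```python
import numpy as np
from max import engine

# Create an inference session and load an existing model (no conversion
# step). MAX Engine compiles the graph for the local hardware at load time.
session = engine.InferenceSession()
model = session.load("resnet50.onnx")  # placeholder path to an ONNX model

# Run inference. Inputs are passed as keyword arguments keyed by the
# model's input tensor names ("pixel_values" is a placeholder here).
outputs = model.execute(pixel_values=np.zeros((1, 3, 224, 224), dtype=np.float32))
```

The rest of your pipeline (pre-processing, post-processing, serving) stays the same; only the inference call changes.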
You don't need to learn Mojo to use MAX Engine. However, Mojo delivers significant performance improvements for any compute workload, as we've demonstrated in a series of blog posts. Here are some ways you can introduce Mojo into your AI pipeline, one step at a time:
- Use Mojo for inference pre- and post-processing.
- Use Mojo to run inference (instead of Python/C).
- Implement custom ops in Mojo.
- Build your model graph in Mojo.
How MAX Engine works
You can use our MAX Engine API libraries in Python, C, and Mojo to load your existing models and execute them on a wide range of hardware.
All you need to do is load your model and execute it (3 lines of code with our Python API). When you load a model, MAX Engine JIT-compiles it into an executable graph that's optimized for the local hardware. Depending on the model, compilation can take some time, but it happens only once, when you first load the model; that's when MAX Engine optimizes the graph to deliver significant latency savings at run time.
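To make that trade-off concrete, here's a rough sketch that times the one-time compile-at-load step against a subsequent execution. The model path and input name are again placeholders:

```python
import time
import numpy as np
from max import engine

session = engine.InferenceSession()

# Compilation happens once, when the model is loaded...
start = time.perf_counter()
model = session.load("resnet50.onnx")  # placeholder path
print(f"load (includes compile): {time.perf_counter() - start:.2f}s")

# ...so subsequent executions pay no compile cost.
x = np.zeros((1, 3, 224, 224), dtype=np.float32)
start = time.perf_counter()
outputs = model.execute(pixel_values=x)  # placeholder input name
print(f"execute: {time.perf_counter() - start:.4f}s")
```

In a long-running service, that one-time cost is amortized across every request that follows.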
Footnotes
1. Support for TensorFlow is available for enterprises only; it's not included in the MAX SDK.