Mammoth
Mammoth (formerly the MAX Inference Cluster) is a Kubernetes-native distributed AI serving tool that makes it easier to run and manage LLMs at scale, using MAX as its backend for optimal model performance. Built on the Modular Platform, it's designed to use your hardware efficiently with minimal configuration, even when running multiple models across thousands of nodes.
When you deploy a model, the Mammoth control plane automatically selects the best available hardware to meet your performance targets, and it supports both manual and automatic scaling. Mammoth's built-in router intelligently distributes traffic, accounting for hardware load, GPU memory, and cache state. You can deploy and serve multiple models simultaneously across different hardware types or versions without complex setup or duplicated infrastructure.
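To make the deployment model above concrete, here is a sketch of what a declarative, Kubernetes-native configuration along these lines could look like. This is purely illustrative: the resource kind and every field name below are assumptions for the sake of the example, not Mammoth's actual API.

```yaml
# Hypothetical manifest. The "ModelDeployment" kind, API group, and all
# field names are illustrative assumptions, not Mammoth's real schema.
apiVersion: example.modular.com/v1alpha1
kind: ModelDeployment
metadata:
  name: llama-serving
spec:
  model: meta-llama/Llama-3.1-8B-Instruct
  # Let the control plane choose hardware to meet a performance target.
  performanceTarget:
    p99LatencyMs: 500
  scaling:
    mode: automatic        # or "manual" with a fixed replica count
    minReplicas: 2
    maxReplicas: 16
  # The router weighs per-replica load, GPU memory, and cache state.
  routing:
    policy: load-and-cache-aware
```

The point of the sketch is the division of responsibility: you declare the model and targets, while hardware selection, scaling, and routing are handled by the control plane.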
Become a design partner
Mammoth is currently available only through Modular's early access program, where we're actively partnering with select organizations as design partners. Design partners collaborate directly with Modular's engineering and product teams, gain early access to in-development features, and receive tailored guidance on integrating the Modular Platform into their existing generative AI workloads.
Get the latest updates
Stay up to date with announcements and releases. We're moving fast over here.
Talk to an AI Expert
Connect with our product experts to explore how we can help you deploy and serve AI models with high performance, scalability, and cost-efficiency.