Modular's MAX platform has added support for Apple Silicon GPUs (M1 through M5), allowing modern neural networks to run locally on Macs.

What Happened

The MAX platform's developers have implemented GPU support for the Apple Silicon architecture. Users can now run models directly on the GPU cores, including text LLMs such as Qwen 3.5, computer vision models, and generative models such as FLUX.2 [klein] 4B. Dedicated flags for controlling batch size and memory utilization have been added to accommodate the unified memory architecture.
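To see why batch size and memory utilization matter on unified memory, a rough back-of-the-envelope estimate helps. The sketch below is only an illustration: the function names and the `utilization` parameter are hypothetical and are not MAX flags or APIs, and the estimate counts model weights only, ignoring activations, caches, and anything else sharing the same memory pool.

```python
# Rough estimate of whether a model's weights fit in a share of unified memory.
# All names here are hypothetical helpers for illustration, not part of MAX.

GIB = 2**30  # bytes in one gibibyte


def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    return num_params * bytes_per_param / GIB


def fits_in_budget(
    num_params: float,
    bytes_per_param: float,
    total_memory_gib: float,
    utilization: float,
) -> bool:
    """Check the weights against the fraction of unified memory we allow."""
    budget_gib = total_memory_gib * utilization
    return weight_memory_gib(num_params, bytes_per_param) <= budget_gib


if __name__ == "__main__":
    # A 4-billion-parameter model stored at 2 bytes per parameter (16-bit).
    needed = weight_memory_gib(4e9, 2)
    print(f"weights: about {needed:.2f} GiB")
    # Assumed machine: 16 GiB of unified memory, half of it allowed for the model.
    print("fits:", fits_in_budget(4e9, 2, total_memory_gib=16, utilization=0.5))
```

Because the CPU and GPU draw from the same pool, the budget has to leave room for the operating system and other applications, which is why a utilization cap is a natural control to expose.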

Context

Previously, running large models efficiently required either specialized cloud services or falling back to the CPU alone, which significantly limited speed. With the new support, inference can run on the GPU and take advantage of Apple Silicon's unified memory, which the CPU and GPU share.

Why It Matters for the Industry

Extending high-performance model execution to consumer Apple hardware reduces developers' dependence on proprietary frameworks and on rented cloud GPUs for local debugging. It lays groundwork for Edge AI and strengthens the Mac's position as a full-fledged workstation for AI agent development.

Why It Matters for Users

MacBook users can now run large modern models such as FLUX.2 directly on their devices using the GPU. Compared with CPU-only execution, this speeds up content generation and makes it possible to test ideas quickly, cheaply, and privately, without sending data to external servers.

What Is Not Yet Known / Limitations

The technical information available so far does not address enterprise security, data governance, or compliance with enterprise AI standards, all of which matter to large businesses.

Author

Look at AI, Editorial Team