Back to list
High-Potential
Python
🖥️ Native MLX Inference Server for Apple Silicon
1,320 stars188 forksPython
anthropicapple-siliconaudio-processingclaude-codecomputer-visionimage-understandinginferencellmmachine-learningmacosmllmmlx
In short, this project brings a vLLM-like experience to Apple Silicon. It is an inference server compatible with OpenAI and Anthropic APIs, built on top of Apple's native MLX framework. It supports large language models and vision-language models like Llama and Qwen-VL.
Beyond continuous batching and multimodal support, it natively handles MCP tool calling and works directly with Claude Code. For developers building AI applications locally on Macs, this is a solid infrastructure tool that maximizes M-series chip performance while maintaining standard API compatibility.