🖥️ Native MLX Inference Server for Apple Silicon

1,320 stars188 forksPython

anthropicapple-siliconaudio-processingclaude-codecomputer-visionimage-understandinginferencellmmachine-learningmacosmllmmlx

In short, this project brings a vLLM-like experience to Apple Silicon. It is an inference server compatible with OpenAI and Anthropic APIs, built on top of Apple's native MLX framework. It supports large language models and vision-language models like Llama and Qwen-VL. Beyond continuous batching and multimodal support, it natively handles MCP tool calling and works directly with Claude Code. For developers building AI applications locally on Macs, this is a solid infrastructure tool that maximizes M-series chip performance while maintaining standard API compatibility.

View on GitHub