⚡ Shimmy: Python-Free Rust Inference Server

5,253 stars486 forksRust

api-servercommand-line-tooldeveloper-toolsggufhuggingfacehuggingface-modelshuggingface-transformersinference-serverllamallamacppllm-inferencelocal-ai

Shimmy is a local LLM inference server written entirely in Rust, emphasizing a completely Python-free environment. It is compatible with the OpenAI API format and supports both GGUF and SafeTensors model weights. Distributed as a single binary, it offers a very clean deployment experience. The hard part is not writing an API wrapper, but handling low-level model loading and memory management efficiently. It supports hot model swapping and auto-discovery, meaning you can switch between different local models without restarting the service. For developers tired of Python dependency issues who want a lightweight, high-performance local inference backend, this is a strong option.

View on GitHub