Tools · Breaking
Shimmy v1.9.0: Single 4.8MB Binary Runs All GPU Backends for Local LLM Inference
Shimmy v1.9.0 is a 4.8MB single-binary, OpenAI-compatible local inference server that bundles all GPU backends and claims a 142x size advantage over Ollama.
April 29, 2026 · 1 min read