Shimmy v1.9.0: Single 4.8MB Binary Runs All GPU Backends for Local LLM Inference
Shimmy v1.9.0 has been released as a "kitchen sink" build: a single Rust binary per platform (Windows/Linux x64 and macOS ARM64) that auto-detects and uses CUDA, Vulkan, OpenCL, or CPU at runtime. The 4.8MB binary is claimed to be 142× smaller than Ollama's 680MB install, with sub-100ms startup. The release also adds MoE CPU offloading, which lets 70B+ Mixture-of-Experts models run within consumer VRAM budgets by splitting expert layers between GPU memory and system RAM. The project is MIT-licensed with an explicit "free forever, never paid" pledge and has reached the Hacker News front page twice.
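Conceptually, the runtime backend selection is a probe-and-pick ladder: try CUDA, fall back to Vulkan, then OpenCL, then CPU. The Rust sketch below illustrates that pattern only; the Backend enum and the probe helpers are hypothetical stand-ins, not Shimmy's actual API.

```rust
// Illustrative sketch of runtime backend selection, not Shimmy's real code.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Backend {
    Cuda,
    Vulkan,
    OpenCl,
    Cpu,
}

/// Probe the machine at startup and pick the best available backend.
/// Each probe is assumed to check for the corresponding driver/loader
/// (e.g. libcuda, the Vulkan loader, an OpenCL ICD) at runtime, so one
/// binary can carry all backends and choose among them on launch.
fn detect_backend() -> Backend {
    if cuda_available() {
        Backend::Cuda
    } else if vulkan_available() {
        Backend::Vulkan
    } else if opencl_available() {
        Backend::OpenCl
    } else {
        Backend::Cpu // always available, no GPU required
    }
}

// Stubbed probes so the sketch compiles and runs; real checks would load
// the driver libraries and enumerate devices.
fn cuda_available() -> bool { false }
fn vulkan_available() -> bool { false }
fn opencl_available() -> bool { false }

fn main() {
    println!("selected backend: {:?}", detect_backend());
}
```

The same ordering idea extends to the MoE offloading feature: expert layers that fit in the VRAM budget stay on the GPU, and the remainder are mapped into system RAM.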
Why It Matters
Shimmy's single-binary approach removes the compilation and backend-selection friction that keeps many mid-level developers from running LLMs locally. Combined with zero-config model auto-discovery across HuggingFace caches, Ollama stores, and local directories, it represents the clearest attempt yet to make local inference as frictionless as pip install.
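The auto-discovery idea is straightforward: scan the places where model weights usually already live and register anything usable. The sketch below assumes the typical default locations (the HuggingFace hub cache, the Ollama model store, a local ./models directory) and a simple GGUF-extension filter; Shimmy's actual discovery logic may differ.

```rust
// Illustrative sketch of zero-config model discovery; paths and the file
// filter are assumptions, not Shimmy's documented behavior.
use std::path::PathBuf;

/// Directories that commonly hold local model weights.
fn candidate_dirs() -> Vec<PathBuf> {
    // Unix-style HOME lookup for brevity; platform handling elided.
    let home = std::env::var("HOME").unwrap_or_else(|_| ".".into());
    vec![
        PathBuf::from(format!("{home}/.cache/huggingface/hub")),
        PathBuf::from(format!("{home}/.ollama/models")),
        PathBuf::from("./models"),
    ]
}

/// Walk each candidate directory (one level deep here for brevity) and
/// collect anything that looks like a GGUF weight file.
fn discover_models() -> Vec<PathBuf> {
    let mut found = Vec::new();
    for dir in candidate_dirs() {
        let Ok(entries) = std::fs::read_dir(&dir) else { continue };
        for entry in entries.flatten() {
            let path = entry.path();
            if path.extension().is_some_and(|ext| ext == "gguf") {
                found.push(path);
            }
        }
    }
    found
}

fn main() {
    for model in discover_models() {
        println!("found model: {}", model.display());
    }
}
```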