Prism ML Bonsai 4B: Ternary Image Model Runs at 3.7 GB
Prism ML's Bonsai Image 4B retrain cuts image generation memory from ~13 GB to ~3.7 GB at ~95% quality — viable local image gen on a MacBook in under 5 seconds.
Prism ML collapses Flux 2 Klein 4B from 7.7 GB to 1.2 GB via ternary quantization with a claimed 95% benchmark retention, but real-world text rendering and product mockup quality degrade significantly.
llama.cpp ships a WebGPU backend enabling GPU-accelerated LLM inference directly in any modern browser — no data leaves the device, no install required, 18 months in development.
Sulphur 2: uncensored open-weights video model on LTX base, 10s/24fps clips on 16GB VRAM, 125K+ training videos. Available on HuggingFace via ComfyUI/Pinokio.
GGUF local models on Hugging Face hit 176K with monthly creation rates doubling since March — local AI adoption has crossed an inflection point.
Aiden open-sources a local AI OS for Windows/Linux: 1,500+ skills, 89+ tools, 6-layer knowledge graph memory, subagent swarms, voice, Discord/Telegram — Ollama-backed.

Cloud AI's flat-rate subscription model is breaking under compute pressure. Six signals in 48 hours show the forced shift to per-token billing has begun.
Gemma 4 E2B + WebGPU + Transformers.js enables a fully local Chrome browser agent with no server calls — tabs and browsing data stay on-device.
Hugging Face forms a dedicated PyTorch/MPS team targeting 100× Apple Silicon perf gains — torch.sort and torch.multinomial are already MPS-native; flex attention is next.
llama.cpp hits 100K GitHub stars; creator @ggerganov predicts 90% of AI agents will run locally within 3–6 months as local model quality crosses the agentic threshold.
Qwen3.6-27B (Apache 2.0) claims to outperform the 397B Qwen3.5 MoE and Claude Opus 4.5 on coding benchmarks, running locally on 18GB RAM.
Alibaba's Apache 2.0 27B model outperforms Qwen3.5-397B-A17B on all major coding tasks and runs locally on 18 GB RAM — 'bye bye subscription era' claims are spreading.