12 articles

#local-ai

Prism ML Bonsai 4B: Ternary Image Model Runs at 3.7 GB

Prism ML's Bonsai Image 4B retrain cuts image generation memory from ~13 GB to ~3.7 GB at ~95% quality — viable local image gen on a MacBook in under 5 seconds.

May 30, 2026 · 1 min read

Research · Breaking

Prism ML Ships Ternary Flux 2 Klein 4B: 7.7GB Collapsed to 1.2GB

Prism ML collapses Flux 2 Klein 4B from 7.7GB to 1.2GB via ternary quantization with claimed 95% benchmark retention — but real-world text rendering and product mockup quality degrade significantly.

May 27, 2026 · 1 min read

Technology · Breaking

llama.cpp Ships WebGPU Backend: Full Browser-Based GPU Inference, No Install

llama.cpp ships a WebGPU backend enabling GPU-accelerated LLM inference directly in any modern browser — no data leaves the device, no install required, 18 months in development.

May 27, 2026 · 1 min read

Tools · Breaking

Sulphur 2: Uncensored Open-Source Video Model Generates 10s Clips on 16GB VRAM

Sulphur 2: uncensored open-weights video model on LTX base, 10s/24fps clips on 16GB VRAM, 125K+ training videos. Available on HuggingFace via ComfyUI/Pinokio.

May 14, 2026 · 1 min read

Technology · Breaking

GGUF Ecosystem Hits 176K Models; Monthly Growth Nearly Doubled Since March

GGUF local models on Hugging Face hit 176K, with monthly creation rates nearly doubling since March, a sign that local AI adoption has crossed an inflection point.

May 10, 2026 · 1 min read

Tools · Breaking

Aiden Open-Sources Local AI OS: 1,500 Skills, No Cloud Required

Aiden open-sources a local AI OS for Windows/Linux: 1,500+ skills, 89+ tools, 6-layer knowledge graph memory, subagent swarms, voice, Discord/Telegram — Ollama-backed.

May 3, 2026 · 1 min read

Industry · Report

The Subscription Crisis: Compute Economics Are Forcing Cloud AI Into a Per-Token Future

Cloud AI's flat-rate subscription model is breaking under compute pressure. Six signals in 48 hours show the forced shift to per-token billing has begun.

April 30, 2026 · 10 min read

Technology · Breaking

Gemma 4 Powers Fully Local Browser Agent via WebGPU — No Server, No API Key

Gemma 4 E2B + WebGPU + Transformers.js enables a fully local Chrome browser agent with no server calls — tabs and browsing data stay on-device.

April 29, 2026 · 1 min read

Technology · Breaking

Hugging Face Forms Dedicated PyTorch/MPS Team for Apple Silicon

Hugging Face forms a dedicated PyTorch/MPS team targeting 100× Apple Silicon perf gains — torch.sort and torch.multinomial are already MPS-native; flex attention is next.

April 25, 2026 · 1 min read

Technology · Breaking

llama.cpp Hits 100K Stars; Creator Predicts 90% of Agents Will Run Locally

llama.cpp hits 100K GitHub stars; creator @ggerganov predicts 90% of AI agents will run locally within 3–6 months as local model quality crosses the agentic threshold.

April 25, 2026 · 1 min read

Technology · Breaking

Qwen3.6-27B: 27B Model Claims to Beat 397B MoE on All Coding Benchmarks

Qwen3.6-27B (Apache 2.0) claims to outperform the 397B Qwen3.5 MoE and Claude Opus 4.5 on coding benchmarks, running locally on 18GB RAM.

April 23, 2026 · 1 min read

Technology

Qwen3.6-27B Surpasses a 397B Model on Coding Benchmarks

Alibaba's Apache 2.0 27B model outperforms Qwen3.5-397B-A17B on all major coding tasks and runs locally on 18 GB RAM — 'bye bye subscription era' claims are spreading.

April 23, 2026 · 2 min read
