Superlinked Open-Sources 'Sie' Small-Model Inference Engine

Superlinked has released Sie (Superlinked Inference Engine) as open source: a production-grade inference system for small embedding, reranker, and NER models. Sie hot-swaps multiple models on a single GPU with LRU eviction, addressing the 80%+ GPU waste incurred when a full GPU is provisioned per small model. It implements per-family forward passes for BERT, Qwen, ColBERT, and ModernBERT; adds variable-length flash attention to eliminate compute wasted on padded tokens; and ships with KEDA auto-scaling driven by Prometheus metrics. It integrates with the Chroma, Qdrant, Weaviate, and LanceDB vector databases.
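Sie's internal API is not shown here, but the hot-swapping idea can be sketched with a plain LRU cache keyed by model ID: the least recently used model is evicted to make room when the GPU's slot budget is exhausted. All names below (`ModelCache`, `loader`) are illustrative, not Sie's actual interfaces.

```python
from collections import OrderedDict

class ModelCache:
    """Minimal sketch of LRU-based model hot-swapping on a single GPU.

    `capacity` stands in for how many small models fit in GPU memory
    at once; `loader` stands in for loading weights onto the device.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._cache = OrderedDict()  # model_id -> loaded model

    def get(self, model_id, loader):
        if model_id in self._cache:
            self._cache.move_to_end(model_id)  # mark most recently used
            return self._cache[model_id]
        if len(self._cache) >= self.capacity:
            # Evict the least recently used model; a real engine would
            # also free the GPU memory its weights occupy here.
            self._cache.popitem(last=False)
        model = loader(model_id)  # hypothetical weight-loading callback
        self._cache[model_id] = model
        return model
```

With capacity 2, requesting models "a", "b", "a", "c" in order evicts "b" (the least recently used), leaving "a" and "c" resident.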
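The variable-length attention point can also be illustrated without Sie's code: varlen flash-attention kernels typically take a single flat buffer of tokens plus cumulative sequence lengths instead of a padded batch, so no compute is spent on padding. A minimal sketch of that packing step (pure Python, names are illustrative):

```python
def pack_sequences(seqs):
    """Pack variable-length token sequences into one flat buffer plus
    cumulative sequence lengths, the layout varlen attention kernels
    commonly expect, so no padded tokens are ever materialized."""
    flat, cu_seqlens = [], [0]
    for s in seqs:
        flat.extend(s)
        cu_seqlens.append(cu_seqlens[-1] + len(s))
    return flat, cu_seqlens
```

Packing `[[1, 2, 3], [4], [5, 6]]` yields the flat buffer `[1, 2, 3, 4, 5, 6]` and offsets `[0, 3, 4, 6]`; a padded batch of the same inputs would waste compute on three padding tokens.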

Why It Matters

Sie addresses "context rot" (the quality degradation that occurs as context length grows) by enabling cost-effective small-model preprocessing pipelines that shrink an agent's working context before expensive LLM inference. A full demo was presented at the AI Engineer conference.
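The preprocessing idea can be sketched generically: score candidate context chunks with a cheap reranker, then keep only the top-ranked chunks that fit a character budget before building the LLM prompt. The function and parameter names below are hypothetical, and `score` stands in for any small reranker model.

```python
def shrink_context(query, chunks, score, top_k=3, max_chars=2000):
    """Rank candidate chunks with a cheap reranker-style score and keep
    only the top-k that fit the budget, so the expensive LLM sees a
    compact working context instead of everything retrieved."""
    ranked = sorted(chunks, key=lambda c: score(query, c), reverse=True)
    kept, used = [], 0
    for chunk in ranked[:top_k]:
        if used + len(chunk) > max_chars:
            break  # budget exhausted; drop the remaining chunks
        kept.append(chunk)
        used += len(chunk)
    return kept
```

In a real pipeline the `score` callback would be a small reranker served by an engine like Sie; here any query-chunk relevance function works.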