Virginia Tech Preprint Challenges Skill-MD Paradigm with Model-Native Training
A Virginia Tech preprint (April 19, 2026) argues that the skill-MD / design-MD approach actively promoted by Anthropic and Google is a local maximum. Using sparse autoencoders to extract "model-native skills" — the latent axes of behavioural variation the model built during pretraining — and then using those axes to select SFT data, the authors outperform the best human-curated skill-based fine-tuning on both Llama-3 8B and Qwen 2.5 3B.
What the Source Actually Says
The direct SFT gains are modest but consistent: Llama-3 8B improves from 38.4% (best human-skill SFT) to 39.6% with model-native SFT; Qwen 2.5 3B from 56.0% to 57.7%. The larger result is in data selection: by projecting candidate training examples into the model's activation space and prioritising directions where the model is currently weakest — a "representation error curriculum" — the authors achieve a 20% gain on MATH-1 and a 41% gain on AMC with significantly fewer examples than textual-diversity selection requires.
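The selection step can be sketched concretely. The following is my reconstruction of the general idea, not the authors' code: each candidate example is assumed to already be mapped to a residual-stream activation vector, and `weak_dirs` stands in for unit vectors along which the model currently underperforms.

```python
import numpy as np

def select_curriculum(acts: np.ndarray, weak_dirs: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k candidate examples whose activations project
    most strongly onto the model's weak directions."""
    # (n_examples, n_dirs): |projection| of each example on each weak axis
    proj = np.abs(acts @ weak_dirs.T)
    # score each example by its strongest alignment with any weak direction
    scores = proj.max(axis=1)
    return np.argsort(scores)[::-1][:k]

# toy usage with synthetic activations standing in for real model states
rng = np.random.default_rng(0)
acts = rng.normal(size=(1000, 64))            # 1000 candidates, d_model = 64
weak = rng.normal(size=(3, 64))               # 3 hypothetical weak directions
weak /= np.linalg.norm(weak, axis=1, keepdims=True)
chosen = select_curriculum(acts, weak, k=100)
```

Selecting by alignment with weak directions rather than by textual diversity is what would let the curriculum hit the same coverage with fewer examples.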
The pipeline, documented in a public (anonymous) GitHub repository, runs in four steps: extract residual-stream activations on reasoning examples → discover latent skill axes via sparse autoencoder → steer the model toward identified directions → configure SFT. The paper's key implication is that a human "skill" is a trajectory through activation space assembled from multiple atomic unit vectors — and a skill-MD file is real engineering, but it compensates for weak underlying activation primitives rather than strengthening them.
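The second step (latent skill discovery) is a standard sparse-autoencoder setup. A minimal sketch follows — the preprint's actual architecture, width, and hyperparameters are not specified here, so dimensions and the L1 coefficient are placeholders:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Tiny SAE over residual-stream activations. Each decoder column is a
    candidate 'skill direction' in activation space."""
    def __init__(self, d_model: int, d_latent: int):
        super().__init__()
        self.enc = nn.Linear(d_model, d_latent)
        self.dec = nn.Linear(d_latent, d_model)

    def forward(self, x):
        z = torch.relu(self.enc(x))   # non-negative, encouraged to be sparse
        return self.dec(z), z

def sae_loss(x, x_hat, z, l1_coeff=1e-3):
    # reconstruction error + L1 penalty that drives the latent code sparse
    return ((x - x_hat) ** 2).mean() + l1_coeff * z.abs().mean()

# forward pass on synthetic activations (d_model=64, overcomplete latent=256)
sae = SparseAutoencoder(d_model=64, d_latent=256)
x = torch.randn(8, 64)
x_hat, z = sae(x)
loss = sae_loss(x, x_hat, z)
```

After training, the active latent units on a given reasoning trace identify which atomic directions that trace exercises — the raw material for the steering and SFT-selection steps.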
A parallel finding with implications for safety alignment: for jailbreak defence, native directional coverage outperformed textual diversity of adversarial prompts. The taxonomy-mismatch problem extends beyond capability into safety. The paper's authors also propose "steering vectors as zero-token-cost system prompts" — averaging the activations of successful outputs and injecting them at inference time, replacing meta-prompts with no context overhead.
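The steering-vector proposal amounts to a difference-of-means recipe, a common activation-steering technique; the preprint may differ in detail. A hedged sketch, with synthetic arrays standing in for real model activations:

```python
import numpy as np

def build_steering_vector(good_acts: np.ndarray, base_acts: np.ndarray,
                          scale: float = 1.0) -> np.ndarray:
    """Difference-of-means steering vector: mean activation of successful
    outputs minus a baseline mean, optionally rescaled."""
    return scale * (good_acts.mean(axis=0) - base_acts.mean(axis=0))

def apply_steering(resid: np.ndarray, vec: np.ndarray) -> np.ndarray:
    # added to the residual stream at every token position;
    # consumes no prompt tokens, hence "zero-token-cost system prompt"
    return resid + vec

rng = np.random.default_rng(1)
good = rng.normal(loc=0.5, size=(32, 64))   # activations of successful outputs
base = rng.normal(loc=0.0, size=(32, 64))   # baseline activations
vec = build_steering_vector(good, base)
steered = apply_steering(rng.normal(size=(10, 64)), vec)
```

In a real deployment the injection would happen via a forward hook on a chosen layer rather than on raw arrays; the point is that the behavioural nudge lives in activation space instead of in the context window.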
Strategic Take
This doesn't invalidate current skill-MD investment — Herk's production Playwright automation skills ship today and compound across sessions regardless of what's happening at the fine-tuning layer. The question for teams building durable AI products is whether scaffolding investment produces diminishing returns as representation-engineering tooling matures over the next 12–18 months. Following the VT repo and the SAFNO (sparse-autoencoder neural operators) line of work is the cheapest hedge.