MILKYWAY Paper: Prediction Harness Scores 61% vs 44% for Raw GPT-5.4 + Web Search
MILKYWAY freezes GPT-5.4 and moves all learning into editable skill.md files that a harness editor agent rewrites. On future-prediction benchmarks it scores 61%, versus 44% for raw GPT-5.4 with web search.
April 23, 2026