Source: Machina (@EXM7777) on X. Distilled and processed for Signal; not original content.
TL;DR
New research: a tiny model, small enough to run on a laptop, that answers nothing itself and only acts as a manager, deciding which big model handles each piece of a task. It beat ChatGPT, Gemini, and Claude individually and set a record on a hard coding benchmark. Swapping a top-tier model into the manager seat made results worse. Knowing who to ask beats being the smartest in the room.
How the manager runs one task
- Picks the best model to plan the work.
- Picks the best model to do the work.
- Picks the best model to check the work.
- Loops until the checker signs off.
The manager is a conductor: it cannot play an instrument, but it knows exactly who should play when.
Actionable items
- Stop betting everything on one “best” model. A team of models beats any single one.
- Never let a model check its own work. It has the same blind spots it had when it wrote the output, so it waves its own bugs through. Use a different model to check.
- Build a “check” step before you ship anything. It quietly kills most bad outputs.
- Routing beats raw intelligence: a small router outperformed frontier models as the manager, and putting a top-tier model in the manager seat did worse.
Key quotes
being the smartest in the room doesn’t matter, knowing who to ask does
the future isn’t one giant brain, it’s a small manager pointing the right models at the right jobs