Machina

@EXM7777

1mo ago

how to get Fable-level intelligence back:

for a few days, we had something that felt like AGI...

Fable 5 showed up, effectively unlimited inside the plans, and the ceiling on what you could build lifted overnight

but then Anthropic killed it, maybe forever

meanwhile, Opus 4.8, the model you're left with, feels lobotomized by comparison, like someone turned the intelligence down

most people just accepted the downgrade and moved on

they didn't have to

you don't need Anthropic to switch Fable back on to get that level of intelligence again

you can rebuild it yourself, out of cheaper models, and run it on demand

the trick is a council: one frontier model coupled with a few cheaper or open ones, answering together

it lands at Fable-level quality for a fraction of the price, and it beats any single model you're paying for right now, for a reason most people get backwards

the reframe: hire a team, don't rent a genius

almost everyone uses ai like this right now:

> a hard question comes up
> you hand it to the single most expensive model you have access to
> you read its one answer
> you trust it, because what else would you check it against

one model gives you one opinion, with no second set of eyes on its blind spots

the council is built to change that

you send the same question to a panel of models at once, then a separate model reads every answer, notes where they agree, where they contradict each other, and what each one missed, and writes the single best version

and that combined answer beats the best model in the room working alone

a two-model panel with a judge scored 69 out of 100 on openrouter's own published reasoning test, where Fable 5, the strongest single model in it, scored 65.3

a team of three cheap models, refereed, landed at 64.7, just 0.6 points under Fable 5 itself, at roughly half the cost per question

the quality didn't come from buying a smarter model

it came from one model carefully reviewing and combining the work of several cheaper ones

checking the work matters more than adding workers, and that single fact is the whole reason a council costs less and answers better
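the panel-then-judge shape described above can be sketched in a few lines. this is a minimal sketch, not any product's actual implementation: `convene`, `cheap_a`, `cheap_b`, and `stub_judge` are made-up names, and a "model" is assumed to be any function that takes a prompt and returns text, so you would swap the stubs for real api calls

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# a "model" is any function from prompt to text (assumption for this sketch)
Model = Callable[[str], str]


def convene(question: str, panel: List[Model], judge: Model) -> str:
    """send the question to every panel model in parallel, then let the judge fuse."""
    if not panel:
        raise ValueError("the panel needs at least one model")
    with ThreadPoolExecutor(max_workers=len(panel)) as pool:
        drafts = list(pool.map(lambda model: model(question), panel))
    # label each draft so the judge can refer to them individually
    labelled = "\n\n".join(f"[draft {i}]\n{d}" for i, d in enumerate(drafts, 1))
    judge_prompt = (
        f"question:\n{question}\n\n{labelled}\n\n"
        "note where the drafts agree, where they contradict each other, and "
        "what each one missed, then write the single best answer."
    )
    return judge(judge_prompt)


# hypothetical stub models so the sketch runs with no api keys
def cheap_a(prompt: str) -> str:
    return "use postgres"


def cheap_b(prompt: str) -> str:
    return "start on sqlite, move later"


def stub_judge(prompt: str) -> str:
    return f"fused {prompt.count('[draft ')} drafts"


print(convene("which database should we commit to?", [cheap_a, cheap_b], stub_judge))
```

the judge only ever sees the question plus the labelled drafts, which is why the judge's seat is the one place the whole panel's work gets checked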

if you want to learn how to squeeze the most out of model councils, and how to turn them into real money, that's what the real time AI ops community is built for: weeklyaiops.com

why a council is the right shape for orchestration, not just answers

it's easy to read the above as a nice trick for getting better answers

it's bigger than that

the bigger payoff is what a council does for orchestration, the act of running a long, multi-step job

because orchestration is really a string of decisions

what's the plan, what order, what's risky, what do we do when step four contradicts step two

and good decisions have never come from one mind

they come from several perspectives weighing pros and cons against each other until the strongest call survives

that is exactly what a council is

so when you put a council in charge of a big task, you get a better decision-maker steering everything downstream

and that reframes how you spend your one frontier model

don't waste it as just another voice on the panel... put it in the judge's seat

let the cheap and open models do the parallel drafting, and spend your expensive model on the step that carries the quality: reading everything and making the call

one frontier judge, a panel of cheaper models under it

that's the build everyone should run

one trap before the options: a router is not a council

the moment this gets popular, everything will get rebranded as "fusion," so learn the line now

the thing it gets confused with most is a router

a router picks one model for you, usually the cheapest one that can handle the request, to save money

an eval tool runs many models and grades them side by side

neither one fuses anything

you still walk away with one model's answer, or a scorecard

a council is defined by the judge, the model that reads all the answers and writes a better one than any single member produced

no synthesis step means no council, just a switchboard

keep that test in your pocket, because some products marketed as fusion are routers with a fusion feature bolted on the side
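the line between the two is easy to see side by side. a rough sketch under simplifying assumptions: `route` and `fuse` are hypothetical names, the router here just picks the cheapest model (a real one would also check it can handle the request), and the models are stubs

```python
from typing import Callable, Dict, List

Model = Callable[[str], str]


def route(question: str, models: Dict[str, Model], cost: Dict[str, float]) -> str:
    """router: pick ONE model (here simply the cheapest) and pass its answer through."""
    cheapest = min(models, key=lambda name: cost[name])
    return models[cheapest](question)


def fuse(question: str, models: Dict[str, Model],
         judge: Callable[[str, List[str]], str]) -> str:
    """council: collect EVERY answer, then a judge writes a new one from all of them."""
    drafts = [model(question) for model in models.values()]
    return judge(question, drafts)


# hypothetical stubs, no real apis involved
models = {"small": lambda q: "A", "large": lambda q: "B"}
cost = {"small": 0.1, "large": 1.0}
join_judge = lambda q, drafts: " + ".join(sorted(drafts))

print(route("hard question", models, cost))       # one member's answer, untouched
print(fuse("hard question", models, join_judge))  # a new answer built from all drafts
```

if the function you are looking at never calls anything like `judge`, it fails the test in the text: it's a switchboard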

the options, and how you pay for them

none of this has to be built from scratch

the choice comes down to one question: use the subscriptions you already pay for, or pay per request

rent it, per request: the fastest way to feel a council is openrouter fusion

there's a browser page with three buttons, quality, budget, and custom, and the budget button is the half-price team

zero setup, paste a hard question, read the fused answer

if you want to own the whole graph instead of a preset, orcarouter lets you define the panel and the judge yourself in a config file, with named strategies for how the judge decides

both bill you per call, which is clean if you don't want a stack of subscriptions

use what you already pay for: if you live inside a coding agent and already pay for Claude, ChatGPT, and Gemini, gavel runs all three in parallel on the same task and has Claude fuse their answers, using your existing logins instead of new api keys

only the main model touches your files, the others advise read-only, which keeps it safe to run on real code

run your own: the council as a tool you control end to end

> openfusion, one command line tool that runs any backend (paid apis, local open models, your coding agent) under strategies like consensus (the panel has to agree), best-of-n (keep the single strongest answer), or first-to-finish (take whichever lands first), and can even expose the whole council as a single endpoint anything else can call

> fusion-fable, the closest self-run cousin to openrouter's version, a blind panel of frontier models with a strong model judging and writing the final answer

https://github.com/duolahypercho/fusion-fable

> llm-consortium, the most refined of the bunch, with a judge that keeps re-running the panel until the answers converge on a confidence threshold

https://github.com/irthomasthomas/llm-consortium
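the re-run-until-confident idea can be sketched as a loop. to be clear, this is not llm-consortium's actual code or api: `run_until_confident`, the `(answer, confidence)` judge signature, the `0.8` threshold, and the stub models are all assumptions made for illustration

```python
from collections import Counter
from typing import Callable, List, Tuple

Panelist = Callable[[str], str]
# assumed judge shape: returns (fused answer, confidence between 0 and 1)
Judge = Callable[[str, List[str]], Tuple[str, float]]


def run_until_confident(question: str, panel: List[Panelist], judge: Judge,
                        threshold: float = 0.8, max_rounds: int = 3):
    """re-run the panel, feeding back the last synthesis, until the judge is confident."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    prompt = question
    for round_no in range(1, max_rounds + 1):
        drafts = [ask(prompt) for ask in panel]
        answer, confidence = judge(question, drafts)
        if confidence >= threshold:
            break
        # not confident yet: show the panel the current synthesis and ask again
        prompt = (f"{question}\n\nprevious synthesis "
                  f"(confidence {confidence:.2f}): {answer}\nimprove on it.")
    return answer, confidence, round_no


# hypothetical stubs: one model never moves, one changes its mind after feedback
def steady(prompt: str) -> str:
    return "blue"


def swayable(prompt: str) -> str:
    return "blue" if "previous synthesis" in prompt else "green"


def majority_judge(question: str, drafts: List[str]) -> Tuple[str, float]:
    top, votes = Counter(drafts).most_common(1)[0]
    return top, votes / len(drafts)  # confidence = share of the panel that agrees


print(run_until_confident("which colour?", [steady, swayable], majority_judge))
```

the `max_rounds` cap matters: without it a panel that never converges would keep billing you forever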

start with the browser page to feel it, then graduate to gavel or openfusion once you know you want it in your daily workflow
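the three strategies named for openfusion above (consensus, best-of-n, first-to-finish) are simple enough to sketch. this is not openfusion's code or cli, just a hedged illustration: the function names, the exact-string matching in `consensus`, and the `score` function are assumptions, and `cancel_futures` needs python 3.9 or newer

```python
import time
from collections import Counter
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List, Optional


def consensus(answers: List[str], min_share: float = 1.0) -> Optional[str]:
    """the panel has to agree: return the shared answer, or None if it does not."""
    top, votes = Counter(answers).most_common(1)[0]
    return top if votes / len(answers) >= min_share else None


def best_of_n(answers: List[str], score: Callable[[str], float]) -> str:
    """keep the single strongest answer according to some scoring function."""
    return max(answers, key=score)


def first_to_finish(question: str, panel: List[Callable[[str], str]]) -> str:
    """take whichever model lands first and stop waiting on the rest."""
    pool = ThreadPoolExecutor(max_workers=len(panel))
    futures = [pool.submit(model, question) for model in panel]
    try:
        return next(as_completed(futures)).result()
    finally:
        pool.shutdown(wait=False, cancel_futures=True)


# hypothetical stub models
def fast(question: str) -> str:
    return "fast answer"


def slow(question: str) -> str:
    time.sleep(0.3)
    return "slow answer"


print(consensus(["yes", "yes", "yes"]))
print(best_of_n(["ok", "a longer answer"], score=len))
print(first_to_finish("anything", [slow, fast]))
```

real answers rarely match character for character, so a practical `consensus` would compare meaning (often by asking a judge) instead of raw strings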

when to convene the council

a council is slower and pricier per call than a single model, because you're waiting on a whole panel plus a synthesis pass

a default fusion call can run several times the cost and the latency of one model answering once

so this is not your everyday default, it's the tool you reach for when the decision is worth it

the gut check: would i have paid for a premium model to answer this

if yes, it's a council question, if no, use your cheap fast model and move on
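why a council costs and waits more than one model can be put as arithmetic: the panel runs in parallel, so you pay for every member but only wait for the slowest one, then you pay and wait for the judge on top. a small sketch, where `Call`, `council_estimate`, and every number are placeholders for illustration, not measured prices or latencies

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Call:
    cost_usd: float
    seconds: float


def council_estimate(panel: List[Call], judge: Call) -> Call:
    """cost adds up across the panel; latency is the slowest member plus the judge."""
    cost = sum(call.cost_usd for call in panel) + judge.cost_usd
    latency = max(call.seconds for call in panel) + judge.seconds
    return Call(cost_usd=cost, seconds=latency)


# placeholder numbers, invented for the example
panel = [Call(0.04, 5.0), Call(0.04, 6.0), Call(0.04, 8.0)]
judge = Call(0.05, 10.0)

estimate = council_estimate(panel, judge)
print(f"council: ${estimate.cost_usd:.2f}, {estimate.seconds:.0f}s")
print(f"judge alone: ${judge.cost_usd:.2f}, {judge.seconds:.0f}s")
```

plug in your own per-call numbers and the gut check above gets concrete: if the council's extra cost and wait are not worth it for this question, use the cheap fast model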

the places a council earns its cost:

codebase migration: this is the clearest case, and it also clears up a myth

picture moving an old monolith over to microservices: dozens of judgment calls about order and risk land before a single line of code gets written

a council that synthesizes prose is genuinely not the tool for writing that code, because there's no objective referee for what counts as the best code

but migration is barely a coding problem in the first place, it's a decision problem: what's the order, what breaks, what's the risk, what's the rollback

you convene the council to produce the migration plan and weigh the risks, then you hand the actual edits to a single cheaper agent to execute step by step

the council decides, the lighter agent does

deep research: say you're weighing a real decision, like which database to commit to or whether to move off a vendor: each panel model pulls and reads sources in parallel, the judge reconciles where they disagree, and disagreement is exactly where the real answer hides, so you get a checked answer instead of one model's confident guess

orchestrating a long, complex job: something like reorganizing a sprawling project that grew without a plan, the council sets the plan and re-decides at each fork, a cheaper agent runs the steps in between

setting goals for a lighter agent: use the council to turn a vague intention into a sharp, sequenced brief, then let a cheap fast model execute against it, you're spending the expensive thinking only where thinking happens

building a knowledge base: say a tracker of every competitor in your market, the council decides the structure, what's an entity, what's a duplicate, how it all connects, and a lighter agent fills it in underneath, so the expensive judgment shapes the system and the cheap labor populates it

the same shape runs under all five

the council never does the grunt work, it makes the decisions, and a cheaper model carries them out
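that split, council decides and cheaper agent executes, looks roughly like this as a loop. a hedged sketch only: `run_job`, the `"blocked"` convention for a failed step, and the three stub functions are all hypothetical, and in practice the plan and replan functions would each be a full council call

```python
from typing import Callable, List, Tuple

Log = List[Tuple[str, str]]


def run_job(goal: str,
            plan_with_council: Callable[[str], List[str]],
            execute_step: Callable[[str], str],
            replan_with_council: Callable[[str, Log, List[str]], List[str]],
            max_steps: int = 20) -> Log:
    """the council sets the plan and re-decides at forks; a cheap worker runs each step."""
    plan = list(plan_with_council(goal))
    log: Log = []
    while plan and len(log) < max_steps:
        step = plan.pop(0)
        outcome = execute_step(step)          # cheap model does the grunt work
        log.append((step, outcome))
        if outcome.startswith("blocked"):     # a fork: go back to the council
            plan = list(replan_with_council(goal, log, plan))
    return log


# hypothetical stubs standing in for a council and a lighter agent
def council_plan(goal: str) -> List[str]:
    return ["inventory modules", "extract billing", "cut over traffic"]


def cheap_worker(step: str) -> str:
    return "blocked: shared database" if step == "extract billing" else "done"


def council_replan(goal: str, log: Log, remaining: List[str]) -> List[str]:
    return ["split shared database", "extract billing again"] + remaining


for step, outcome in run_job("monolith to microservices",
                             council_plan, cheap_worker, council_replan):
    print(f"{step}: {outcome}")
```

the expensive calls happen once up front and once per fork, everything in between is the cheap worker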

the nuance that proves it's a pattern, not a product

the loud version of this online is that fusion is brilliant at research and useless at coding

that's a competitor's launch line, not a measured result, so don't take it as fact

the true and far more useful version is this: synthesizing prose is the wrong way to judge code, because writing a nice summary of three code answers doesn't tell you if the code runs

so for code, you change the judge

instead of fusing prose, the council keeps the candidate whose patch passes the tests

same council, objective referee

that's the real lesson, a council is a pattern you tune per task, which is why no single product gets to own it

prose and research get a synthesizing judge

code gets a judge that runs the tests

you pick the referee that fits the job, and the same simple shape, panel plus judge, carries all of it

the build, ready to run

> stop renting one expensive model for every hard call, convene a council

> the shape: a panel of models answers in parallel > one judge reads all of them > it writes the single best answer

> the build most people should run: cheap and open models on the panel, your one frontier model in the judge's seat, because the judge is where the quality lives

> the test: no synthesis step means it's just a router

> rent it per request: openrouter fusion (browser, zero setup) or orcarouter (own the graph)

> use your subscriptions: gavel (Claude + Codex + Gemini via your existing logins)

> run your own: openfusion, fusion-fable, or llm-consortium

> when to convene it: migrations, deep research, long multi-step jobs, briefing a lighter agent, building a knowledge base

> the rule under all of it: the council decides, a cheaper agent executes

> for code: the judge runs the tests instead of synthesizing prose

the close

the choice in front of you isn't which model is smartest anymore

it's whether you keep renting one genius for every question, limits draining, bill climbing

or you hire a team for the decisions that carry real weight, and let a cheap fast model handle the rest

one frontier judge, a panel of cheaper models, and a clear rule for when it's worth convening

the people who win the next stretch of AI won't be the ones paying the most for the smartest single model, they'll be the ones who learned to convene the room

the two build guides this runs on, the half-price council in your browser and the lightweight agent that executes underneath it, are both waiting in the community at weeklyaiops.com
