Sakana Fugu Review: Orchestration Hedge, Not Sovereign AI

Last updated: June 23, 2026

A document-and-architecture audit of what Sakana Fugu actually sells, the orchestration cost you cannot forecast before a run, and who should skip it.

Sakana Fugu is an OpenAI-compatible API from Sakana AI K.K. that runs a multi-agent orchestration system behind one endpoint. Rather than serving a single model, it routes work across a pool of external frontier models and, in Sakana’s own description, handles model selection, delegation, verification, and synthesis internally. Two variants ship, Fugu and Fugu Ultra, both through the same API. General availability began on June 22, 2026.

The timing is the whole pitch. Fugu went generally available ten days after Anthropic said a US export-control directive required it to cut off Fable 5 and Mythos 5 for any foreign national, inside the United States or outside it. To comply, Anthropic disabled both models worldwide. Its other models kept running. Sakana’s framing points straight at that: build on Fugu, the argument goes, and one vendor losing access does not take your whole stack down with it.

Read Sakana’s own Terms of Service, though, and the hedge gets narrower. The Terms describe Fugu as routing input to external machine learning models, and the examples they name are OpenAI, Anthropic, and Google. Every external provider Sakana names, and every public model it benchmarks against, is US-jurisdiction. The full active pool is not disclosed, and the per-request routing is, in Sakana’s own words, not exposed by design. The same Terms put US export-control and sanctions compliance on you, the user, and place anyone outside Japan under California law. So the hedge spreads your dependency across providers you cannot fully enumerate, in the same jurisdiction whose export-control reach produced the Fable suspension, governed by terms that hand the compliance risk back to you. The gap between the word sovereignty and what those documents describe is the seam this review is built on.

The buyer facts, ahead of the 48-hour pricing recheck Sakana’s volatile rates demand: subscriptions at $20, $100, and $200 a month, sold as relative usage multipliers with no published token cap; pay-as-you-go Fugu Ultra at $5 per million input tokens and $30 per million output; orchestration tokens billed into the final price; and routing that Sakana does not expose. Fugu is not available in the EU, the EEA, the UK, or Switzerland.

Briefing summary, June 2026

Future Stack Reviews summary infographic, part 1 of 2: what happened with Sakana Fugu (the June 2026 Anthropic export cutoff and the Fugu launch), what Fugu is (a managed orchestration layer over an undisclosed external model pool), and key facts on pricing, availability, routing, and legal terms. Details in the sections below. — Sakana Fugu in one frame It is a vendor cutoff hedge built as a managed orchestration layer over a pool of external models that Sakana only partly names sold with the pricing availability and legal terms that decide whether it fits you Pricing and availability shift fast so confirm both before you budget or build

What happened. On June 22, 2026, Sakana AI released Fugu and Fugu Ultra, a multi-agent orchestration system delivered through one OpenAI-compatible API. It is pitched as frontier-level capability that survives any single vendor being cut off, with explicit reference to the June 12 export-control suspension of Anthropic’s Fable 5 and Mythos 5.

Who it is for. Teams building complex coding, reasoning, or research agents that want one managed endpoint instead of running their own multi-agent harness, on work that is not regulated or sensitive.

Who it is not for. Anyone in the EU, EEA, UK, or Switzerland, where it is blocked. Teams handling regulated or personal data. Anyone who needs per-request model provenance, audit logs, or precise cost forecasting. Anyone seeking local or data-sovereign AI.

The finding. Fugu is an orchestration hedge, not sovereign AI. It reduces the impact of one frontier vendor being cut off by putting Sakana’s routing layer in front of a pool of external models. Every provider Sakana names is US-jurisdiction and the full pool is undisclosed, so the hedge does not move capability out of that jurisdiction. It also reduces, rather than increases, your visibility into which model ran, what the request cost, and how the data flowed.

Review tier and disclosure

TIER C · DOCUMENT AND ARCHITECTURE REVIEW

This is a document-and-architecture review, not a hands-on benchmark. Future Stack Reviews has not yet run Fugu through a paid production workload. Everything below is drawn from Sakana’s public pages, its Terms, Privacy, and Usage policies, its pricing and developer documentation, the two research papers it cites, and Anthropic’s export-control statement. Performance, latency, quota burn, provider-exclusion behavior, and real task-level cost all need hands-on testing, which is planned as a Tier B follow-up (agenda in the methodology section).

TL;DR

What it is: one OpenAI-compatible API that orchestrates a pool of external frontier models instead of serving a single one. Two tiers, Fugu and Fugu Ultra.
The real product: managed orchestration labor. You delegate routing, delegation, and synthesis to Sakana and get one endpoint plus partial resilience if any single provider is cut off.
The catch: Sakana states the per-request routing is “not exposed by design.” You cannot see which model served a call. On Fugu Ultra you cannot opt providers out at all.
The sovereignty claim: oversold. Fugu’s own Terms name OpenAI, Anthropic, and Google as the external models, push US export-control compliance onto the user, and transfer data to Japan and the United States. This is availability resilience, not data or jurisdictional sovereignty.
Benchmarks: against the public baselines Sakana lists, Fugu Ultra posts the top or tied-top score on most tests, verified June 2026. It does not sweep them, and on several tasks the cheaper base Fugu beats the flagship Ultra.
Cost: sticker price is visible; the absolute usage cap is not, and orchestration tokens are billed into the final price, so cost is easier to audit after a run than to forecast before one.
Hard stop: unavailable in the EU, EEA, UK, and Switzerland today.

Quick start for developers

Base URL	`https://api.sakana.ai/v1`
Models	`fugu` and `fugu-ultra-20260615`
Endpoints	Chat Completions (`/v1/chat/completions`), Responses API (`/v1/responses`, recommended), Models API
Built-in tool	`web_search`, available through the Responses API
Reasoning effort	Two levels documented: `high` and `xhigh`
Usage reporting	Returns a custom `token_details` field (orchestration tokens), which is non-standard versus OpenAI
CLI integration	Codex CLI via a `~/.codex/fugu.json` config; the official sample sets `timeout=120`

Not documented at launch: rate limits, streaming behavior, structured-output support, and how internal retries are billed. FSR did not test these.

Fugu speaks the OpenAI wire format, so a one-line base-URL swap will move most existing OpenAI client code over. The catch is the custom token_details field. Standard cost-tracking tooling that expects only OpenAI-shaped usage objects may not read orchestration tokens correctly without adjustment. FSR flags that as a likely integration gap rather than a tested fact.

At a glance: key facts

Product	Multi-agent orchestration delivered as one OpenAI-compatible API
Vendor	Sakana AI K.K. (Tokyo, Japan)
Models	Fugu and Fugu Ultra (Ultra model ID: fugu-ultra-20260615)
General availability	June 22, 2026
Subscription pricing	$20 / $100 / $200 per month, sold as 1x / 10x / 20x usage. Absolute token or job cap not disclosed. Verify within 48 hours.
Pay-as-you-go (Fugu Ultra)	$5 / M input, $30 / M output, $0.50 / M cached input. Above 272K context: $10 / $45 / $1.00. Standard Fugu per-token rate not published.
Regions available	Japan, the United States, and most regions outside Japan except where locally blocked
Regions blocked	EU, EEA, UK, Switzerland
Per-request routing	Not exposed, by Sakana’s design
Provider opt-out	Available on standard Fugu through the console. None on Fugu Ultra (fixed pool).
Data transfer	Includes Japan and the United States, per the Privacy Policy
Training on your content	Permitted by the Terms for training, evaluation, and improvement; console opt-out, not retroactive (default state not verified)
Governing law	Japan users: Japanese law, Tokyo District Court. Users outside Japan: California law, AAA arbitration in Los Angeles, with class-action and jury waivers.
SLA / uptime	None stated. No uptime or response-time guarantee.
Sensitive / personal data	Prohibited as input under the Terms

What happened, and who it touches

Start with the trigger, because Fugu’s marketing depends on it. On June 12, 2026, Anthropic published a statement that the US government had issued an export-control directive, citing national security, to suspend all access to its Fable 5 and Mythos 5 models by any foreign national, whether inside or outside the United States, including Anthropic’s own foreign-national staff. Anthropic disabled both models for every customer to comply. The stated basis was an alleged Fable 5 capability, with Anthropic noting the same capability is available from other models and saying it disagreed with the recall while following it. The directive arrived at 5:21 pm ET, and access was cut off abruptly.

That is the event Sakana names. Fugu’s promise is that a single vendor going dark, the way Fable did, should not be able to take your application down, because the orchestration layer can route around the gap. For a buyer who watched a frontier model vanish in an afternoon, that is a real and rational fear to address.

Now the question that decides whether Fugu is for you at all.

Who it is relevant to. Builders of complex agentic systems: coding agents, multi-step research agents, and reasoning pipelines where you would otherwise stitch together several providers, write your own router, and maintain a verification layer. If you want that orchestration handled for you behind one endpoint, and your workload is not regulated or sensitive, Fugu is aimed squarely at you.

Who it is not relevant to. Three groups can stop reading after this paragraph. If you operate in the EU, EEA, UK, or Switzerland, Fugu is not available to you, full stop. If you handle regulated or personal data, Fugu’s own Terms forbid that input, so it is the wrong tool by design. And if your requirement is to know which model handled each request, to keep an audit trail, or to forecast cost precisely before a run, the product’s opacity is structural and will not suit you.

What the impact is, beyond the launch. The interesting part is not the model. It is the category. Fugu is one of the clearest signals yet that “orchestration” is becoming a product you buy rather than a system you build. The base model is no longer the unit of sale. The unit of sale is the routing, delegation, and verification layer on top of other people’s models. That shift moves the trust boundary. You are no longer trusting one model vendor with your prompt. You are trusting an intermediary to decide, invisibly, which vendors see it and how the work is split. That is a different procurement question, and most buyers have not adjusted to it yet.

What Fugu actually sells

It helps to be precise about the product, because the marketing word and the mechanism point in different directions.

Fugu does not sell you a new standalone frontier model. By Sakana’s own account, it sells managed orchestration over a pool of existing external models. The system selects which model handles a piece of work, delegates sub-tasks, runs a verification pass, and synthesizes a final answer. Sakana grounds this in two papers it published or co-authored, TRINITY and Conductor, both slated for ICLR 2026. The Conductor paper describes a 7 billion parameter reinforcement-learned conductor that decides delegation. Note the boundary carefully: that figure describes the research artifact, not the shipping product. Sakana has not published the conductor size inside the live Fugu service, so treating Fugu as “a 7B model” would be wrong.

The honest read is that the orchestration itself is a difficult engineering problem, and the public benchmark numbers (covered below) show it producing strong results against publicly available models. The product work is credible: model selection, delegation, verification, and synthesis are real engineering, not a wrapper. The problem is the second label Sakana puts on the box.

The sovereignty seam: the blind spot

Here is the part the launch-day discourse is mostly missing. The early argument splits between two camps. One asks whether Fugu beats Fable on benchmarks. The other asks whether it is “just a router.” Both miss the structural move, which is a quiet redefinition of one word.

Sakana borrows “sovereignty” from a moment of real supply shock. When buyers hear “AI sovereignty,” they tend to hear something specific: control over where their data goes, which legal jurisdiction governs it, and whether the system can be audited. Fugu delivers something narrower, and that narrower thing is real. If one frontier vendor is cut off, the endpoint keeps working because the orchestration layer can lean on others. Call that supply resilience, or availability resilience. It is a legitimate benefit.

Data sovereignty and jurisdictional independence are a different claim, and Fugu’s own documents do not support it. The Terms name OpenAI, Anthropic, and Google as examples of the external models the service routes to. All three are US-jurisdiction providers, as is every public model Sakana benchmarks against, and the full active pool is not disclosed. The same Terms place US export-control and trade-sanctions compliance on the user, prohibit use by sanctioned parties, and forbid submitting export-restricted content. The Privacy Policy says data is transferred internationally, including to Japan and the United States. Users outside Japan are governed by California law.

So follow the logic to its end. The event that made Fugu’s pitch compelling was an export-control action that reaches a US-jurisdiction capability accessed remotely. Every provider Fugu names sits inside that same jurisdiction, and Sakana does not disclose the rest of the pool. The hedge therefore addresses the symptom, not the cause. It protects against one vendor losing access, while the authority that can order such a loss still sits underneath the providers you can see and, for all a buyer can verify, the ones you cannot.

The trade in one line. You reach for “sovereignty” and you receive vendor redundancy with less visibility than you had before, across providers whose named members all sit inside the same jurisdiction whose export-control reach created the shock in the first place. That can still be a good deal for availability. It is not the deal the word implies.

This is the FSR finding. Fugu reorganizes frontier-model dependency behind one endpoint. It may improve integration and availability resilience. It does not prove data sovereignty, jurisdictional independence, or per-request auditability, and on the last point it actively reduces what you can see.

Routing, opacity, and three consequences

The mechanism that makes the finding bite is one sentence in Sakana’s FAQ: the routing information is “not exposed by design.” Which underlying model handled a request, and how the work was coordinated, are treated as proprietary and are not surfaced to you.

For a buyer who wants one managed endpoint and nothing more, that may be acceptable. For a buyer with governance, cost, or verification requirements, it produces three distinct consequences.

Consequence one, the sovereignty consequence. If you cannot see which external model processed a request, you cannot make a clean data-residency or processor-audit claim about it. The opacity that makes the product convenient is the same opacity that undercuts the sovereignty framing.

Consequence two, the cost consequence. Because routing and the orchestration steps are hidden, you cannot easily predict how fast a real agentic workflow will consume a subscription, or how many orchestration tokens a given task will add. You see the result after the fact, not the shape of it before you run.

Consequence three, the verification consequence. A benchmark result is produced under Sakana’s routing. Your production call is produced under routing you cannot inspect. The gap between the two means a published score does not cleanly map onto your own workload. The numbers can be real and still not transferable.

There is one meaningful control here, and it is a governance difference worth naming. On the standard Fugu model, you can opt specific providers or models out through the console. On Fugu Ultra, the pool is fixed and there is no opt-out. Read that as an entitlement boundary, not a feature gap: the more you pay for maximum capability, the less control you keep over which providers touch your data.

The benchmark read

Sakana publishes a benchmark table comparing Fugu and Fugu Ultra against three named public baselines: Gemini 3.1 Pro (high), Opus 4.8 (max), and GPT 5.5 (xhigh). The scores below are reproduced from Sakana’s page. Baselines are provider-reported, marked with a dagger. SWE Bench Pro used mini-swe-agent scaffolding, marked with an asterisk.

Benchmark	Fugu	Fugu Ultra	Opus 4.8 †	Gemini 3.1 Pro †	GPT 5.5 †
SWE Bench Pro *	59.0	73.7	69.2	54.2	58.6
TerminalBench 2.1	80.2	82.1	74.6	70.3	78.2
LiveCodeBench	92.9	93.2	87.8	88.5	85.3
LiveCodeBench Pro	87.8	90.8	84.8	82.9	88.4
Humanity’s Last Exam	47.2	50.0	49.8	44.4	41.4
CharXiv Reasoning	85.1	86.6	84.2	83.3	84.1
GPQA-D	95.5	95.5	92.0	94.3	93.6
SciCode	60.1	58.7	53.5	58.9	56.1
τ³ Banking	21.7	20.6	20.6	8.4	20.6
Long Context Reasoning	74.7	73.3	67.7	72.7	74.3
MRCRv2	86.6	93.6	87.9	84.9	94.8

What the table actually says, read straight:

Against the three public baselines, Fugu Ultra posts the top or tied-top score on most of the benchmarks Sakana lists, verified in June 2026. Sakana’s claim that the system surpasses publicly accessible frontier models is broadly true and checkable on coding and reasoning tasks, with the caveat that the table is a live page and the recheck schedule revisits it.

It is not a sweep. GPT 5.5 takes MRCRv2, at 94.8 to Fugu Ultra’s 93.6. And on several tasks (SciCode, the τ³ Banking agentic test, and Long Context Reasoning) the cheaper base Fugu actually outscores the flagship Fugu Ultra, with GPQA Diamond tied between the two. A buyer should notice that, because it means the most expensive option is not uniformly the strongest, and a workload weighted toward those tasks might be better served by the base model.

Then there is the headline that does not appear in this table. Sakana also describes Fugu Ultra as “shoulder to shoulder” with Fable 5 and Mythos 5. Treat that as an official claim, not a verified result. Those two models are export-suspended and not publicly accessible, the comparison is reported as a max-of-two aggregate, and it appears only as a chart image. No one outside Sakana can reproduce it. FSR reports the claim and neither endorses nor rebuts it with specific Fable numbers, because no trustworthy Fable numbers exist to cite.

What the independent research says

Step back from Sakana’s own materials and ask what the broader literature says about orchestration, because that is the only independent check available on the architecture itself.

The peer-reviewed and preprint record supports a conditional conclusion, not a blanket one. Multi-model orchestration can beat the best single model, but only under specific conditions: genuine diversity among the models, learned rather than naive routing, sensible task decomposition, and a verifier that is actually reliable. Where those hold, there are positive results. Lu and colleagues (2023) showed a learned router that beat the best single model on average and ranked first on a large minority of tasks. Wang and colleagues (2024) showed a mixture-of-agents approach outperforming a strong single model on several evaluations.

The cautionary half of the literature is just as load-bearing. Multi-agent debate often fails to beat simple baselines. Orchestration frameworks add real overhead, with one 2026 analysis reporting latency penalties ranging from modest to very large and a measurable drop in planning accuracy under some configurations. Another 2026 system beat its baselines but at roughly ninety seconds per question. And the verifier, the component everything depends on, is itself a known weak point: model-as-judge systems agree with humans well on subjective preference but perform poorly on objective correctness, and they exhibit self-preference bias.

Two things follow. First, the architecture Fugu uses is research-backed in principle, but only conditionally, and it tends to cost latency. Second, Fugu’s specific gains rest on vendor-authored preprints, TRINITY and Conductor, that have not been independently replicated, and no independent Fugu technical evaluation exists in the literature. Sakana’s broader heritage in model composition is real and peer-reviewed, including its 2024 work on evolutionary model merging. But the live product’s specific claims sit outside what independent research has yet validated. That is not an accusation. It is the current state of the evidence.

Cost structure and the orchestration tax

Pricing is where Fugu is simultaneously transparent and opaque, and the two need to be separated.

Transparent: the sticker prices are published. Subscriptions run $20, $100, and $200 per month, described as one times, ten times, and twenty times a baseline usage allowance. Pay-as-you-go Fugu Ultra is $5 per million input tokens, $30 per million output tokens, and $0.50 per million cached input tokens, with each figure roughly doubling once a request exceeds 272K of context. There is a launch promotion: subscribe before the end of July 2026 and get a free second month at your starting tier. One buyer-friendly design point deserves real credit here. For the standard Fugu model, running multiple agents does not stack separate fees. Sakana charges a single blended rate based on the top-tier model involved, rather than billing each agent separately.

Sakana Fugu pricing page: Standard, Pro, and Max subscriptions at , 0, and 0 per month, plus Fugu Ultra pay-as-you-go token rates. — Sakana Fugu pricing page Standard Pro and Max subscriptions at $20 $100 and $200 per month plus Fugu Ultra pay as you go token rates

Opaque: two things you would want before committing are missing. The subscription tiers give you a multiplier, not an absolute cap. You see “ten times” and “twenty times,” not a hard token, request, or job allowance, so you cannot calculate how much real work a plan buys before you buy it. And the standard Fugu per-token rate is not published at all; it is described only as the standard rate for whichever underlying model is used.

Then there is the orchestration tax. Fugu records orchestration tokens in the token_details field, bills them at the standard input and output rates, and counts them in the final price. Sakana is upfront that these are real token usage beyond the visible input and output, and that they reach your bill. That is the right disclosure to make. The practical problem is direction of visibility. Orchestration tokens are easy to audit after a request and hard to forecast before one. On a long agentic loop, where the orchestrator may fan out across multiple steps and models, that overhead could grow in ways the pricing page cannot tell you in advance. How large it grows on a real workload is exactly what a Tier B hands-on test needs to measure, and FSR has not measured it.

One third-party data point, attributed. An independent first-day hands-on by DevelopersIO reported that on a light query, Fugu Ultra showed large orchestration-token counts and long latency, while the base Fugu model showed orchestration fields at zero. FSR treats this as an outside report worth noting, not as its own verified measurement. It points the same direction as the cost concern above, but it is one tester, one query.

A note on the early social signal, kept in its lane. A visible Hacker News thread on launch day included first-hand reports of slow responses and a subscription allowance that ran down faster than expected, alongside comments comparing Fugu to OpenRouter-style routing and home-built multi-agent setups. FSR treats those as early user signals, not verified performance claims. They are useful only as a map of where buyers will test Fugu first: cost predictability, routing transparency, and whether managed orchestration actually beats a direct frontier call. None of it is stated here as fact about the product’s speed or value.

Privacy, training, and legal posture

This section is a procurement assessment, not a legal ruling. FSR is not making any compliance or violation finding. The point is to surface what the documents say and what they leave open.

The Privacy Policy lists what gets collected: prompts, uploaded content, outputs, feedback, session data, timestamps, and request identifiers. Personal data may be disclosed to vendors including the underlying LLM providers, cloud infrastructure, analytics, payment processing, and support. International transfer is stated, including to Japan and the United States. The service is for adults only, and a CCPA addendum exists for California. Retention is described as “as reasonably necessary,” with no fixed schedule published.

On training, the Terms are permissive. They allow the Company to use your content for training, evaluation, and service improvement, with a console opt-out for training use that is not retroactive: content already used may not be reversible. FSR did not confirm the console default state, which is a hands-on check, so treat whether the opt-out ships off by default as an open question rather than a settled fact. The content license is broad, described as worldwide, perpetual, irrevocable, non-exclusive, sublicensable, and transferable. The Terms also state there is no obligation to retain your content, and, separately, no obligation to delete trained model weights, external-vendor caches, or audit logs. Contractors and human reviewers may be involved in some circumstances.

A few more clauses a buyer should weigh. Inputting personal information is prohibited, as is health, financial, and other sensitive information, which by itself tells you Fugu is not built for regulated data. Building a competing AI orchestration or routing product is prohibited. There is no uptime or response-time guarantee, and the Terms explicitly contemplate degradation or suspension caused by an external provider. Credits expire after six months and are non-refundable. The Usage Policy, on the portion FSR confirmed, prohibits processing individuals’ sensitive information without consent, building facial-recognition databases, and real-time biometric identification, requires that AI-generated output be disclosed as such, and bars unauthorized security testing. One area of that policy concerning automated decisions in high-stakes domains is still being re-confirmed against the full text and is therefore not characterized here.

The procurement gap is the headline. FSR found no published Data Processing Agreement, no subprocessor list, no SLA, no fixed retention schedule, and no security certification such as SOC 2 or ISO 27001. The absence is the finding. For a regulated or enterprise buyer, those documents are the entry ticket, and at launch they are not on the table.

EU and UK buyers. Beyond the procurement gaps, the product is simply not offered to you yet. Sakana states Fugu is not available in the EU and EEA while it works toward GDPR compliance, and the Terms exclude the UK and Switzerland from supported regions. Any future EU deployment would also raise questions under the EU AI Act (Regulation (EU) 2024/1689) whose answer depends on the specific use case and needs legal review. FSR makes no classification here.

How Fugu compares to the alternatives

Fugu is not the only way to get multi-model capability, and the right comparison is by tradeoff, not by ranking. Each option below wins on a different axis. The table is meant to help you locate your own constraint.

Axis	Sakana Fugu	Direct frontier API	Router (e.g. OpenRouter)	Self-built multi-agent	Self-hosted / local
What you buy	Managed orchestration over a hidden pool, one endpoint	One model you choose	A routing layer you configure across providers	Your own harness over providers you pick	Weights you run yourself
Routing visibility	None, by design	Full, you pick	You configure it	Full, you define it	Full
Per-request model	Not shown	Known	Shown	Yours to log	Yours
Cost predictability	Sticker visible; orchestration tokens and caps hard to forecast	Per-token, predictable	Per-token per model	Per-token plus your infra	Mostly fixed infra cost
Data path / jurisdiction	External US models (per Terms examples); Japan and US transfer	That one provider	Whichever provider is routed	Providers you choose	Your infrastructure
Export-control exposure	US pool; user must comply; single-vendor cutoff hedged	Tied to that vendor	Depends on providers	You can include non-US or OSS	Lowest external exposure
EU availability	No	Varies (many yes)	Varies	Varies	Yes
Audit / governance	Limited; opt-out on base Fugu only; no DPA found	Provider’s DPA and controls	Provider terms apply	Full, your design	Full
Build / maintenance effort	Lowest; Sakana maintains orchestration	Low	Low to moderate	High; you own the harness	Highest; infra plus ops

The pattern is clean. Fugu’s genuine edge is the bottom row: it removes the work of building and maintaining a multi-agent system. Its genuine cost is the rows above: visibility, jurisdiction control, cost forecasting, and audit. If your binding constraint is engineering time, Fugu argues well for itself. If your binding constraint is governance or cost transparency, the alternatives argue better.

Who should use it, who should not, who should wait

Future Stack Reviews summary infographic, part 2 of 2: who Sakana Fugu is for (complex, non-sensitive agentic work), who it is not for (EU and regulated or sovereignty-bound buyers), who should wait, and the core finding that Fugu is an orchestration hedge, not sovereign AI. Details in the sections below. — The buyer call settled Fugu earns a paid trial on complex non sensitive agentic work where resilience matters more than seeing which model ran or what it cost It is wrong for EU regulated audit and sovereignty needs and a wait for anyone who needs a DPA usage caps or model controls that are not available yet The finding underneath is simple Fugu buys resilience and simplicity not sovereignty

Decision path

Are you in the EU, EEA, UK, or Switzerland? Yes → Fugu is not available to you. Stop here.
Do you handle regulated or personal data? Yes → The Terms forbid that input. Choose a provider with a DPA and data controls. Stop here.
Do you need to know which model handled each request, or keep audit logs? Yes → Routing is not exposed. Use direct APIs or your own harness instead.
Do you need to forecast cost precisely before running? Yes → Orchestration tokens and undisclosed caps make that hard. Test on your workload first, or use a per-token direct API.
Building complex coding, reasoning, or research agents and want to skip building the orchestration yourself? Yes → Fugu earns a paid trial on non-sensitive work. Measure orchestration overhead and latency yourself before you scale.

Use it if you build complex coding, reasoning, or research agents, you would rather pay for one managed orchestration endpoint than build and maintain a multi-agent stack, and your work is not sensitive enough that routing and cost opacity become a problem.

Skip it if you are in the EU, EEA, UK, or Switzerland, if you need per-request model provenance or audit logs, if you handle regulated or sensitive data, if you need deterministic cost prediction, or if what you actually want is local, data-sovereign AI. In that last case, Fugu is the wrong category entirely, though the open-weight route is not the clean escape it looks like either: as FSR found with GLM-5.2 and Kimi K2.7 Code, the hardware floor pushes most teams back onto a vendor API.

Wait if you are a procurement or security team holding out for a DPA, a subprocessor list, an SLA, and audit logs, none of which exist yet. Or if you are cost-sensitive and need to measure real orchestration-token overhead on your own workload before you can trust the bill.

FAQ

Does Sakana Fugu give you data sovereignty?

No, not in the strong sense. Fugu hedges the risk of a single vendor being cut off, but its own Terms route input to external US models named as OpenAI, Anthropic, and Google, transfer data to Japan and the United States, and place non-Japan users under California law. That is availability resilience, not data or jurisdictional sovereignty.

Does Sakana Fugu escape US export controls?

Not structurally. The models its Terms name as examples are US-jurisdiction frontier providers, and the same Terms require you, the user, to comply with US export controls and trade sanctions. Fugu reduces the impact of one vendor losing access. It does not move your capability outside the jurisdiction that can order such a suspension.

Is Sakana Fugu available in the EU?

No. Sakana states Fugu is not yet available in the EU and EEA while it works toward GDPR compliance, and its Terms also exclude the United Kingdom and Switzerland from supported regions. EU and UK buyers cannot use it today, and FSR found no published Data Processing Agreement or subprocessor list.

What does Sakana Fugu cost?

Subscriptions are $20, $100, and $200 per month, sold as one, ten, and twenty times a usage baseline with no published absolute cap. Pay-as-you-go Fugu Ultra is $5 per million input tokens and $30 per million output, higher above 272K context. Orchestration tokens are billed into the final price. Verify within 48 hours, as pricing is volatile.

Does Fugu Ultra beat Fable 5?

Sakana reports Fugu Ultra as shoulder to shoulder with Fable 5 and Mythos 5, but those models are export-suspended and not independently testable, and the comparison appears only as a chart aggregate. Against the public baselines Sakana does list, Fugu Ultra leads most coding and reasoning benchmarks without sweeping them.

Can you see which model Fugu used?

No. Sakana states the per-request routing is proprietary and not exposed by design. On the standard Fugu model you can opt specific providers or models out through the console, but Fugu Ultra uses a fixed pool with no opt-out, and neither mode reveals which model handled a given request.

What are the hidden costs of Sakana Fugu?

The main one is orchestration tokens. Sakana records them in a token_details field and counts them in the final price, so you can audit them after a run but not forecast them before. An independent first-day hands-on reported large orchestration-token counts on Fugu Ultra for a light query. Subscriptions also hide their absolute usage cap.

What are the alternatives to Sakana Fugu?

Three realistic options exist: a single frontier API called directly, a routing layer such as OpenRouter, or your own multi-agent harness across several providers. Each trades off differently on transparency and on the effort to maintain it. Fugu’s pitch is that it removes the build work, at the price of routing and cost visibility.

Methodology and sources

This is a Tier C, document-and-architecture review. Future Stack Reviews did not run a paid production workload against Fugu. The findings come from Sakana’s public materials and primary policy documents, the two research papers Sakana cites, Anthropic’s export-control statement, a synthesis of the independent academic literature on multi-model orchestration, and a clearly attributed set of third-party and early-user signals that are not treated as verified product behavior.

Primary sources read:

sakana.ai/fugu (product page, benchmark table, FAQ)
Terms of Service, effective June 12, 2026
Privacy Policy
Usage Policy (partially confirmed; one section pending full re-read)
Console pricing and developer documentation (get-started and models)
Anthropic, Fable 5 and Mythos 5 access statement
TRINITY (arXiv 2512.04695) and Conductor (arXiv 2512.04388)
A Fugu technical report PDF exists in Sakana’s GitHub repository; FSR confirmed its existence but did not extract or rely on its contents

What FSR did not test. Real latency (time to first token, p50, p99), quota burn on a sustained workload, actual per-task cost ratios, the true magnitude of orchestration-token overhead, the quality impact of opting providers out, whether per-request provider attribution can be surfaced by any means, and the undocumented behaviors around rate limits, streaming, and structured outputs. Every one of those needs hands-on access.

Tier B follow-up plan. When paid access is set up, FSR will: quantify the orchestration-token overhead ratio (visible tokens versus billed) on a real coding-agent workload with screenshots; test whether the model that handled a request can be surfaced at all; reproduce a build task through Codex or Cursor and check it against the public-baseline benchmark claims; measure latency against a direct frontier call; and test what opting a provider out does to output quality.

Recheck discipline. Pricing and availability are high-volatility and must be re-verified within 48 hours of publication. The export-control situation is also live: at the time of writing, the Fable and Mythos suspension was being disputed, and Anthropic had stated it disagreed with the recall while complying. Independent reporting and policy analysis in mid-June framed the export action as a novel and legally uncertain application of export control to remotely accessed AI, which is the broader context for Sakana’s pitch.

FSR verdict

Sakana built something substantial. The orchestration is a hard engineering problem, the benchmark leads against publicly available models are checkable and mostly hold, and the no-fee-stacking pricing on the base model is an honest design choice. If your problem is that you do not want to build and babysit a multi-agent system, Fugu is a credible way to skip that work, and it deserves a paid trial on non-sensitive workloads.

The thing to discount is the second label. “AI sovereignty” oversells what Fugu does. The product hedges one vendor being cut off by spreading your dependency across a pool of US-jurisdiction models you cannot see into, governed by terms that push export-control compliance back onto you. That is supply resilience, and it is worth something. It is not data sovereignty, it is not jurisdictional independence, and it reduces your visibility into the system rather than increasing it. Read Fugu as an orchestration hedge and the value is clear. Read it as the marketing word, and you will have bought something other than what you thought.

For EU, UK, regulated, and audit-bound buyers, this is not close: the door is shut today, and the procurement documents that would reopen it do not exist yet. For everyone else, the right move is to trial it on disposable work and measure the orchestration tax yourself before you let it near production. The benchmark leads are checkable against public baselines, but they are vendor-published, not a production audit. Whether they survive contact with your own workload, behind routing you cannot inspect, is the one question only a hands-on test can answer, and that test is still to come.