Last updated: May 31, 2026
Grok 4.3 is xAI’s current flagship model, released April 17, 2026. It runs a 1 million token context window and costs $1.25 per million input tokens and $2.50 per million output on the API. Grok appears across the consumer app, the xAI API, and Grok inside X. This review is about which version, mode, and meter you are paying for.
Grok’s $300 mode could not build the file. The default mode built it in under a minute.
I gave Grok one instruction: create a downloadable Excel file, a twelve-month SaaS revenue model, real formulas, no static numbers. One line. Then I ran that exact line through all five of Grok’s modes to see which one would hand back a file.
The expensive ones did not. Heavy, the $300-a-month tier that markets itself as the world’s strongest AI, thought for ninety-four seconds and returned a Python script with instructions to run it myself. I asked three times. Three scripts, no file. The mode called Expert behaved the same way across three more attempts: a copy-paste template once, a script the next, each delivered with a confident technical reason. “Tool limitation.” “The sandbox can’t deliver binary files.” “Interface limitation.” Three different excuses, and none of them held up.
They did not hold up because the cheaper modes simply made the file. Auto, the default, returned a finished spreadsheet. Fast returned one in twelve seconds. Grok 4.3, the newest model in the picker, produced the richest version of the batch, with an editable assumptions block. I checked its formulas by hand. I changed the starting MRR from $15,000 to $8,000 and watched all twelve months and the ARR total recalculate. Live formulas, not a static table.
So thinking time was not the variable. Grok 4.3 thought longer than Heavy did and got it right. The file was never impossible. The same product, on the same prompt, built it three separate ways. The modes that refused were not blocked by Grok. They stopped themselves, and the reason they gave did not survive the next three prompts.
One more detail, because it sets the tone for everything below. Every file Grok did build, Grok could not open. The in-app preview spun and never loaded. The only way to confirm the work was good was to download each file and open it somewhere that was not Grok.
This is a review of Grok 4.3. It is also a record of what a fast-moving AI product does when you hand it one real task, and how hard it makes the question underneath every Grok decision: which Grok you are using, which model, which mode, which meter, and whose privacy rules.
Public review depth: Tier B
TL;DR
- The one-line verdict: Grok 4.3 is worth testing if you can pick the right mode, control your tool spend, and verify your own outputs. It is not yet a “pay more, get better work” product.
- The headline finding: on a $300 Heavy account, the Heavy and Expert modes failed to produce a downloadable spreadsheet six times out of six. Auto, Fast, and the Grok 4.3 mode each produced one. The failures came with false technical excuses.
- The hidden cost: the API token price is low, but web search, X search, and code execution each add $5 per 1,000 calls, and the agent decides how many calls to make.
- “Real-time” is a switch you pay for: on the API, Grok has no live data unless you enable search tools. With them off, it answered a current-events question with stale, confident, wrong information.
- “Grok” is not one trust boundary: the consumer app, the API, and Grok inside X each run under different data rules.
- What this is not: not a claim that Heavy is useless, not a claim that Grok cannot make files, and not a regulatory verdict. Details below.
- Observed by FSR: the five-mode file test, the in-app preview failure, and the API token delta with web search off versus on, all on a paid SuperGrok Heavy account with screenshots and the generated files retained.
- Confirmed in xAI documentation: Grok 4.3’s 1 million token context, the $1.25 and $2.50 API token prices, the per-call tool fees, the $0.05 violation fee, and the statement that Grok has no real-time data unless search tools are enabled.
- Inference: the Heavy and Expert failures may reflect mode routing, tool access, interface behavior, or the model misreporting its own limits, rather than the base model being unable to build a file.
- Unverified or reporting-sourced: the April 17 release date comes from consistent independent reporting, not an xAI press release; the mapping between app modes and underlying model IDs is inferred; the single X post cited below is one public sample.
Quick start: which mode for which job
If you only take one practical thing from this review, take this. Grok’s modes do not behave the way their names suggest, so choose by task, not by price.
For producing an actual file, a spreadsheet, a slide deck, a document, use the Grok 4.3 mode in the picker. In this test it produced the cleanest working file of the five modes. Then download it and open it in a real app. Do not trust the in-app preview to confirm the file worked.
For quick drafts and quick questions, Fast is good enough. For general chat, Auto is the sensible default. Reach for Heavy or Expert when you want maximum reasoning effort or the sixteen-agent setup on a hard analytical problem, not when you need a clean deliverable handed back.
On the API, keep web search and X search off until a request needs current data, because each call is metered. When you do need current information, turn search on and expect the token count and the bill to jump.
If you are evaluating Grok for a business, do not assume your data is private by default. The contractual “no training” guarantee sits on the Business and Enterprise plans, not the individual ones. More on that below.
At a glance: prices and key facts
All figures verified against xAI’s official documentation and the Grok and x.ai pricing pages on May 31, 2026. Prices change often. Re-check before you commit a budget.
| Item | Detail |
|---|---|
| Current flagship model | Grok 4.3, released April 17, 2026 |
| Context window (Grok 4.3) | 1 million tokens |
| Knowledge cutoff (Grok 4.3) | Not published by xAI. The xAI models page states November 2024 only for Grok 3 and Grok 4 |
| API price (Grok 4.3) | $1.25 / 1M input, $0.20 / 1M cached input, $2.50 / 1M output |
| Tool fees | Web search, X search, code execution: $5 / 1,000 calls each. File attachments: $10 / 1,000. Collections (RAG): $2.50 / 1,000 |
| Usage-guideline violation fee | $0.05 per request for violations caught before generation in the Responses API |
| Free consumer tier | $0. Includes limited real-time web and X search, voice mode, connectors |
| SuperGrok | $30 / month |
| SuperGrok Heavy | $300 / month, or $3,000 / year ($250 / month, billed annually) |
| Business / Enterprise | Business listed at $30 / month; Enterprise is custom (contact sales) |
| Real-time data on API | Off by default. Requires enabling web or X search tools |
| Data training default | Consumer inputs used for training by default with a settings opt-out. API not trained on by default |
| Corporate owner | xAI (X.AI LLC), a SpaceX subsidiary since early 2026, a separate legal entity from X Corp |
The five-mode file test
Here is the test in full, because it is the center of this review.
The prompt, run verbatim in every mode on a SuperGrok Heavy account on May 30, 2026:
Create a downloadable .xlsx file: a 12-month SaaS revenue model with columns for MRR, churn %, net new MRR, and a formula-driven ARR total. Use real Excel formulas, not static numbers.
The results:
| Mode | Attempts | Result | What it returned |
|---|---|---|---|
| Heavy ($300 tier) | 3 | 0 files | Python scripts and instructions to run them myself |
| Expert | 3 | 0 files | A template once, scripts otherwise, each with a technical excuse |
| Auto (default) | 1 | 1 file | A formatted spreadsheet (the churn % formatting was inconsistent) |
| Fast | 1 | 1 file | A raw, unformatted spreadsheet, full of long decimals |
| Grok 4.3 (beta) | 1 | 1 file | The richest version, with an assumptions block and a “how to use” note |
The two heavier modes failed every attempt. The three lighter modes each produced a working file. I want to be precise about the strength of each half of that result, because it matters for what you can safely conclude.
The failure side is solid. Six attempts across Heavy and Expert, zero files, every time. That pattern repeated, and it is the part of this finding I would defend.
The success side is lighter. I ran one attempt per successful mode, so I am not claiming Auto and Fast succeed every time, and there is public evidence that Auto does not. One public X post on May 28, 2026, with screenshots, reported the opposite split: Auto and Heavy failed, and only the Grok 4.3 beta produced the file. That is one post, a visible sample, not a survey. Read the two tests together and the real lesson is not “Heavy bad, Auto good.” It is that the only mode that produced a file in both was Grok 4.3, and the rest were a coin toss.
That is the seam. The mode names imply a ladder, with Heavy at the top, but the ladder does not map to whether you get your file. A buyer who upgrades to the heaviest tier expecting the most reliable output can land on the modes least likely to deliver one.
On the formulas: I did verify that the Grok 4.3 file was real work, not a screenshot of a spreadsheet. Changing the starting MRR from $15,000 to $8,000 recalculated every month and the ARR total. That is genuine formula output. It is also one file on one prompt, so it confirms the model can build a live model, not that it always builds a correct one.
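A first-pass check that a downloaded file contains formulas at all, before opening it by hand, can be scripted. The sketch below relies on the fact that an .xlsx file is a zip package whose worksheet XML marks formula cells with an `<f>` element. The function name `count_cells` and the filename in the usage note are illustrative, not part of the original test, and the check only proves formulas are present, not that they are correct or recalculate as intended.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML namespace used by worksheet XML inside an .xlsx package.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def count_cells(xlsx):
    """Return (formula_cells, total_cells) across all worksheets.

    `xlsx` may be a file path or a binary file-like object.
    """
    formulas = 0
    total = 0
    with zipfile.ZipFile(xlsx) as package:
        for name in package.namelist():
            # Worksheets conventionally live under xl/worksheets/ in the package.
            if not (name.startswith("xl/worksheets/") and name.endswith(".xml")):
                continue
            root = ET.fromstring(package.read(name))
            for cell in root.iter(NS + "c"):
                total += 1
                # A cell backed by a formula carries an <f> child element.
                if cell.find(NS + "f") is not None:
                    formulas += 1
    return formulas, total
```

Calling `count_cells("revenue_model.xlsx")` (a hypothetical filename) on a file where the first number is zero would suggest a static table. A nonzero count still calls for the manual edit-and-recalculate check described above.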
And the excuses deserve their own line. When Heavy and Expert failed, they did not just fail. They explained the failure with statements like “the sandbox can’t deliver binary files” and “interface limitation.” Those statements were not true for this product on this day, because three other modes of the same product delivered the file. xAI does not publish which tools each mode can reach, so I cannot tell you whether this is a real per-mode restriction, a routing quirk, or the model inventing a plausible reason. What I can tell you is that, in this test, Grok gave a confident technical explanation that its own behavior contradicted minutes later.
Grok is not the only system that misdescribes itself; we have tested an agent that called its own cost estimates “hallucinations.”
The three working modes did not only differ in polish. Each invented its own assumptions, and the bottom line moved with them. Auto started from $10,000 in MRR and projected about $281,000 in year-one ARR. Fast started from the same $10,000 with flat net-new growth and landed near $285,000. The Grok 4.3 file started from $15,000, ramped net-new MRR from $1,500 to $10,500 a month, and projected $754,965. Same one-line prompt, three answers, a spread of more than two and a half times. None of them is wrong, because the prompt never fixed the inputs. That is the quieter half of the finding: when you do not pin the assumptions, the mode you happen to be in picks them for you.



What the version names hide
Now the part that explains a lot of the mess above.
“Grok” is not one model, and the version string you see depends on where you look. On the x.ai pricing page, the model row is labeled simply “Grok 4.” Open the SuperGrok upgrade screen inside the app and it sells you on “Grok 4.20.” Open the model picker in the same app and you can select “Grok 4.3 beta.” Open the developer docs and the API serves a model called grok-4.3, alongside an older grok-4.20 family.
The API pricing page lists these chat models, and the spread is worth seeing in one place:
| API model | Context | Input / Output per 1M |
|---|---|---|
| grok-4.3 | 1M | $1.25 / $2.50 |
| grok-4.20-multi-agent-0309 | 1M | $1.25 / $2.50 |
| grok-4.20-0309-reasoning | 1M | $1.25 / $2.50 |
| grok-4.20-0309-non-reasoning | 1M | $1.25 / $2.50 |
| grok-build-0.1 (coding) | 256K | $1.00 / $2.00 |
Two things fall out of this. First, no current model exceeds 1 million tokens of context, and the coding model, grok-build-0.1, stops at 256K. If you have read that Grok offers a 2 million token window, that figure belongs to an older Fast model that no longer appears on the live pricing table. For the current flagship, the honest number is 1 million.
Second, look at that grok-4.20-multi-agent entry. The app’s Heavy tier sells “Grok 4.20” and advertises an Expert mode where “sixteen agents work as a team.” A multi-agent 4.20 model and a sixteen-agent Expert mode line up closely enough that the most reasonable read is this: the $300 Heavy experience is built around the 4.20 multi-agent model, while the newest model, 4.3, sits in a separate “beta” mode you have to switch into yourself. I am labeling that as inference, because xAI does not publish the mapping from app mode to underlying model. But if it is right, it explains the file test cleanly. You can pay for the heaviest tier and still be one manual switch away from the newest model, and the heaviest tier’s default modes were the ones that failed the file.

There is a quieter version of the same problem on the API. xAI uses model aliases, where a bare name like grok-4 points to the latest stable version and silently migrates as new versions ship. Write grok-4 into your code thinking you have locked your model, and the alias can move underneath you; only a dated release ID stays fixed. For a team that needs reproducible output, that is a real auditing seam, not a convenience.
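A team that wants to catch unpinned model names in its configuration could start with a lint-style check like the one below. It is a heuristic sketch only: it assumes a dated release carries a four-digit stamp such as the "0309" in the IDs quoted in this review, which is an inference from those names and not a convention xAI documents. The function names are hypothetical. An ID it flags is a candidate for review, not a confirmed alias.

```python
def looks_pinned(model_id: str) -> bool:
    """Heuristic: treat an ID as pinned if any hyphen-separated part is a
    four-digit stamp such as "0309".

    Assumption: this naming pattern is inferred from the model IDs quoted in
    this review; it is not a rule published by xAI.
    """
    return any(len(part) == 4 and part.isdigit() for part in model_id.split("-"))


def unpinned_models(model_ids):
    """Return the IDs without a date-like stamp, preserving input order."""
    return [m for m in model_ids if not looks_pinned(m)]


if __name__ == "__main__":
    configured = ["grok-4", "grok-4.3", "grok-4.20-0309-reasoning"]
    for model in unpinned_models(configured):
        print(f"review before relying on reproducibility: {model}")
```

The design choice is deliberately conservative: anything without a visible stamp gets flagged, so a person decides whether it is an alias rather than the script guessing.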
None of this is hidden in a sinister way. It is just not surfaced. The buyer sees four different version strings across four screens and has no obvious way to answer the simplest question: which model am I talking to right now.
The file Grok could not open
A short section for a small finding that punches above its size.
All three files that Grok did generate, in Auto, Fast, and the 4.3 mode, failed to open in Grok’s own in-app preview. The preview loaded a spinner and stayed there. Each file opened correctly once I downloaded it and used a normal spreadsheet app.
For a power user, that is a minor annoyance. For a normal user, it is the difference between “Grok made my file” and “Grok failed.” If the preview never resolves, most people will reasonably conclude the task did not work, close the tab, and move on, never knowing a working file was sitting one download away. The product succeeded at the hard part and then hid the result behind a broken viewer.
The meter: tokens, tools, and refusals
Grok’s sticker price is one of its strongest selling points, and it is real. The problem is that for any agentic task, the sticker price is not the bill.
xAI’s own pricing documentation is clear that a request using server-side tools is charged on two components: token usage and tool invocations, and that “the agent autonomously decides how many tools to call,” so the cost scales with how complex the query turns out to be. In plain terms, you set the prompt, but Grok sets the number of billable tool calls.
Those tool calls are not rounding errors. Web search, X search, and code execution each cost $5 per 1,000 calls. File attachment search costs $10 per 1,000. Collections search for retrieval costs $2.50 per 1,000. A single research-style query that fans out into several searches, a code run, and a document lookup is a stack of these fees on top of the tokens. To be exact about scale: the documented tool cost is per call, not per source returned, and a few sources that claim a separate per-source charge are not supported by xAI’s pricing page.
I saw the token side of this directly. I ran the same current-events question through the API playground twice, once with web search off and once on. With search off, the request used 1,056 tokens, returned no sources, and answered in 8.6 seconds. With search on, the same question used 30,948 tokens, returned seven sources, and took 13.8 seconds. That is roughly twenty-nine times the tokens for one query. This is a single test, so I am not claiming every search query costs twenty-nine times more. I am showing why a token-only cost model will mislead you the moment you turn search on, which for Grok’s signature use case is exactly when you would.
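The arithmetic behind that comparison can be sketched in a few lines. The playground figures above are total token counts, and the input/output split was not recorded, so the sketch prices each run two ways: every token at the input rate and every token at the output rate. Treat the results as rough bounds on token cost only. They assume no cached-input discount and exclude the per-call search fees, because the number of search calls was not recorded either.

```python
INPUT_PER_M = 1.25   # USD per 1M input tokens (Grok 4.3, as listed in this review)
OUTPUT_PER_M = 2.50  # USD per 1M output tokens


def token_cost_bounds(total_tokens):
    """Return (low, high) USD token cost for one request.

    Assumption: the input/output split is unknown, so `low` prices every
    token as input and `high` prices every token as output. Cached-input
    discounts and tool fees are ignored.
    """
    low = total_tokens * INPUT_PER_M / 1_000_000
    high = total_tokens * OUTPUT_PER_M / 1_000_000
    return low, high


search_off = 1_056   # tokens with web search disabled
search_on = 30_948   # tokens with web search enabled
ratio = search_on / search_off

if __name__ == "__main__":
    print(f"token ratio: {ratio:.1f}x")
    for label, tokens in (("search off", search_off), ("search on", search_on)):
        low, high = token_cost_bounds(tokens)
        print(f"{label}: ${low:.5f} to ${high:.5f} in tokens")
```

Even at the high bound, one search-enabled request costs cents, not dollars. The point of the ratio is what it does to a forecast built on search-off token counts once the workload runs at volume.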


Then there is the line most reviews skip. xAI’s pricing page states that when a request is judged to violate its usage guidelines, it still charges for the generation, and that for violations caught before generation in the Responses API it charges a $0.05 usage-guideline violation fee per request. Read the wording carefully, because it is narrower than “you pay for refusals.” It applies to requests the system deems guideline-violating, with normal generation cost if the request is generated and a flat $0.05 if it is blocked first. FSR did not try to trigger this fee, because doing so would mean intentionally sending a policy-violating request. It is here as a cost-architecture fact you should know exists, not as something we tested.
Put the meter together and Grok’s developer cost is not “$1.25 in, $2.50 out.” It is tokens, plus reasoning tokens, plus per-call tool fees, plus storage and download charges for files and collections, plus the occasional violation fee, with the call volume set by the agent rather than by you. The cheap headline number is real, but it will not forecast an agentic workload on its own.
We saw the same shape in another AI agent, where a $99 sticker hid a real bill near $827, and the agent itself did not know.
Here is the full meter in one place. The token rates are the model’s. The variable line is the tool and storage fees the agent triggers on your behalf.
| Cost component | Rate |
|---|---|
| Input tokens (Grok 4.3) | $1.25 / 1M |
| Cached input tokens | $0.20 / 1M |
| Output tokens | $2.50 / 1M |
| Web search, X search, code execution | $5 / 1,000 calls each |
| File attachment search | $10 / 1,000 calls |
| Collections search (RAG) | $2.50 / 1,000 calls |
| File storage / collection storage | $0.025 / $0.10 per GiB per day |
| File or collection download | $0.20 / GiB |
| Usage-guideline violation (pre-generation, Responses API) | $0.05 / request |
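For budgeting, the table can be turned into a small estimator. This is a minimal sketch, not xAI's billing logic: the rates are copied from the table above, the `Usage` fields and function name are hypothetical, `input_tokens` is assumed to mean uncached input only, and storage and download charges are left out to keep it short. The tool-call counts are the part you cannot know in advance, since the agent chooses them.

```python
from dataclasses import dataclass

# Rates in USD, copied from the table above.
RATES = {
    "input_per_m": 1.25,
    "cached_input_per_m": 0.20,
    "output_per_m": 2.50,
    "web_x_code_per_call": 5.00 / 1000,
    "file_search_per_call": 10.00 / 1000,
    "collections_per_call": 2.50 / 1000,
    "violation_per_request": 0.05,
}


@dataclass
class Usage:
    """Hypothetical usage record for one billing period."""
    input_tokens: int = 0          # assumed to exclude cached input
    cached_input_tokens: int = 0
    output_tokens: int = 0
    web_x_code_calls: int = 0      # web search + X search + code execution
    file_search_calls: int = 0
    collections_calls: int = 0
    blocked_requests: int = 0      # pre-generation guideline violations


def estimate_cost(u: Usage) -> float:
    """Estimate USD cost from tokens, tool calls, and violation fees."""
    tokens = (
        u.input_tokens * RATES["input_per_m"]
        + u.cached_input_tokens * RATES["cached_input_per_m"]
        + u.output_tokens * RATES["output_per_m"]
    ) / 1_000_000
    tools = (
        u.web_x_code_calls * RATES["web_x_code_per_call"]
        + u.file_search_calls * RATES["file_search_per_call"]
        + u.collections_calls * RATES["collections_per_call"]
    )
    return tokens + tools + u.blocked_requests * RATES["violation_per_request"]
```

As an illustration, `estimate_cost(Usage(input_tokens=1_000_000, output_tokens=1_000_000, web_x_code_calls=1000))` comes to $8.75, of which $5.00 is tool calls. On that made-up workload, the tool line is larger than the token line.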
Which Grok, whose data
The single most useful thing to understand before trusting Grok with anything sensitive is that “Grok” is a brand stretched across surfaces with different rules.
On the consumer side, xAI’s policy states that your Grok conversations and interactions may be used to train its models by default, with a control in settings to turn that off, after which new conversations are not used for training. A private chat mode is excluded from training. So consumer Grok is opt-out, not opt-in, and the opt-out is a toggle you have to find and flip. On a Japan-based account, FSR saw that training control present, but did not verify what its default state is across regions and prior settings, so treat the default as unconfirmed rather than assuming it is on or off for you.
On the API, the posture flips. xAI states it does not train on API inputs or outputs without explicit permission, and it stores API requests and responses for thirty days for audit and abuse monitoring unless an enterprise customer enables zero-data-retention. The “explicit permission” path is worth naming, because xAI runs a data-sharing program: opt in to letting xAI train on your API traffic, and you receive a monthly block of API credits in return. That is a clean trade to understand. The “free” credits are the price of your data, and for proprietary or client work you would leave that switch off.
A contractual guarantee of no training, and custom data retention, are listed as features of the Business and Enterprise plans, not the individual consumer tiers. That is the cleaner signal for a company than any toggle: if you need a written no-training commitment, the pricing page points you to the business plans.
Then there is the boundary that catches people. Grok used inside X is governed by X’s privacy policy and terms, not xAI’s, and xAI states it is a separate legal entity from X Corp. The same assistant, the same name, sits under different data rules depending on whether you open it at grok.com, through the API, or inside the X app. For Google Workspace connections made through OAuth, xAI states that the connected Workspace content is excluded from model training. That last point is xAI’s own statement rather than something FSR could confirm in the binding legal text, so read it as an official claim.
The same name, four different data postures:
| Where you use Grok | Whose rules apply | Training on your inputs |
|---|---|---|
| Grok app / grok.com (consumer) | xAI privacy policy | Used by default, with a settings toggle to opt out |
| xAI API | xAI API terms | Not used by default; 30-day audit retention; zero-data-retention for enterprise |
| Grok inside X | X’s privacy policy and terms | Governed by X, a separate legal entity from xAI |
| Business / Enterprise plans | xAI DPA and SCCs | Contractual no-training and custom retention available |
For European or enterprise buyers, two more facts matter. xAI publishes a data processing addendum with standard contractual clauses, processor terms, zero-data-retention, and a business associate agreement on request, and it claims SOC 2 Type 2 certification in its API documentation. It also offers an EU region, eu-west-1, for data residency, but its own documentation adds a sharp caveat: if a request cannot be handled in eu-west-1 it will fail, and if you need data to stay within a specific region at rest you have to contact sales, with additional costs possible. So the EU residency exists, but it is conditional and partly gated behind a sales conversation rather than a checkbox.
On the corporate structure: xAI became a SpaceX subsidiary in early 2026 through an all-stock deal, while remaining legally separate from X Corp. Whether any data flows between xAI, X Corp, and SpaceX is not addressed in the public documentation, so a careful procurement team would ask rather than assume.
One last thing belongs here, framed as risk, not as a verdict. Several regulators are looking at Grok, and it is easy to read the headlines as “Grok was fined.” It was not. In December 2025 the European Commission fined X 120 million euros, its first penalty under the Digital Services Act, for transparency breaches involving the blue checkmark, the advertising repository, and researcher access to data. That fine was about the X platform, not Grok, and X is contesting it. Separately, regulators including the Irish Data Protection Commission have opened inquiries into image generation by Grok on X, and French and UK authorities have flagged related content. The public record here is mostly about the X platform, Grok on X, or related entities, rather than a published finding against the xAI API as a product. FSR is not drawing a legal conclusion here. The point for a buyer is narrower: the regulatory weather around Grok is real, it is mostly about the X side, and you should not let a “120 million euro fine” headline migrate onto the model in your own risk notes.
Who should and who should not use Grok
Use Grok if you want live X and web context in your workflow, you can verify your own outputs, you know when to switch to the 4.3 mode for a file and when to keep search off to save money, and you can tolerate a product whose boundaries move quickly. For an operator who treats Grok as a sharp tool and checks its work, the low token price and the X-native search are a real edge.
Be careful with Grok if you assumed a higher price buys a better workflow, you need files handed back without checking them, you need predictable API costs without tool-call surprises, or you need a written data-handling posture without reading the terms for your specific surface. None of those are dealbreakers, but each one is a place this product will surprise you if you do not plan for it.
Wait, or look elsewhere, if you need deterministic, audit-clean output today, a mature one-click cloud procurement path, or a guaranteed-private default without configuration. Those buyers will spend less effort clearing a more conventional enterprise AI option, depending on the stack they already run.
FAQ
Is Grok Heavy better than Grok 4.3 for everyday work? Not for file output. In FSR’s test on a SuperGrok Heavy account, the Heavy and Expert modes failed to return a downloadable spreadsheet across six attempts, while the Grok 4.3 mode produced a working file with live formulas. Heavy buys higher rate limits and a sixteen-agent Expert mode, not more reliable file delivery.
Can Grok create Excel files? Yes, in some modes. In FSR’s test, the Auto, Fast, and Grok 4.3 modes each returned a working .xlsx file with real formulas, while Heavy and Expert returned Python scripts instead. Grok’s in-app preview failed to open any of the generated files, so download them and open them in a spreadsheet app.
Why did Grok say making the file was impossible? It was not impossible. Grok’s Heavy and Expert modes gave reasons like “tool limitation” and “the sandbox can’t deliver binary files,” yet other modes of the same product built the file. FSR treats this as a false self-report in this test, not a confirmed product limit. xAI does not document per-mode tool access.
Does Grok search the web by default? On the API, no. xAI’s documentation states Grok has no access to real-time events unless you enable the web search or X search tools, which are billed at $5 per 1,000 calls each. With search off, Grok answers from training data and can return outdated information stated with confidence.
How much does the Grok API really cost? Grok 4.3 lists $1.25 per million input tokens and $2.50 per million output. Tool calls are extra: web, X, and code each cost $5 per 1,000 calls, file attachments $10, collections $2.50. Because the agent decides how many calls to make, a search-heavy query can cost far more than the token price suggests.
Is Grok on X covered by the same privacy policy as the Grok app? No. xAI’s policy states that Grok used inside X is governed by X’s privacy policy and terms, not xAI’s, and that xAI is a separate legal entity from X Corp. The same chatbot can sit under different data rules depending on where you open it.
Should a business put confidential files into Grok? Treat it as an open question. Consumer Grok uses your inputs for training by default, with a settings toggle to opt out. A contractual no-training guarantee and custom data retention are listed only on Business and Enterprise plans. The API is not trained on by default and offers zero-data-retention for enterprise.
Was X fined for Grok? No. The European Commission fined X 120 million euros in December 2025 for transparency breaches involving its blue checkmark, ad repository, and researcher data access, not for Grok. Separate inquiries touch Grok image generation on X, including one by the Irish Data Protection Commission, but the public record so far concerns Grok as deployed on X, not a published finding against the xAI API.
What would change this review’s verdict? If xAI updates the Heavy and Expert modes so they return downloadable files on the same prompt, the opening finding should be updated. The cost and data-boundary findings would still stand, because they come from xAI’s own pricing and policy documentation, not from a single test.
Methodology and sources
Testing was performed on a paid SuperGrok Heavy account and through the xAI API playground on May 30, 2026. FSR ran one file-generation prompt across all five Grok modes, nine attempts in total (three each on Heavy and Expert, one each on Auto, Fast, and the Grok 4.3 mode), and verified one generated spreadsheet outside Grok by editing an input and confirming the formulas recalculated. FSR compared API behavior on an identical current-events query with web search disabled and enabled, recording token counts, latency, and source counts. Pricing, model, and policy facts were checked against xAI’s official documentation and the Grok and x.ai pricing pages on May 31, 2026. The April 17, 2026 release date comes from consistent independent reporting; xAI did not publish a press release for the launch.

What FSR did not test: the usage-guideline violation fee, which would require sending a policy-violating request; enterprise zero-data-retention and business-associate terms; long-term reliability across days or weeks; the exact dollar cost of the web-search query, which depends on console billing not yet reflected at the time of writing; behavior across every region and account configuration; and connector behavior with real business accounts. Claims about competitors’ prices in this review are not independently verified by FSR and should be confirmed on each vendor’s own pricing page.
Highest-weight sources here are FSR’s own hands-on observations and xAI’s official model, pricing, and policy pages. Regulatory facts are drawn from the European Commission’s own announcement and reputable reporting. Where a figure could not be confirmed in a primary source, it is labeled as such in the text.
FSR verdict
Grok 4.3 is strong and cheap at the token layer, and for a power user who checks its work it is well worth testing. The catch is that you are not really buying a model. You are buying a product that splits into several models, several modes, several meters, and several privacy boundaries, and then leaves it to you to learn which is which.
The clearest evidence is the test that opens this review. The most expensive consumer tier failed the exact task that the product’s newest model completed, and explained the failure with reasons that were not true. That is not a story about a weak model. It is a story about a product that has not made its own seams visible to the person paying for it.
Buy it as a sharp instrument, not as a hands-off workflow. Switch to the 4.3 mode for files, keep search off until you need it, verify every output outside the app, and read the data terms for the exact surface you are using. Do that, and Grok earns its place. Skip any of it, and the heaviest, most expensive choice is the one most likely to hand you a script instead of your file.
Author note: this review is based on first-hand testing and primary-source verification. FSR has no affiliate relationship with xAI and earns nothing from your choice here. Prices, models, and policies in this space change quickly. Figures were current on May 31, 2026 and should be re-checked before any purchase or build.
Future Stack Reviews can prepare AI/SaaS comparison briefs covering pricing, data terms, workflow risk, and buyer-fit tradeoffs.