
Grok 4 Released: xAI's Most Powerful Model Yet Challenges GPT-5 and Claude Opus

Bharath

Key Takeaways

  • Grok 4 achieves state-of-the-art results on MMLU-Pro, HumanEval, and the new ARC-AGI-2 benchmark
  • Real-time web access and X platform integration provide unique data advantages
  • Available immediately to X Premium+ subscribers with API access rolling out next week
  • Pricing undercuts GPT-5 by approximately 30% on comparable token volumes

What Just Happened

xAI officially launched Grok 4 today, marking the company’s most significant model release since its founding. The new model represents a generational leap over Grok 3, with particular improvements in mathematical reasoning, code generation, and multimodal understanding.

In a live demonstration streamed on X, the xAI team showcased Grok 4 handling complex multi-step reasoning tasks that would have stumped previous-generation models. The model correctly solved graduate-level physics problems, generated production-ready code for full-stack applications, and demonstrated nuanced understanding of images, charts, and diagrams.

Benchmark Performance

The numbers are impressive. Grok 4 scores 92.4% on MMLU-Pro (up from Grok 3’s 84.1%), 96.8% on HumanEval for code generation, and achieves a new high of 78.3% on the notoriously difficult ARC-AGI-2 benchmark designed to test genuine reasoning ability.

Perhaps most notably, Grok 4 performs competitively with GPT-5 on most benchmarks while significantly outperforming it on tasks requiring real-time information synthesis — a direct benefit of its deep integration with X’s data firehose.

Architecture and Training

While xAI hasn’t published a full technical paper yet, the team revealed that Grok 4 uses a mixture-of-experts architecture with approximately 1.2 trillion total parameters. The model was trained on xAI’s Memphis supercomputer cluster, which now houses over 200,000 NVIDIA H100 GPUs.
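
To make "mixture-of-experts" and "total parameters" more concrete, here is a toy top-k routing layer in plain Python. It illustrates the general technique under our own assumptions (tiny hand-written experts, a linear gate, k=2). It is not xAI's implementation, whose details have not been published, and the names `moe_forward`, `experts`, and `gate_weights` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, experts, gate_weights, k=2):
    """Toy mixture-of-experts layer with top-k routing (illustrative only).

    x            -- input vector (list of floats)
    experts      -- list of callables, each mapping a vector to a same-sized vector
    gate_weights -- one weight vector per expert; the gate score is dot(w, x)
    k            -- number of experts that actually run for this input
    Returns the gated output and the indices of the experts that were used.
    """
    # Gate: score every expert with a simple linear router.
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in gate_weights]
    # Keep only the k highest-scoring experts.
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # Renormalize the gate scores over the selected experts only.
    probs = softmax([scores[i] for i in top])
    out = [0.0] * len(x)
    for p, i in zip(probs, top):
        y = experts[i](x)  # only the selected experts are evaluated
        out = [o + p * y_j for o, y_j in zip(out, y)]
    return out, top


# Hypothetical toy experts: expert i scales its input by (i + 1).
experts = [lambda v, s=s: [s * v_j for v_j in v] for s in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]

out, used = moe_forward([1.0, 2.0], experts, gate_weights, k=2)
print(used)  # [2, 1]
```

Because only `k` experts execute for each input, the parameters actually used per token are a fraction of the layer's total. That is why an MoE model's headline "total parameters" figure is not directly comparable with the size of a dense model.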

The training data includes a curated mix of web data, academic papers, code repositories, and — controversially — public posts from the X platform. xAI maintains that all training data usage complies with X’s terms of service and applicable regulations.

Real-Time Web Access

One of Grok 4’s standout features is its native real-time web browsing capability. Unlike competitors that rely on retrieval-augmented generation with periodic index updates, Grok 4 can access and synthesize live web content during inference. This makes it particularly strong for questions about current events, market data, and rapidly evolving topics.

Pricing and Availability

Grok 4 is available immediately through the X app for Premium+ subscribers at no additional cost. API access launches next week through xAI’s developer platform at $8 per million input tokens and $24 per million output tokens — positioning it roughly 30% below GPT-5’s pricing tier.
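
Per-million-token prices translate directly into workload costs. The Python sketch below estimates a bill at the rates quoted above and shows what "30% below" implies for the comparison tier. The function names are hypothetical, and the implied figure is simple arithmetic on the article's own percentage, not a published GPT-5 price.

```python
# Per-million-token API prices as quoted in this article (USD).
INPUT_PRICE_PER_M = 8.00
OUTPUT_PRICE_PER_M = 24.00


def grok4_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a workload at the quoted Grok 4 rates."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M + (
        output_tokens / 1_000_000
    ) * OUTPUT_PRICE_PER_M


def implied_reference_cost(cost: float, discount: float = 0.30) -> float:
    """Cost at a tier that `cost` undercuts by `discount` (0.30 means 30%).

    If price A is 30% below price B, then A = 0.7 * B, so B = A / 0.7.
    Derived from the article's "roughly 30%" claim, not from a published price.
    """
    return cost / (1.0 - discount)


cost = grok4_cost(10_000_000, 2_000_000)
print(f"${cost:.2f}")  # $128.00
print(f"${implied_reference_cost(cost):.2f}")  # $182.86
```

For example, a job with 10 million input tokens and 2 million output tokens comes to $80 + $48 = $128 at these rates. A tier that this price undercuts by 30% would charge about $182.86 for the same job.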

What This Means for the AI Landscape

The frontier model race continues to intensify. With Grok 4, xAI has firmly established itself as a top-three player alongside OpenAI and Anthropic. The competitive pressure is driving rapid innovation, but it’s also raising questions about the sustainability of the massive capital expenditures required to stay at the frontier.

For developers and businesses, the good news is clear: more competition means better models at lower prices. The challenge remains choosing the right model for specific use cases in an increasingly crowded field.