
Elon Musk announced Grok 3 through his business, xAI, after putting in a bid to acquire OpenAI last week, claiming it was “the most powerful AI in the world right now.” He might be correct if the live demo’s benchmarks hold up.
Grok 3 joins the expanding field of reasoning models, going up against DeepSeek‘s R1 and OpenAI’s o1. Reasoning models demonstrate their thought process by dissecting issues step-by-step before reaching a conclusion, in contrast to general-purpose models like ChatGPT that produce answers automatically.

But it appears that xAI is presenting Grok 3 as a generalist AI in addition to a reasoning model. It works similarly to GPT-4o or Claude 3.5 Sonnet when Think mode is off (more on this later). It is quick, conversational, and designed for general jobs. However, when Think mode is enabled, it becomes a reasoning model.
Don’t worry if you were unable to see Grok 3’s one-hour live demo; I’ll cut through the clutter and explain the key points for you.
Grok 3: What Is It?

The newest AI model from xAI, Grok 3, is positioned as a direct rival to OpenAI’s o1 and DeepSeek’s R1. According to the xAI team, Grok 3 is 10 to 15 times more powerful than Grok 2, and based on the benchmarks shown in the demo, it may even be able to compete with the finest models available.
What distinguishes the various models of reasoning?

Most AI models operate as follows: you ask a question, they provide an answer, and that’s it. If you’ve used ChatGPT, Claude, or Gemini, you’re probably familiar with this process.
Grok 3 and other reasoning models adopt a different strategy. They break down difficulties step by step, demonstrate their intermediate ideas, and even polish their output before offering a final response, as opposed to just spitting out an answer right away. Because of this, they are very effective at arithmetic, coding, and solving practical problems.
Mini Grok 3

Not all tasks necessitate Grok 3’s full-scale logic. Grok 3 small retains Grok 3’s reasoning capabilities while being streamlined for speed and less computation.
Developers that wish to maximize their token use expenditures while utilizing the API may find Grok 3 mini particularly helpful.
A quicker response in the chat interface could be achieved by switching to Grok 3 Mini. There won’t be many queries it can’t answer based on the benchmarks.
Think Mode in Grok 3

Grok 3’s multi-step reasoning process can be activated by selecting the optional Think mode setting. It breaks problems down into smaller steps, considers various answers, and then refines its response before producing a final conclusion, as opposed to leaping directly to an answer.
Complex problem-solving, mathematical proofs, coding challenges, and logic-based tasks are especially well-suited for this mode. It is perfect for circumstances when the caliber of reasoning is more important than speed because it emulates human-like ordered thinking.

As far as I can tell, Grok 3 is being positioned by xAI as a generalist and thinking model. It functions more like GPT-4o or Claude 3.5 Sonnet when Think mode is off; it is quick, conversational, and designed for everyday use. However, it enters reasoning mode when Think mode is engaged, dissecting intricate issues one step at a time.
The benchmarks make this mixed approach even more evident. xAI tested Grok 3 against generalist models like GPT-4o, DeepSeek-V3, and Claude 3.5 Sonnet in addition to reasoning models like OpenAI’s O1 and DeepSeek R1. This implies that they would prefer it to compete in both categories as opposed to just one.
Big Brain Mode in Grok 3

Grok 3’s high-performance setting, Big Brain mode, allocates additional processing power to tackle challenging tasks.
Grok 3 takes longer to process queries when enabled, but it provides more thorough answers, deeper insights, and higher accuracy. When normal inference may not be sufficient, this mode is very helpful for scientific study, multi-layered AI activities, and extremely complex problem-solving scenarios.
DeepSearch in Grok 3

xAI’s integrated research tool, DeepSearch, enables Grok 3 to search the web, validate sources, and compile up-to-date material before producing a response.
DeepSearch is perfect for news, market trends, technical research, and fact-checking since it gathers new data, unlike typical AI models that rely on pre-trained data. With this mode, Grok 3 is positioned to compete with Deep Research from OpenAI and Gemini.
How Did Grok 3 Get Made?

Grok 3 is based on a significant increase in processing power, novel training methods, and infrastructure improvements. To aid in the creation of Grok 3, xAI has built one of the biggest AI training clusters in the world, in contrast to its predecessors, which were taught on comparatively little hardware.
xAI’s bespoke supercomputer, Colossus

The availability of computing power is one of the main obstacles to training large-scale AI models. xAI created their own supercomputer cluster, Colossus, to circumvent this (the warehouse is shown in the above image).
In just 122 days, the first phase deployed 100,000 H100 GPUs, making it one of the world’s largest clusters for AI training.
In 92 days, xAI doubled the computing capability in the second phase. Because of the ongoing training made possible by this architecture, Grok 3 continues to advance in real time as more users engage with it.
Between Grok 0 and Grok 3

Although Grok 1 had personality, it was far from as good as Claude 3.5 Sonnet or GPT-4o when it was published in November 2023. Only a few months later, Grok 2 came out, and while it was significantly better, it was still not as good as the best models.
But Grok 3 represents a considerably larger leap. The team asserts that both model enhancements and a significant boost in training computation have made Grok 3 10–15 times more potent than Grok 2.
Benchmarks for Grok 3

xAI assertions One of the most potent AI models to date is Grok 3, and based on benchmarks from its live demo, it may even be on par with the best. To understand how it compares to GPT-4o, Claude 3.5 Sonnet, Gemini-2 Pro, and DeepSeek-V3, as well as other reasoning models like O1 and DeepSeek-R1, let’s examine the findings in math, science, and coding.
How Can I Get Grok 3?
Grok 3 is being rolled out gradually by xAI, with broader availability anticipated in the upcoming months. Grok 3 will be available to us through the API and a chat-based interface.
API for Grok 3

Grok 3 has not yet been made available via the API as of the writing of this article, but it will probably be in the near future. For the most recent information, visit the models page.
In conclusion

Although Grok 3 is undoubtedly xAI’s most ambitious release to date, I’m interested to see how it performs when not using its own demo standards. As of right now, it appears to be a strong reasoning model that can compete with DeepSeek and OpenAI in multi-step problem solving.
On paper, the hybrid approach makes sense because it can transition between quick, conversational responses and more in-depth thinking using Think mode. However, I would like to know how well it truly generalizes outside of science, math, and coding, particularly in skills like writing, summarizing, and conducting real-world research.