AI Explained

What is reasoning in AI, and is the model really thinking?

AI companies say their newest models can reason. What that really means, when it actually helps, and whether the machine is truly thinking.

Something shifted in how AI companies talked about their models across 2024 and 2025. The pitch used to be that you typed a question and the system wrote you an answer. The pitch now is that the system thinks first. OpenAI calls them reasoning models. Anthropic calls it extended thinking. Google calls it Deep Think. The marketing implies that the machine has stopped guessing and started working things through. That is a serious claim, and it is worth taking apart.

The first thing to know is that the underlying model has not really changed. A large language model is still a vast statistical engine that predicts the next token, then the token after that, and so on, one piece at a time. A token is a chunk of text, usually a word or part of a word. Every output the model has ever produced, whether it was a haiku or a tax explanation or a piece of working Python, came out of the same machinery: take everything written so far, run it through billions of weights, and pick what most likely comes next. That has not been replaced. What has been added is more time and more space for the model to think out loud before it commits to a final answer.
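If you are curious what that loop actually looks like, here is a minimal sketch in Python, using the open-source Hugging Face transformers library and the small GPT-2 model. It is an illustration of the mechanism, not the code behind any commercial model, and the prompt is just an example.

```python
# A minimal sketch of next-token prediction: every model in this family,
# reasoning or not, generates text through a loop shaped like this one.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer("The capital of France is", return_tensors="pt").input_ids
for _ in range(5):  # generate five tokens, one at a time
    logits = model(ids).logits[:, -1, :]           # a score for every possible next token
    next_id = logits.argmax(dim=-1, keepdim=True)  # pick the most likely one
    ids = torch.cat([ids, next_id], dim=-1)        # append it and go round again

print(tokenizer.decode(ids[0]))
```

Everything a model produces, scratch paper included, comes out of that append-and-repeat cycle.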

The simplest way to describe a reasoning model is this. When you ask a normal chat model a question, it begins typing the answer immediately. When you ask a reasoning model the same question, it first generates a long internal monologue. Sometimes you can read this monologue, sometimes the company hides it. Either way, the model spends thousands of tokens working through possibilities, checking its arithmetic, second-guessing assumptions, then pulls together a final response. The monologue is not shown to the user as the answer. It is scratch paper. The final response is the polished version.
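Since the article names Anthropic's extended thinking, here is a hedged sketch of what requesting it looks like through that company's Python SDK. The model name, token budget and question below are illustrative placeholders, not recommendations; the shape of the call is the point. You buy the model a scratch-paper allowance, and the reply comes back in two kinds of blocks.

```python
# A sketch of requesting extended thinking via Anthropic's Python SDK.
# The model id and budgets are placeholders; check current documentation.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-5",  # placeholder model id
    max_tokens=16000,           # must exceed the thinking budget below
    thinking={"type": "enabled", "budget_tokens": 8000},  # the scratch-paper allowance
    messages=[{"role": "user", "content": "Is 2,305,843,009,213,693,951 prime?"}],
)

for block in response.content:
    if block.type == "thinking":            # the internal monologue
        print("SCRATCH PAPER:", block.thinking[:300], "...")
    elif block.type == "text":              # the polished final answer
        print("FINAL ANSWER:", block.text)
```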

This is not a new idea. Researchers had shown for years that if you asked a model to think step by step before answering, accuracy on hard problems went up. The trick was called chain of thought prompting. What the labs did with reasoning models was bake that behaviour into the system itself, then train the model to be better at it. The model is rewarded during training for producing internal steps that lead to correct answers, not just for matching the surface of correct answers. Over time it gets better at allocating thinking time, spotting its own mistakes, and trying alternative approaches when the first one stalls. Combine that with brute computation, give the model thirty seconds or two minutes to grind on a problem instead of one second, and results on maths, code, physics and logic genuinely improve. That improvement is not marketing. It shows up on benchmarks that researchers trust.
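The original trick really did fit in one line. Here is a sketch, again using the small GPT-2 model purely to show the mechanics without an API key; it is far too weak to reason well. The magic phrase, "Let's think step by step", is the one the research literature used.

```python
# Chain-of-thought prompting in its original form: the whole trick is a
# suffix appended to the prompt.
from transformers import pipeline

generate = pipeline("text-generation", model="gpt2")

question = "Q: I have 3 apples and buy 2 bags of 4 apples each. How many apples do I have?\nA:"

one_shot = generate(question, max_new_tokens=60)[0]["generated_text"]
stepwise = generate(question + " Let's think step by step.",
                    max_new_tokens=60)[0]["generated_text"]

print(stepwise)  # the model now writes out intermediate steps before an answer
```

Reasoning models are, in effect, this trick trained in rather than prompted in.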

Whether any of that counts as thinking is the harder question. The people who build these systems are split. The case for calling it reasoning is straightforward. The model sets up a problem, identifies sub-steps, checks intermediate results, catches errors, revises and produces an answer that is correct more often than a one-shot response would be. By the most operational definition of reasoning, whatever reliably gets things right under uncertainty, the model is reasoning. The case against is also straightforward. The internal monologue is still next-token prediction. Nothing about the architecture forces the steps to be true, only to look like the kind of steps a human would write when reasoning. A reasoning model can produce a confident, well-formed chain of thought with a false step buried in the middle and arrive at the wrong answer with full conviction. It can also produce a chain that is logically broken but reaches the right answer by accident. There is no internal logic engine. There is only a very good imitation of one.

A useful way to picture it is this. A spreadsheet either calculates a sum or it does not. If a cell adds up wrong, the spreadsheet is broken. A reasoning model, by contrast, is a creative writer of plausible reasoning. Most of the time the writing is faithful to the underlying maths or logic. Sometimes it is not. The skill the model has learned is producing reasoning that tends to be correct, not reasoning that is guaranteed to be correct. That is a meaningful shift from the chat models of two years ago, but it is not the same thing as a calculator.

For a normal user this distinction matters less than the marketing makes it sound. What you actually want to know is when to use a reasoning model and when not to. The honest answer is that reasoning modes help on tasks where the work is heavy and the answer is verifiable. Maths problems with a definite solution. Code that either runs or does not. Logic puzzles. Multi-step research questions where the chain of inference is long. Anything where a single mistake in the middle quietly destroys the final answer. On those tasks, the extra time pays off, and the response quality jumps noticeably. On lighter tasks, drafting an email, summarising a meeting note, polishing a paragraph, the reasoning mode mostly adds latency and cost without adding value. A standard chat response is fine and is what the model is best at.
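Put as a rule of thumb in code, with the caveat that the categories and the pick_mode helper below are illustrative sketches, not official guidance or part of any vendor's SDK:

```python
# The guidance above as a rule of thumb: route heavy, verifiable work to a
# reasoning mode and everything else to the cheaper standard mode.
HEAVY_AND_VERIFIABLE = {"math", "code", "logic_puzzle", "multi_step_research"}
LIGHT = {"email_draft", "meeting_summary", "copy_edit"}

def pick_mode(task_type: str) -> str:
    """Hypothetical helper: choose a mode based on the kind of task."""
    if task_type in HEAVY_AND_VERIFIABLE:
        return "reasoning"  # the extra latency and cost pay for themselves
    return "standard"       # reasoning mostly adds latency and cost here

assert pick_mode("code") == "reasoning"
assert pick_mode("email_draft") == "standard"
```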

Two cautions follow from all of this. The first is that reasoning models still hallucinate. They are not immune. A long chain of thought makes them more likely to catch their own errors, but it also gives them more rope to invent confident-sounding nonsense. Always check the output on anything that matters. The second is not to be too impressed by the visible thinking process. Some products show you the internal monologue while the model is working. It looks profound. It is mostly a literary effect. The model is not actually pausing to reflect in the human sense. It is generating tokens that resemble reflection.

The most accurate way to think about this generation of models is that they have learned to produce the artefacts of reasoning, and that producing those artefacts genuinely makes them better at hard problems. Whether that is real thinking is an interesting philosophical question. For day-to-day use, the better question is whether the answer is correct, and you should still verify that yourself.

Featured image: Tara Winstead on Pexels.