How to check whether an AI answer is any good
AI tools answer everything with the same calm confidence. A practical guide to checking whether an AI answer is right, and to knowing when not to bother.
You ask an AI a question. It answers in three confident paragraphs that sound exactly like the kind of thing a knowledgeable person would say. The grammar is clean, the structure is tidy, and there are even a couple of plausible-looking statistics thrown in. Job done. Or is it?
The single biggest problem with using AI tools every day is that they almost never sound unsure. ChatGPT, Claude, Gemini, Perplexity: all of them produce output with the same calm authority whether they are right or wrong. The model that just made up a quote will do so in exactly the same tone as the model that quoted you correctly. There is no tell. There is no flicker of hesitation. And that is what makes this tricky.
Once you accept that, the question changes. It stops being “is this AI any good?” and starts being “how do I check whether this particular answer is any good?”. That is a much more useful question, because it has practical answers.
Start with the question itself. The first thing worth asking is whether the question you typed in was specific enough to even allow a checkable answer. If you ask “what is the best CRM” you will get a fluent paragraph back, but there is no way to verify the result because there is no objective answer. If you ask “which CRMs in 2026 charge under twenty pounds per user per month and integrate with Xero” you have a question that produces claims you can actually check. Vague questions invite vague answers, and vague answers cannot be fact-checked because there is no fact to check.
The next step is to identify the load-bearing facts in the answer. Most AI responses are a mix of general framing, which is usually fine, and specific claims, which are where the danger sits. Numbers, dates, names, quotes, statutes, prices, version numbers, study citations: anything that involves the model recalling something specific rather than reasoning about something general. Mark those mentally. Those are the bits that need verifying.
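If you check a lot of AI output, it can help to make that marking mechanical. Here is a rough sketch in Python that pulls the obvious candidates, numbers, percentages, money, years and quoted passages, out of an answer so you end up with a checklist rather than a feeling. The helper and its patterns are my own illustration, not a feature of any of these tools, and plenty of claims will never match a regex, but it gives you somewhere to start.

```python
import re

def flag_claims(answer: str) -> list[str]:
    """Surface the load-bearing bits of an AI answer: the specific,
    checkable claims rather than the general framing around them."""
    patterns = {
        "quote":      r'"[^"]{10,}"',                  # quoted passages
        "percentage": r'\b\d+(?:\.\d+)?\s?%',          # 42%, 3.5 %
        "money":      r'[£$€]\s?\d[\d,]*(?:\.\d+)?',   # £20, $1,200.50
        "year":       r'\b(?:19|20)\d{2}\b',           # 1900-2099
        "number":     r'\b\d[\d,]*(?:\.\d+)?\b',       # any other figure
    }
    flagged = []
    for label, pattern in patterns.items():
        # A figure may be flagged more than once (a year is also a number);
        # for a manual checklist that overlap does not matter.
        for match in re.findall(pattern, answer):
            flagged.append(f"{label}: {match}")
    return flagged

answer = ("The regulation passed in 2024 and applies to firms with revenue "
          "over £10.2m, which is roughly 14% of the sector.")
for claim in flag_claims(answer):
    print(claim)   # each line is something to verify, not something to trust
```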
Then go and find the original source. Not another AI summary, the actual source. If the answer says a regulation was passed in 2024, find the regulation. If it quotes someone, find the quote. If it cites a study, find the study. The number of times an AI will produce a fluent paragraph that references a paper which does not exist, or a clause in legislation that was never written, is genuinely surprising the first time you start checking properly. The technical name for this is hallucination, and the fact that we have a friendly word for it sometimes hides how confidently it happens.
The highest-risk categories deserve their own attention. Numbers are the most likely thing to be wrong, especially anything statistical. Models are generally weak at arithmetic and at recalling precise figures. If your answer hinges on a percentage or a sum, treat it as a hypothesis until you confirm it. Dates are similar. Models will often place an event in the wrong year, or describe a current state of affairs as if it were last year, because their training data has a cutoff and they are filling in a gap. Quotes are the third high-risk category. AI is especially bad at remembering quotes verbatim. It will get the gist and the speaker right but invent half the words. If a quote is going to appear in something you are publishing, find it in a source you trust before using it.
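For the numerical claims in particular, the cheapest check is often to redo the arithmetic yourself rather than take the model's word for it. A toy example, with figures invented purely to show the check, of the kind of sanity pass worth running before a percentage goes anywhere near something you publish:

```python
# Suppose an AI answer claims: "revenue grew 40%, from £2.1m to £2.8m".
# The numbers are made up; the point is the recalculation.
old, new = 2.1, 2.8
claimed_growth = 40.0

actual_growth = (new - old) / old * 100       # comes out near 33.3%
if abs(actual_growth - claimed_growth) > 1:   # allow a little rounding slack
    print(f"Claim says {claimed_growth}%, the figures imply {actual_growth:.1f}%")
```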
Cross-referencing is the workhorse technique. Open one or two sources you already trust, ideally a primary source rather than another summary, and check that the AI’s claims survive contact with them. If the answer is about the law, the primary source is the legislation or the regulator’s own guidance. If the answer is about a company’s pricing, the primary source is their pricing page. If the answer is about a piece of research, the primary source is the paper. If the AI has made an aggregating claim, like “most studies find x”, the cross-reference is harder, and at that point it is worth slowing down and asking whether the answer is genuinely supported or whether the model has produced a confident average of vibes.
A useful technique that takes thirty seconds is to ask the model itself to back up its claim. Paste the specific sentence back in and ask “what is the source for this”, or “show me the quote this paraphrases”. Sometimes the model will produce a clean source. Sometimes it will hedge in a way that quietly tells you the original sentence was made up. Sometimes it will produce a source that does not exist, and you can catch that with a single search. None of this is a guarantee, but it raises the cost of getting away with a hallucination, and it forces you to look at the specific claim rather than the surrounding fluency.
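And if the model does hand you a URL, it costs very little to confirm the page at least exists and contains something close to the claim it is supposed to support, before you sit down and read it properly. A rough sketch using Python's requests library, with placeholder values standing in for whatever citation you were actually given:

```python
import requests

def source_supports(url: str, phrase: str) -> bool:
    """Crude first pass: does the cited page load, and does it contain
    the phrase the AI attributed to it? Not a substitute for reading
    the source, just a cheap way to catch invented citations."""
    try:
        response = requests.get(url, timeout=10)
    except requests.RequestException:
        return False
    if response.status_code != 200:
        return False
    return phrase.lower() in response.text.lower()

# Placeholder values: swap in the link and the claim from the answer you got.
print(source_supports("https://example.com/report", "14% of the sector"))
```

A False here does not prove the claim is wrong, and a True does not prove it is right; it just tells you whether the citation is worth your reading time.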
Knowing when not to bother is also important. Not every AI answer needs to be fact-checked to the same standard. If you are asking it to draft an email, summarise a long article you can already see, or brainstorm angles for a piece of writing, you are using it for output that you will read and judge yourself. Mistakes will be obvious. The danger is when you are using it for input, when its answer becomes the basis for a decision, a quote in a piece, a calculation, or a claim you make to someone else. That is the moment to slow down and verify.
The pattern that catches most people out is the gap between fluency and accuracy. A confident, well-structured answer feels true in a way that a hesitant or messy answer does not. The brain reads coherence as a signal of correctness, and AI is exceptionally good at producing coherence. Train yourself to separate the two. The question is never how well written the answer is. The question is whether the specific claims in it are correct.
There is one more habit worth building, which is keeping a private list of the kinds of mistakes a particular model tends to make for the kinds of questions you ask. Maybe you notice that one model consistently misremembers UK regulations. Maybe another tends to be over-cautious about numerical claims and refuses to commit. Maybe a third is reliable for current news but weak on historical context. After a few weeks of using AI seriously, patterns emerge. Once you know where a given tool is unreliable, you can verify those areas more carefully and trust the rest more easily.
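The list does not need to be sophisticated to earn its keep. If you would rather keep it somewhere you can search and sort later, something as small as this will do; the file name and fields here are just a suggestion:

```python
import json
from datetime import date

LOG_PATH = "ai_mistakes.jsonl"   # one JSON object per line

def log_mistake(model: str, topic: str, what_went_wrong: str) -> None:
    """Append one observed mistake; after a few weeks, grouping these by
    model and topic shows where a given tool needs the closest checking."""
    entry = {
        "date": date.today().isoformat(),
        "model": model,
        "topic": topic,
        "mistake": what_went_wrong,
    }
    with open(LOG_PATH, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")

log_mistake("model-a", "UK regulation", "cited a clause that does not exist")
```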
The short version of all of this is that AI answers are a draft, not a verdict. They are extremely useful as a starting point and dangerous as an ending point. The skill that separates people who get real value from these tools from people who occasionally make a fool of themselves is not better prompting. It is the habit of treating every confident-sounding answer as a hypothesis, and the practice of checking the parts that matter before using them in something that matters.