AI Explained

What are tokens, and why do AI companies charge by them?

Tokens are how AI models read and write. Understanding them explains why pricing works the way it does and how to use these tools more cost-effectively.

When you type a question into Claude or ChatGPT, the reply arrives in seconds and feels almost conversational. What you probably do not think about is what is happening underneath. The model is not reading your words the way you wrote them. It is breaking them down into something called AI tokens, and everything about how these tools are priced flows from that fact.

AI tokens are the basic units that language models use to read and write text. Not words, and not letters either. A token is a chunk of text, typically a few characters up to a short word.

The word “hello” is one token. The word “unbelievable” might be broken into two or three. Punctuation often gets its own token. Numbers can behave differently again.

The exact breakdown depends on the specific model. But a rough rule of thumb is that 100 AI tokens correspond to around 75 English words. Once you know that, a lot about how these tools are built and priced starts to make more sense.
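If you want to see the splitting from code rather than a web page, OpenAI publishes its tokenizers in the open-source tiktoken library. A minimal sketch, assuming Python and the cl100k_base encoding; other models, and other providers, split text differently:

```python
# pip install tiktoken
import tiktoken

# cl100k_base is one of OpenAI's published encodings; other models
# and other providers use different tokenizers with different splits.
enc = tiktoken.get_encoding("cl100k_base")

for text in ["hello", "unbelievable", "It costs £4,250."]:
    token_ids = enc.encode(text)
    # Decode each token individually to see the chunks the model works with.
    pieces = [enc.decode([t]) for t in token_ids]
    print(f"{text!r} -> {len(token_ids)} tokens: {pieces}")
```

Run it over a short paragraph and the count typically lands close to the 100-tokens-per-75-words rule above.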

Why everything in AI is measured in tokens

Every time you send a message to a language model, it processes the entire conversation. Your question, the model’s previous answers, any background instructions running in the system: all of it gets processed as AI tokens. The model has no persistent memory between sessions. It re-reads the full context every single time.
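That re-reading is easy to underestimate. Here is a rough sketch of how input tokens accumulate over a multi-turn chat; the per-turn token counts are made up purely for illustration:

```python
# Every turn resends the system prompt plus the whole history,
# so the input side of the bill grows as the conversation lengthens.
system_prompt = 200  # hypothetical token count for background instructions
turns = [(120, 300), (80, 250), (150, 400)]  # hypothetical (question, answer) tokens

history = 0
for n, (question, answer) in enumerate(turns, start=1):
    input_tokens = system_prompt + history + question
    print(f"turn {n}: {input_tokens} input tokens, {answer} output tokens")
    history += question + answer  # this turn joins the history for the next one
```

By turn three, the model is processing far more than the latest question alone.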

All of that processing requires compute power. Compute power has a real cost. This is why AI companies price their APIs by the token.

Anthropic, OpenAI, Google, and others publish rate cards showing what they charge per million AI tokens of input and per million tokens of output. The rates vary by model. A lighter model costs far less per token than a frontier one.

[Image: text being processed by a language model, illustrating how AI tokens work]

How model pricing works

A lighter, faster model like Claude Haiku or GPT-4o Mini costs a fraction of what a more powerful model charges. A frontier model handling complex reasoning costs significantly more per token. The logic is straightforward: more capable models require more computation to run, so each AI token costs more to process.

Understanding the relationship between model capability and cost is one of the most useful things a developer or business owner can know. You can explore what each tier actually delivers in this guide to comparing AI models. It helps clarify what you are paying for before you commit to a pricing tier.

Input tokens versus output tokens

The split between input and output AI tokens matters, because they are almost always priced differently. Input tokens are everything you send to the model: your message, the conversation history, any documents you have pasted in, any background instructions. Output tokens are the words the model generates in response.

Generating text is more computationally intensive than reading it. Output is typically priced at a higher rate than input, often two to three times more. When a model writes a long answer, you are paying more per token for that answer than for the context you sent alongside your question.
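A minimal cost sketch that captures the split; the per-million rates below are placeholders, not any provider's actual prices:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate: float, output_rate: float) -> float:
    """Rates are per million tokens, in whatever currency the rate card uses."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Hypothetical rate card with output priced at three times input.
cost = estimate_cost(input_tokens=1_500, output_tokens=400,
                     input_rate=1.00, output_rate=3.00)
print(f"{cost:.6f}")  # a single exchange costs a tiny fraction of a unit
```

A single exchange is nearly free. The economics only bite at volume, which is where the next section picks up.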

What token costs look like at scale

For casual users on a flat subscription plan, this is entirely invisible. You pay your monthly fee and the token accounting happens behind the scenes. You might notice a usage limit during a long session. But you never see a bill that itemises individual AI tokens.

That changes the moment you access these tools via an API. Developers and businesses building their own applications pay per token. At any meaningful volume, it adds up quickly.

Consider a small business using AI to handle customer enquiries, processing 200 messages a day. If each exchange involves a 300-word query, a 250-word reply, and around 500 words of background context, you are looking at roughly 1,400 tokens per exchange.

Across 200 daily exchanges, that is around 280,000 tokens a day. Over a month, that comes to roughly eight and a half million tokens. On a mid-range model, that might cost anywhere from fifteen to fifty pounds. On a premium model, the bill could be considerably higher.
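Worked through with the 100-tokens-per-75-words rule of thumb from earlier, every figure approximate:

```python
WORDS_TO_TOKENS = 100 / 75  # rule of thumb: 100 tokens per 75 English words

query = 300 * WORDS_TO_TOKENS    # ~400 tokens
reply = 250 * WORDS_TO_TOKENS    # ~333 tokens
context = 500 * WORDS_TO_TOKENS  # ~667 tokens

per_exchange = query + reply + context  # ~1,400 tokens
per_day = 200 * per_exchange            # ~280,000 tokens
per_month = 30 * per_day                # ~8.4 million tokens
print(f"{per_exchange:,.0f} per exchange; "
      f"{per_day:,.0f} per day; {per_month:,.0f} per month")
```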

This is why choosing the right model for a task is an economic decision, not just a performance one. A lighter, cheaper model often handles routine queries just as well, at a fraction of the cost.

Knowing what can go wrong when AI agents act autonomously is equally important when building at volume. Errors compound across thousands of interactions. The cost of getting it wrong is measured in tokens too. Getting the model selection right from the start saves both money and mistakes.

Context length and why it matters for cost

Modern models can hold enormous amounts of text in a single session, sometimes hundreds of thousands of AI tokens. This is genuinely useful when you need to work through long documents or maintain complex conversations. But feeding an entire 50,000-word report into a model costs you roughly 66,000 input tokens by the rule of thumb above. That is true even if the answer only required reading two paragraphs.

Being thoughtful about what context you actually send is one of the most practical ways to keep costs manageable. Paste in what is relevant. Leave out what is not.

That discipline alone can cut your token spend significantly. The quality of the output often improves too. The model is not wading through irrelevant material. A tighter context is usually a better one.

Some providers have also introduced prompt caching. This allows frequently repeated context to be stored temporarily. A long system prompt used across thousands of API calls does not need re-processing every time.

For businesses running the same background instructions across many interactions, this can reduce AI token costs substantially. It is worth checking whether the provider you are using supports it. Anthropic, OpenAI, and Google all offer some form of caching. The specifics vary by plan.
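As one concrete illustration, Anthropic's Messages API lets you mark a long system prompt as cacheable with a cache_control block. The model name and prompt below are placeholders, and the details (minimum cacheable length, cache lifetime, whether older SDK versions need a beta header) are worth checking against the current documentation. OpenAI, by contrast, applies caching automatically to sufficiently long repeated prompts.

```python
# pip install anthropic
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

BACKGROUND = "...your long, reused background instructions..."  # placeholder

response = client.messages.create(
    model="claude-3-5-haiku-latest",  # placeholder; pick the tier that fits
    max_tokens=500,
    system=[
        {
            "type": "text",
            "text": BACKGROUND,
            # Ask for this block to be cached, so later calls that reuse it
            # are billed at the cheaper cached-input rate.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "A customer enquiry goes here."}],
)
print(response.content[0].text)
```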

What this means for ordinary users

For most people, the lesson from all of this is fairly practical. Understanding that AI pricing is based on text processed, not answers delivered, explains why free tiers run out. It explains why longer conversations cost more to sustain. And it explains why generating a detailed response costs more than reading a short one.

It also explains why hitting a usage cap mid-session is not a technical glitch but an economic reality. A long conversation has been accumulating AI tokens the entire time. By the time you hit the limit, the model has processed far more than just your most recent message.

You do not need to count tokens when you sit down to use one of these tools. But knowing the basic mechanics gives you a clearer picture of what is happening and why the pricing works the way it does.

Understanding how AI systems are built and constrained is useful background for anyone using them seriously. Every question you type and every answer you receive is measured in AI tokens. And every one of them has a cost.

The official OpenAI tokenizer tool is useful for seeing exactly how any piece of text gets broken down. It makes the abstract concrete. Paste in a paragraph and watch it split into tokens. The pattern becomes clear quickly.