AI Explained

What is fine tuning, and when does it actually help?

Fine tuning adjusts a model's behaviour from the inside. Here is what it does, how it differs from prompting and RAG, and when it is worth using.

You have heard that you can “train” an AI model on your own data. But what does that actually mean, and how is it different from just writing a better prompt or connecting the model to your documents? Fine tuning is a specific, technical process, and understanding what it does and does not do saves a lot of expensive confusion.

The Short Version

Key Takeaways

  • Fine tuning is additional training that adjusts a model’s weights using new examples: it changes how the model behaves, not what it knows.
  • It shapes style, tone, classification behaviour, or task performance; it does not add fresh knowledge.
  • It is different from prompting, which gives the model instructions at run time.
  • It is different from RAG, which gives the model fresh information at run time.
  • It requires data, compute, and careful evaluation, and is not always the right answer.

What Fine Tuning Actually Does to a Model

A language model starts with a foundation: billions of parameters, adjusted during pre-training on. A large body of text, to predict what word or token is likely to come next. That pre-training process is expensive, slow, and done once by the model developer.

A second, targeted stage of training on a much smaller dataset follows. The model’s weights, which encode its behaviour and knowledge, are adjusted again based on the new examples. The result is a model that behaves differently from the base version. More likely to produce a certain style, more accurate at a specific task, more consistent about refusing or accepting particular kinds of content.

This is fundamentally different from giving the model instructions at the start of a conversation. Prompting tells the model what to do right now. This approach changes the underlying model itself, so the new behaviour is baked in rather than requested each time.

How Fine Tuning Differs from Prompting

Prompting works at the surface. You write a system prompt or a user message that tells the model to behave a certain way. Respond formally, always answer in bullet points, focus only on cooking questions. The model follows those instructions within that conversation, but the underlying model is unchanged.

This second process works at the level of the weights. You provide labelled examples of the behaviour you want, the model trains on them. The resulting model is structurally different from the one you started with.

The behaviour you trained in does not need to be requested each time. It is the model’s default.

Prompting is quicker, cheaper, and reversible. The process is slower, more expensive, and produces a persistent change. The trade-off is consistency and depth: additional training can produce behaviour that is more reliable and harder to override than a prompt alone.

How Fine Tuning Differs from RAG

RAG, or retrieval-augmented generation, connects a model to an external source of information at the moment of generation. When you ask a question, the system retrieves relevant documents, passes them into the model’s context, and the model uses that information to answer. RAG gives the model access to facts it was not trained on, without modifying the model itself.

Fine tuning does not make the model more knowledgeable about specific facts. It shapes how the model behaves: the tone it uses, the tasks it prioritises, the patterns it follows. If your goal is to have the model answer questions about your product documentation accurately, RAG is usually the better tool.

If your goal is to have the model always respond in a particular writing style. Produce outputs in a specific format, or perform a classification task reliably, additional training is more likely to help.

The distinction matters practically. People often reach for additional training when they want the model to “know” their company information, when RAG is what they actually need.

When Fine Tuning Is Worth Considering

Fine tuning tends to be worth the effort in a limited set of situations. OpenAI’s fine-tuning guide gives a clear technical overview of the requirements and when it makes commercial sense. The most common is when you need consistent behavioural change that prompting alone cannot reliably produce.

A model that classifies customer complaints into categories, for example, benefits from additional training. On examples of those categories, because the classification task is narrow and repeatable. Similarly, a model that needs to produce outputs in a specific format.

Or match a particular brand voice precisely, can be shaped more reliably through additional training than through prompting.

It is also worth considering when the prompt itself would be unwieldy. If achieving the right behaviour requires a very long, complex system prompt that consumes significant context, additional training can encode that behaviour more efficiently.

What the technique does not help with is knowledge. If you want the model to answer questions about events after its training cutoff. Or about documents it has never seen, additional training is the wrong tool. The model learns patterns and behaviours from examples, not raw facts.

The Limits of Fine Tuning

Fine tuning is not a cure for hallucination. A fine-tuned model can still produce confident, incorrect answers. Additional training on high-quality task examples reduces certain error rates within that task, but does not make the model reliably accurate in general.

It can also introduce new problems. If the training examples contain errors, biases, or a narrow view of the task, the fine-tuned model will reflect those problems. A model fine-tuned on customer service transcripts from one period may behave oddly if the company’s policies change and the examples are not updated.

There are also practical constraints. The process requires labelled examples in sufficient quantity and quality. The compute cost is higher than prompting.

The resulting model needs to be evaluated carefully, not just checked against the training data. And this approach locks in a model’s behaviour at a point in time. If the base model is updated, the fine-tuned version may not benefit from those improvements without a new round of training.

A Worked Example

Consider a company that handles a high volume of customer support requests by email. They want to use an AI model to draft initial replies.

The requests fall into a small number of categories: billing queries, technical problems, returns, and complaints. The company wants the model to always identify the category correctly. Respond in a specific formal tone that matches their brand, and include certain mandatory phrases in complaint responses.

Prompting alone produces inconsistent results. The model sometimes misclassifies edge cases, and the tone drifts depending on how the request is phrased.

With a few hundred labelled examples used for additional training, each showing the input request, the correct category. An approved draft reply, the model learns the classification patterns reliably and produces output that consistently matches the required format and tone.

For a separate task, looking up the customer’s order history to include in the reply, additional training is not the right tool here. That requires RAG or direct database access, because the model cannot be trained on individual customer records.

The distinction matters: the technique is appropriate for teaching the model how to respond, not what information to include about a specific customer or order.

What This Means For You

If you are evaluating AI tools for a specific task, additional training is one of several options, and often not the first one to try. Before considering this approach, prompting is almost always simpler and cheaper and should be explored thoroughly first. If the model can be made to behave the way you need through a. Well-written system prompt and examples within the context window, that is usually sufficient.

RAG is the right tool when you need the model to answer questions about documents, databases, or knowledge it was not trained on. Most business use cases that involve a company’s own information belong in this category.

Fine tuning makes sense when you have a specific, repeatable behaviour you need to train in. You have the labelled examples to do it properly, and when prompting has reached its limits.

In Plain English

Fine tuning is retraining. You take an existing model and train it further on your own examples, so its behaviour changes in a specific direction. Unlike prompting, which gives the model instructions each time, this approach bakes the new behaviour into the model itself. Unlike RAG, which gives the model information at run time, this technique shapes how the model behaves rather than what it knows.

It is a powerful tool for consistent, task-specific behaviour. It is the wrong tool for making a model more knowledgeable about fresh information.

Related Reads