How many training examples do I need for fine-tuning?

For LoRA fine-tuning, 500–5,000 high-quality examples are often enough. For full fine-tuning, 10,000–100,000 examples are typical. Quality beats volume: a smaller set of well-labeled, representative, consistently formatted examples usually outperforms a larger but noisy one.
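
Consistent formatting can be checked mechanically before any training run. The sketch below assumes chat-style JSONL records with a `messages` list of `role`/`content` pairs. That layout is a common convention, not something this article prescribes, and the function name and rules are purely illustrative.

```python
import json

# Assumption: roles follow the common system/user/assistant convention.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_example(line: str) -> list[str]:
    """Return a list of problems found in one JSONL training record."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]

    messages = record.get("messages") if isinstance(record, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' list"]

    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i}: not an object")
            continue
        role = msg.get("role")
        if not isinstance(role, str) or role not in ALLOWED_ROLES:
            problems.append(f"message {i}: unexpected role {role!r}")
        content = msg.get("content")
        if not isinstance(content, str) or not content.strip():
            problems.append(f"message {i}: empty content")

    # A training example should end with the answer the model is meant to learn.
    last = messages[-1]
    if isinstance(last, dict) and last.get("role") != "assistant":
        problems.append("last message is not from the assistant")
    return problems
```

Running this over every line of the dataset and rejecting records with a non-empty problem list is a cheap first quality gate. It says nothing about whether the labels themselves are correct.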

How do I tell if fine-tuning or RAG fits?

Rule of thumb: RAG for knowledge, fine-tuning for behavior. If you need to provide frequently changing facts, RAG. If you want to enforce consistent tone, format, domain language or structured outputs, fine-tuning. Details in RAG, fine-tuning or prompt engineering.

Is fine-tuning worthwhile for a small business?

With LoRA: yes, often very much. Hardware requirements are moderate (one GPU suffices), and costs are a few hundred to a thousand euros per iteration. Preconditions: clearly defined task, your own example data, and an eval strategy.

How quickly does fine-tuning become outdated?

Behavioral adaptations (style, format, domain language) age slowly — often usable for years. Knowledge adaptations age fast because the model learns facts that change. The latter is the most common reason fine-tuning projects fail: knowledge was trained instead of behavior.

Do I need my own GPUs for fine-tuning?

No. Cloud providers like Hetzner, Together AI, Modal or Lambda Labs rent GPUs by the hour. A LoRA iteration typically costs 10–50 euros in GPU time. If you work with sensitive data, choose German or European providers — see Secure AI integration.

How do I evaluate whether my fine-tuning succeeded?

With an eval suite defined before training: a minimum of 30–100 real test cases, clear scoring criteria (rule-based or LLM-as-judge), and a side-by-side comparison against the base model. Without an eval, you cannot tell whether the model got better or worse.
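
A side-by-side comparison with a rule-based criterion could look roughly like the sketch below. It assumes each model is available as a plain function from prompt to output text. The names `Model`, `Scorer`, `exact_match` and `side_by_side` are hypothetical, and exact matching is only one possible scoring rule.

```python
from typing import Callable

# Hypothetical types: a "model" is any function from prompt to output text.
Model = Callable[[str], str]
Scorer = Callable[[str, str], bool]  # (output, expected) -> pass/fail


def exact_match(output: str, expected: str) -> bool:
    """Rule-based criterion: output equals the expected text, ignoring case and outer whitespace."""
    return output.strip().lower() == expected.strip().lower()


def side_by_side(cases, base: Model, tuned: Model, scorer: Scorer = exact_match) -> dict:
    """Run both models on the same cases; record pass counts plus improved and regressed prompts."""
    result = {"base_pass": 0, "tuned_pass": 0, "improved": [], "regressed": []}
    for prompt, expected in cases:
        base_ok = scorer(base(prompt), expected)
        tuned_ok = scorer(tuned(prompt), expected)
        result["base_pass"] += int(base_ok)
        result["tuned_pass"] += int(tuned_ok)
        if tuned_ok and not base_ok:
            result["improved"].append(prompt)
        elif base_ok and not tuned_ok:
            result["regressed"].append(prompt)
    return result
```

The `regressed` list is the part worth reading first: a higher overall pass rate can hide cases that the base model handled and the fine-tuned model no longer does.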

LLM Fine-Tuning: When Is It Really Worth It? (2026)

Fine-tuning sounds like sovereignty: your own model that speaks your language, knows your data, runs on your infrastructure. In practice, many fine-tuning projects fail — not on the tech, but on wrong expectations, bad data, and missing evaluation. This article shows when fine-tuning is really the right lever, and when other methods reach the goal cheaper and more reliably.

1. Why fine-tuning often overpromises

Fine-tuning is often pitched as a universal solution: “We train a model on your data, then it knows your business.” In reality, fine-tuning first means a lot of work, money, risk — and a set of prerequisites without which the result can’t be productively used.

The most common error is confusing knowledge with behavior. Fine-tuning is not the right way to teach a model facts — that’s what RAG is for. Fine-tuning is the right way to teach a model style, format and domain language. Confusing these burns budget.

2. What fine-tuning really changes

Fine-tuning modifies the weights of an LLM, the internal parameters that drive output behavior. This change affects:

Style and tone. A model fine-tuned on legal texts adopts the phrasing and register of legal writing.
Format and structure. A model trained consistently on JSON output adheres to the format much more reliably.
Domain-specific vocabulary. Terms rare in pretraining are used more precisely.
Classification behavior. On tasks with defined categories, accuracy can be substantially raised.
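
Format adherence, as in the JSON example above, is one of the few effects that can be measured with a simple rule. A minimal sketch, assuming model outputs are collected as strings and that a set of required keys defines "correct format" (both the function name and that definition are illustrative):

```python
import json


def json_adherence_rate(outputs: list[str], required_keys: set[str]) -> float:
    """Share of outputs that parse as a JSON object containing all required keys."""
    if not outputs:
        return 0.0
    ok = 0
    for text in outputs:
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            continue  # prose around the JSON, truncated output, etc.
        if isinstance(parsed, dict) and required_keys.issubset(parsed):
            ok += 1
    return ok / len(outputs)
```

Computing this rate on the same prompts for the base model and the fine-tuned model gives a direct before-and-after number for the format claim.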

What fine-tuning does not reliably change:

Factual knowledge in areas where the model saw little data during pretraining.
Context understanding over very long documents.
Truthfulness — fine-tuning does not eliminate hallucinations.

3. Five hard prerequisites

From our consulting practice: if any of these is missing, fine-tuning should be postponed.

Clearly defined use case. One task, one input shape, one expected output. “Better answers” is not a use case.
Your own data in sufficient quality and quantity. At least several hundred high-quality examples, cleanly labeled, consistently formatted.
Eval suite before training. At least 30 real test cases with scoring criteria. Without eval, fine-tuning is flying blind.
Hardware plan. GPU access (local or cloud), defined budgets, privacy posture clarified.
Engineering discipline. Reproducible pipelines, versioned datasets, logging. A bash script doesn’t cut it.
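
As one small piece of the engineering discipline listed above, a dataset can be versioned by content hash and split deterministically, so that every training run can be traced back to exactly the data it saw. The sketch below is one possible approach using only the standard library. The function names and manifest fields are assumptions, not a prescribed pipeline.

```python
import hashlib
import json
import random


def dataset_fingerprint(examples: list[dict]) -> str:
    """SHA-256 over the examples in order, independent of key order within each record."""
    digest = hashlib.sha256()
    for example in examples:
        canonical = json.dumps(example, sort_keys=True, ensure_ascii=False)
        digest.update(canonical.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def split_dataset(examples: list[dict], eval_fraction: float = 0.1, seed: int = 42):
    """Deterministic train/eval split: same data and seed give the same split."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, int(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]


def run_manifest(examples: list[dict], seed: int = 42) -> dict:
    """Metadata to log with every training run so it can be reproduced later."""
    train, held_out = split_dataset(examples, seed=seed)
    return {
        "dataset_sha256": dataset_fingerprint(examples),
        "seed": seed,
        "n_train": len(train),
        "n_eval": len(held_out),
    }
```

Storing the manifest next to each trained adapter makes it possible to answer, months later, which data a given model version was trained on.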

4. What does fine-tuning realistically cost?

Three cost drivers:

Data work. The biggest line item — typically 60–80% of total effort. Collecting, cleaning, labeling, validating training data.
GPU hours. LoRA on 8B models: 10–100 euros per iteration. Full fine-tuning on 70B models: 1,000–10,000 euros per iteration. Multiple iterations are the rule.
Eval and deployment. Building the eval suite, packaging the model, setting up monitoring, defining rollback. Time, not hardware.

A first productive LoRA iteration realistically costs 5,000–25,000 euros in total. Full fine-tuning of a 70B model with a curated dataset and eval lands closer to 80,000–300,000 euros. The math only works if the added value is clearly measurable. If you lack enough examples of your own, you can selectively extend the dataset with synthetic training data.
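
To see how the three cost drivers combine, the arithmetic can be sketched as follows. The day rate, the number of working days and the number of iterations are made-up illustrative inputs, not figures from this article. Only the GPU cost per iteration is taken from the LoRA range above.

```python
def project_cost(gpu_cost_per_iteration: float, iterations: int,
                 data_days: float, eval_deploy_days: float, day_rate: float) -> dict:
    """Split a fine-tuning budget into the three cost drivers (all amounts in euros)."""
    gpu = gpu_cost_per_iteration * iterations
    data = data_days * day_rate
    eval_deploy = eval_deploy_days * day_rate
    total = gpu + data + eval_deploy
    return {
        "gpu": gpu,
        "data": data,
        "eval_deploy": eval_deploy,
        "total": total,
        "data_share": data / total if total else 0.0,
    }


# Illustrative inputs (assumptions): 5 LoRA iterations at 50 euros of GPU time each,
# 10 days of data work and 4 days of eval/deployment at a day rate of 800 euros.
estimate = project_cost(gpu_cost_per_iteration=50, iterations=5,
                        data_days=10, eval_deploy_days=4, day_rate=800)
```

With these inputs the total comes to 11,450 euros, of which 250 euros is GPU time and about 70% is data work. Under these assumptions, the rented hardware is the smallest line item by a wide margin.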

5. Fine-tuning vs. RAG

A pragmatic heuristic:

If the problem is “answers wrongly because knowledge is missing” → RAG.
If the problem is “answers stylistically wrong or formally inconsistent” → fine-tuning.
If both apply → try RAG first, add fine-tuning where needed.

In most enterprise setups RAG solves 60–80% of the issues originally labeled “fine-tuning needed.” Only the remaining 20–40% justify fine-tuning’s effort. See also RAG, fine-tuning or prompt engineering.
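
The heuristic in this section is simple enough to write down as a function, which can be useful as a checklist in project intake. The function and parameter names are hypothetical, and the two boolean inputs still require human judgment.

```python
def recommend(knowledge_missing: bool, style_or_format_wrong: bool) -> list[str]:
    """Ordered list of methods to try, following the heuristic in this section."""
    if knowledge_missing and style_or_format_wrong:
        return ["RAG", "fine-tuning"]  # RAG first, fine-tuning where still needed
    if knowledge_missing:
        return ["RAG"]
    if style_or_format_wrong:
        return ["fine-tuning"]
    return []  # neither problem present: this heuristic makes no recommendation
```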

6. Risks and pitfalls

Bad data quality. Fine-tuning on inconsistent, faulty or biased data produces a model that systematically reproduces those flaws. More in Why AI projects fail.
Catastrophic forgetting. Especially in full fine-tuning, the model can lose general capabilities. LoRA mitigates this risk.
Overfitting. Datasets that are too small or too homogeneous cause the model to memorize the training data and fail on new examples.
Becoming outdated. Facts correct today are outdated next year. Fine-tuning on such content locks you into a re-training cycle.
License and compliance risk. Open-weight models ship under different licenses (the Llama community license, Apache 2.0, MIT, or more restrictive terms). Clarify before fine-tuning whether the intended use is allowed.
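
Overfitting, as described in the list above, typically shows up as validation loss that stops improving while training continues. One standard countermeasure is early stopping with patience. The sketch below assumes one validation loss value per epoch is available; the function name and defaults are illustrative.

```python
def best_checkpoint(val_losses: list[float], patience: int = 2, min_delta: float = 0.0) -> int:
    """Index of the epoch to keep under early stopping with patience.

    Scanning stops once validation loss has failed to improve by more than
    min_delta for `patience` consecutive epochs.
    """
    if not val_losses:
        raise ValueError("val_losses must not be empty")
    best_epoch, best_loss, stale = 0, val_losses[0], 0
    for epoch, loss in enumerate(val_losses[1:], start=1):
        if loss < best_loss - min_delta:
            best_epoch, best_loss, stale = epoch, loss, 0
        else:
            stale += 1
            if stale >= patience:
                break  # no improvement for `patience` epochs in a row
    return best_epoch
```

The validation set used here must be held out from training. Otherwise the loss curve cannot reveal memorization.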

7. Three hard decision rules

Distilled from 50+ consulting cases:

No fine-tuning investment without an eval suite. Eval is not optional. Without it, success isn’t measurable — see how to build one in Guardrails, evals and prompt injection.
LoRA before full fine-tuning. LoRA is enough in 90% of cases. Full fine-tuning is the exception. Details in LoRA explained.
RAG before fine-tuning. When both seem viable, RAG first — cheaper, more flexible, more maintainable.

Fine-tuning in 2026 is not a secret weapon but a precise tool for a clearly bounded task. Used as a universal solution it burns budget. Used surgically, where preconditions hold, it builds a real competitive advantage — a model that speaks your language, runs on your infrastructure, and improves iteration by iteration. The path leads through clean data, sharp use cases, and a mature eval pipeline — not through vendor pitches.
