Decision Framework
Most LLM discussions get noisy because people treat every technique like a badge of sophistication. I use a simpler rule: choose the lightest technique that solves the real problem with acceptable reliability.
Last reviewed: March 2026
The ladder runs from prompting, to structured output with validation, to tool use, to retrieval, to LoRA/QLoRA fine-tuning, to full fine-tuning. That order is intentional. Each step increases complexity, cost, and operational burden, and the burden is only worth it if the simpler layer is clearly insufficient.
Step 1: Prompting
Use it when: the model already knows enough and you mainly need better framing, role clarity, or response style.
Good for: summarization, drafting, classification, light transformation.
Limit: prompting does not create new knowledge or guaranteed structure by itself.
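A prompt at this rung is mostly framing: role, task, and response style spelled out explicitly. The sketch below shows that idea; the function name, prompt wording, and the example document are illustrative assumptions, not a fixed API.

```python
# Minimal sketch of prompt framing: role, task, and style are stated
# explicitly instead of left implicit. All wording here is illustrative.

def build_summary_prompt(document: str, audience: str = "an executive") -> str:
    """Assemble a prompt that fixes role, task, and output style."""
    return (
        "You are a careful analyst writing for " + audience + ".\n"
        "Task: summarize the document below in exactly three bullet points.\n"
        "Style: plain language, no jargon, no speculation.\n\n"
        "Document:\n" + document
    )

prompt = build_summary_prompt("Q3 revenue grew 12% while costs held flat.")
```

The point is not the string itself but that framing lives in one reviewable place instead of being retyped per call.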
Step 2: Structured output with validation
Use it when: downstream systems need reliable fields, bounded enums, or machine-readable artifacts.
Good for: extraction, routing, workflow state, artifact generation.
Limit: the model can still drift, so the validator becomes the real trust anchor.
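Because the validator is the trust anchor, it is worth seeing how small it can be. This stdlib-only sketch validates a hypothetical routing payload; the field names and the enum are illustrative assumptions, not a standard schema.

```python
import json

# Minimal sketch of validation as the trust anchor: whatever the model
# returns, this code decides what enters the system. Fields are illustrative.

ALLOWED_ROUTES = {"billing", "support", "sales"}

def validate_routing(raw: str) -> dict:
    """Parse model output and enforce the contract; raise on any drift."""
    data = json.loads(raw)                      # must be valid JSON at all
    route = data.get("route")
    if route not in ALLOWED_ROUTES:             # bounded enum, not free text
        raise ValueError(f"route {route!r} outside allowed set")
    if not isinstance(data.get("confidence"), (int, float)):
        raise ValueError("confidence must be numeric")
    return {"route": route, "confidence": float(data["confidence"])}

ok = validate_routing('{"route": "billing", "confidence": 0.92}')
```

In production you would likely reach for a schema library (Pydantic, jsonschema), but the design point is the same: the contract is enforced in deterministic code, not in the prompt.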
Step 3: Tool use
Use it when: the model must trigger real actions, fetch live data, or hand work to deterministic systems.
Good for: API calls, search, database actions, file operations.
Limit: tool access expands blast radius, so permissions and guardrails matter more than prompt cleverness.
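One way to keep the blast radius bounded is to put deterministic code between the model's proposed call and the actual execution. The sketch below assumes a hypothetical order-status tool and a simple allowlist with a read-only tier; none of this mirrors a specific framework's API.

```python
# Minimal sketch of guarded tool dispatch: the model proposes a call,
# deterministic code decides whether it runs. Tool names are illustrative.

def get_order_status(order_id: str) -> str:
    return f"order {order_id}: shipped"          # stand-in for a real API call

TOOLS = {"get_order_status": get_order_status}   # explicit allowlist
READ_ONLY = {"get_order_status"}                 # permission tier

def dispatch(tool_name: str, args: dict, allow_writes: bool = False) -> str:
    if tool_name not in TOOLS:
        raise PermissionError(f"unknown tool {tool_name!r}")
    if tool_name not in READ_ONLY and not allow_writes:
        raise PermissionError(f"{tool_name!r} requires write permission")
    return TOOLS[tool_name](**args)

result = dispatch("get_order_status", {"order_id": "A-17"})
```

The guardrail lives in `dispatch`, not in the prompt: even a fully jailbroken model cannot call a tool that is not on the allowlist.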
Step 4: Retrieval (RAG)
Use it when: the problem is missing or changing knowledge, not model behavior.
Good for: policy lookup, product knowledge, document-grounded answers, changing source material.
Limit: bad retrieval gives bad answers; retrieval quality becomes part of the system design problem.
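To see why retrieval quality is a design problem in its own right, it helps to strip the pattern to its smallest form. Below, naive word overlap stands in for a real embedding index, and the two documents are invented for illustration; a production system would swap in proper chunking, embeddings, and reranking.

```python
# Minimal sketch of the retrieval step: rank documents against the query,
# then feed the winners to the model. Word overlap is a deliberately crude
# stand-in for embedding similarity; the documents are illustrative.

DOCS = {
    "returns-policy": "Items may be returned within 30 days with a receipt.",
    "shipping": "Standard shipping takes 3-5 business days.",
}

def retrieve(query: str, k: int = 1) -> list:
    """Rank documents by word overlap with the query, highest first."""
    q = set(query.lower().split())
    scored = sorted(
        DOCS.items(),
        key=lambda kv: len(q & set(kv[1].lower().split())),
        reverse=True,
    )
    return [doc_id for doc_id, _ in scored[:k]]

top = retrieve("can I return items for a refund")   # → ["returns-policy"]
```

Every weakness of this toy (stemming, synonyms, chunk size, ranking) has a production counterpart, which is exactly why retrieval quality becomes part of the system design.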
Step 5: Parameter-efficient fine-tuning (LoRA/QLoRA)
Use it when: you need to adapt a base model to a narrower domain behavior or style and prompting is not enough.
Good for: repeatable domain phrasing, narrower task specialization, local adaptation without full retraining.
Limit: it adds training, evaluation, versioning, and drift-management overhead. QLoRA reduces compute cost by fine-tuning a quantized base model, but it does not remove the operational burden.
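The core LoRA idea fits in a few lines of arithmetic: instead of updating a full d×d weight matrix W, you train two small rank-r factors A (r×d) and B (d×r) and apply W' = W + (alpha/r)·BA. This pure-Python sketch uses toy sizes to show the parameter savings; the dimensions and the zero-initialized B mirror the common setup but are illustrative, not tied to any library.

```python
# Minimal sketch of the LoRA update W' = W + (alpha / r) * (B @ A).
# Toy sizes; in practice d is in the thousands and r stays small.

d, r, alpha = 8, 2, 4

full_params = d * d                  # parameters updated by full fine-tuning
lora_params = (r * d) + (d * r)      # parameters updated by LoRA (A plus B)

def lora_delta(B, A, alpha, r):
    """Compute the scaled low-rank update (alpha / r) * (B @ A)."""
    rows, inner, cols = len(B), len(A), len(A[0])
    scale = alpha / r
    return [
        [scale * sum(B[i][k] * A[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]

A = [[1.0] * d for _ in range(r)]    # rank-r factor, r x d
B = [[0.0] * r for _ in range(d)]    # d x r; zero init makes the initial delta zero
delta = lora_delta(B, A, alpha, r)   # 8 x 8, all zeros before training
```

Even at these toy sizes the trainable count drops from 64 to 32; at realistic dimensions the ratio is what makes LoRA cheap to train, while QLoRA additionally quantizes the frozen base weights. None of that, as noted above, removes the evaluation and versioning overhead.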
Step 6: Full fine-tuning
Use it when: the business case is strong enough to justify the highest cost, control, and evaluation burden.
Good for: specialized, repeated workloads where adaptation itself becomes a product capability.
Limit: it is the most expensive option in both engineering time and governance complexity.
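The whole ladder can be condensed into a decision helper. The sketch below encodes the "lightest technique that solves the real problem" rule by checking the heaviest genuine requirement first; the flag names and rung labels are this article's framing turned into an illustrative function, not a prescriptive tool.

```python
# Sketch of the ladder as a decision helper: identify the heaviest real
# requirement, then pick the lightest rung that addresses it. Flag names
# are illustrative.

def choose_technique(
    needs_structure: bool = False,      # downstream systems need reliable fields
    needs_actions: bool = False,        # must trigger real actions or live data
    knowledge_gap: bool = False,        # missing or changing knowledge
    behavior_gap: bool = False,         # domain behavior prompting can't fix
    adaptation_is_product: bool = False # adaptation is itself the product
) -> str:
    """Return the lightest rung that covers the stated problem."""
    if adaptation_is_product:
        return "full fine-tuning"
    if behavior_gap:
        return "LoRA/QLoRA fine-tuning"
    if knowledge_gap:
        return "retrieval (RAG)"
    if needs_actions:
        return "tool use"
    if needs_structure:
        return "structured output + validation"
    return "prompting"

pick = choose_technique(knowledge_gap=True)   # → "retrieval (RAG)"
```

Real systems usually combine rungs (retrieval plus validation, for example), so treat the single return value as naming the heaviest layer you need, not the only one.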
LoRA and QLoRA are not first-line architecture choices. They are model adaptation techniques. That means they become relevant only after you have already learned something important: the base model plus prompting, retrieval, and validation still do not meet the requirement.
In other words, LoRA is not a substitute for system design. It is one possible layer inside a larger system.
In most business workflows, the highest return comes from better contracts, cleaner inputs, stronger validators, and better task boundaries before it comes from custom model training.