Decision Framework
Most LLM discussions get noisy because people treat every technique like a badge of sophistication. I use a simpler rule: choose the lightest technique that solves the real problem with acceptable reliability.
Last reviewed: March 2026
The ladder runs from prompting, to structured output with validation, to tool use, to retrieval, to LoRA/QLoRA fine-tuning, to full fine-tuning. That order is intentional. Each step increases complexity, cost, and operational burden, and the burden is only worth it if the simpler layer is clearly insufficient.
Step 1: Prompting
Use it when: the model already knows enough and you mainly need better framing, role clarity, or response style.
Good for: summarization, drafting, classification, light transformation.
Limit: prompting does not create new knowledge or guaranteed structure by itself.
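A prompt at this rung is mostly framing: role, task, and response style spelled out explicitly. The sketch below shows that idea; the function name, prompt wording, and the example document are illustrative assumptions, not a fixed API.

```python
# Minimal sketch of prompt framing: role, task, and style are stated
# explicitly instead of left implicit. All wording here is illustrative.

def build_summary_prompt(document: str, audience: str = "an executive") -> str:
    """Assemble a prompt that fixes role, task, and output style."""
    return (
        "You are a careful analyst writing for " + audience + ".\n"
        "Task: summarize the document below in exactly three bullet points.\n"
        "Style: plain language, no jargon, no speculation.\n\n"
        "Document:\n" + document
    )

prompt = build_summary_prompt("Q3 revenue grew 12% while costs held flat.")
```

The point is not the string itself but that framing lives in one reviewable place instead of being retyped per call.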
Step 2: Structured output with validation
Use it when: downstream systems need reliable fields, bounded enums, or machine-readable artifacts.
Good for: extraction, routing, workflow state, artifact generation.
Limit: the model can still drift, so the validator becomes the real trust anchor.
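Because the validator is the trust anchor, it is worth seeing how small it can be. This stdlib-only sketch validates a hypothetical routing payload; the field names and the enum are illustrative assumptions, not a standard schema.

```python
import json

# Minimal sketch of validation as the trust anchor: whatever the model
# returns, this code decides what enters the system. Fields are illustrative.

ALLOWED_ROUTES = {"billing", "support", "sales"}

def validate_routing(raw: str) -> dict:
    """Parse model output and enforce the contract; raise on any drift."""
    data = json.loads(raw)                      # must be valid JSON at all
    route = data.get("route")
    if route not in ALLOWED_ROUTES:             # bounded enum, not free text
        raise ValueError(f"route {route!r} outside allowed set")
    if not isinstance(data.get("confidence"), (int, float)):
        raise ValueError("confidence must be numeric")
    return {"route": route, "confidence": float(data["confidence"])}

ok = validate_routing('{"route": "billing", "confidence": 0.92}')
```

In production you would likely reach for a schema library (Pydantic, jsonschema), but the design point is the same: the contract is enforced in deterministic code, not in the prompt.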
Step 3: Tool use
Use it when: the model must trigger real actions, fetch live data, or hand work to deterministic systems.
Good for: API calls, search, database actions, file operations.
Limit: tool access expands blast radius, so permissions and guardrails matter more than prompt cleverness.
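One way to keep the blast radius bounded is to put deterministic code between the model's proposed call and the actual execution. The sketch below assumes a hypothetical order-status tool and a simple allowlist with a read-only tier; none of this mirrors a specific framework's API.

```python
# Minimal sketch of guarded tool dispatch: the model proposes a call,
# deterministic code decides whether it runs. Tool names are illustrative.

def get_order_status(order_id: str) -> str:
    return f"order {order_id}: shipped"          # stand-in for a real API call

TOOLS = {"get_order_status": get_order_status}   # explicit allowlist
READ_ONLY = {"get_order_status"}                 # permission tier

def dispatch(tool_name: str, args: dict, allow_writes: bool = False) -> str:
    if tool_name not in TOOLS:
        raise PermissionError(f"unknown tool {tool_name!r}")
    if tool_name not in READ_ONLY and not allow_writes:
        raise PermissionError(f"{tool_name!r} requires write permission")
    return TOOLS[tool_name](**args)

result = dispatch("get_order_status", {"order_id": "A-17"})
```

The guardrail lives in `dispatch`, not in the prompt: even a fully jailbroken model cannot call a tool that is not on the allowlist.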
Step 4: Retrieval (RAG)
Use it when: the problem is missing or changing knowledge, not model behavior.
Good for: policy lookup, product knowledge, document-grounded answers, changing source material.
Limit: bad retrieval gives bad answers; retrieval quality becomes part of the system design problem.
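To see why retrieval quality is a design problem in its own right, it helps to strip the pattern to its smallest form. Below, naive word overlap stands in for a real embedding index, and the two documents are invented for illustration; a production system would swap in proper chunking, embeddings, and reranking.

```python
# Minimal sketch of the retrieval step: rank documents against the query,
# then feed the winners to the model. Word overlap is a deliberately crude
# stand-in for embedding similarity; the documents are illustrative.

DOCS = {
    "returns-policy": "Items may be returned within 30 days with a receipt.",
    "shipping": "Standard shipping takes 3-5 business days.",
}

def retrieve(query: str, k: int = 1) -> list:
    """Rank documents by word overlap with the query, highest first."""
    q = set(query.lower().split())
    scored = sorted(
        DOCS.items(),
        key=lambda kv: len(q & set(kv[1].lower().split())),
        reverse=True,
    )
    return [doc_id for doc_id, _ in scored[:k]]

top = retrieve("can I return items for a refund")   # → ["returns-policy"]
```

Every weakness of this toy (stemming, synonyms, chunk size, ranking) has a production counterpart, which is exactly why retrieval quality becomes part of the system design.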
Step 5: Parameter-efficient fine-tuning (LoRA/QLoRA)
Use it when: you need to adapt a base model to a narrower domain behavior or style and prompting is not enough.
Good for: repeatable domain phrasing, narrower task specialization, local adaptation without full retraining.
Limit: it adds training, evaluation, versioning, and drift-management overhead. QLoRA reduces compute cost by fine-tuning a quantized base model, but it does not remove the operational burden.
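The core LoRA idea fits in a few lines of arithmetic: instead of updating a full d×d weight matrix W, you train two small rank-r factors A (r×d) and B (d×r) and apply W' = W + (alpha/r)·BA. This pure-Python sketch uses toy sizes to show the parameter savings; the dimensions and the zero-initialized B mirror the common setup but are illustrative, not tied to any library.

```python
# Minimal sketch of the LoRA update W' = W + (alpha / r) * (B @ A).
# Toy sizes; in practice d is in the thousands and r stays small.

d, r, alpha = 8, 2, 4

full_params = d * d                  # parameters updated by full fine-tuning
lora_params = (r * d) + (d * r)      # parameters updated by LoRA (A plus B)

def lora_delta(B, A, alpha, r):
    """Compute the scaled low-rank update (alpha / r) * (B @ A)."""
    rows, inner, cols = len(B), len(A), len(A[0])
    scale = alpha / r
    return [
        [scale * sum(B[i][k] * A[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]

A = [[1.0] * d for _ in range(r)]    # rank-r factor, r x d
B = [[0.0] * r for _ in range(d)]    # d x r; zero init makes the initial delta zero
delta = lora_delta(B, A, alpha, r)   # 8 x 8, all zeros before training
```

Even at these toy sizes the trainable count drops from 64 to 32; at realistic dimensions the ratio is what makes LoRA cheap to train, while QLoRA additionally quantizes the frozen base weights. None of that, as noted above, removes the evaluation and versioning overhead.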
Step 6: Full fine-tuning
Use it when: the business case is strong enough to justify the highest cost, control, and evaluation burden.
Good for: specialized, repeated workloads where adaptation itself becomes a product capability.
Limit: it is the most expensive option in both engineering time and governance complexity.
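The whole ladder can be condensed into a decision helper. The sketch below encodes the "lightest technique that solves the real problem" rule by checking the heaviest genuine requirement first; the flag names and rung labels are this article's framing turned into an illustrative function, not a prescriptive tool.

```python
# Sketch of the ladder as a decision helper: identify the heaviest real
# requirement, then pick the lightest rung that addresses it. Flag names
# are illustrative.

def choose_technique(
    needs_structure: bool = False,      # downstream systems need reliable fields
    needs_actions: bool = False,        # must trigger real actions or live data
    knowledge_gap: bool = False,        # missing or changing knowledge
    behavior_gap: bool = False,         # domain behavior prompting can't fix
    adaptation_is_product: bool = False # adaptation is itself the product
) -> str:
    """Return the lightest rung that covers the stated problem."""
    if adaptation_is_product:
        return "full fine-tuning"
    if behavior_gap:
        return "LoRA/QLoRA fine-tuning"
    if knowledge_gap:
        return "retrieval (RAG)"
    if needs_actions:
        return "tool use"
    if needs_structure:
        return "structured output + validation"
    return "prompting"

pick = choose_technique(knowledge_gap=True)   # → "retrieval (RAG)"
```

Real systems usually combine rungs (retrieval plus validation, for example), so treat the single return value as naming the heaviest layer you need, not the only one.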
LoRA and QLoRA are not first-line architecture choices. They are model adaptation techniques. That means they become relevant only after you have already learned something important: the base model plus prompting, retrieval, and validation still do not meet the requirement.
In other words, LoRA is not a substitute for system design. It is one possible layer inside a larger system.
In most business workflows, the highest return comes from better contracts, cleaner inputs, stronger validators, and better task boundaries before it comes from custom model training.