RAG vs Post-Training — visualised
Flexible RAG pipeline vs retraining a big LLM
RAG adds more moving parts, but each one is swappable. Over time that modularity means lower cost, faster iteration, and no retraining run every time the data changes.
RAG pipeline
Modular. Swap parts without touching the model.
Swappable modules · Fast to iterate · Lower ongoing cost
Swappable
Ingest & chunk
repo/docs → chunks
Swappable
Embed
text → vectors
Swappable
Vector store
HNSW / IVF / disk
Swappable
Rerank (optional)
cross-encoder
Swappable
Tools / Guards
PII, policy, fmt
LLM generate
grounded answer
Swappable
Telemetry & eval
closed loop
- Swap embedder (e.g., open-source ↔ hosted)
- Swap chunking rules per repo
- Swap vector DB without touching prompts
- Re-index only changed files
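The swap points above can be sketched as plain interfaces: any object with the right method can replace a module without touching the rest. Everything here (`HashEmbedder`, `InMemoryStore`, the `generate` lambda) is an illustrative toy, not a real library:

```python
# Toy modular RAG pipeline. Each module is replaceable by anything with the
# same method signature; none of these classes are a real library.
from dataclasses import dataclass

class HashEmbedder:
    """Toy, non-semantic embedder: swap for an open-source or hosted model."""
    def embed(self, text: str) -> list[float]:
        return [float(sum(map(ord, text)) % 97), float(len(text))]

@dataclass
class InMemoryStore:
    """Toy vector store: swap for HNSW / IVF / disk without touching prompts."""
    docs: list[tuple[list[float], str]]

    def add(self, vec: list[float], chunk: str) -> None:
        self.docs.append((vec, chunk))

    def nearest(self, vec: list[float]) -> str:
        dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
        return min(self.docs, key=lambda d: dist(d[0], vec))[1]

def rag_answer(question, chunks, embedder, store, generate):
    for c in chunks:                       # ingest (already chunked here)
        store.add(embedder.embed(c), c)    # embed -> index
    context = store.nearest(embedder.embed(question))  # retrieve
    return generate(question, context)     # grounded generation

# Swapping the embedder or store is a one-line change at the call site:
answer = rag_answer(
    "where is auth?",
    ["auth lives in src/auth.py", "docs are in /docs"],
    HashEmbedder(), InMemoryStore(docs=[]),
    generate=lambda q, ctx: f"Based on: {ctx}",
)
```

Because `rag_answer` only depends on `.embed`, `.add`, and `.nearest`, replacing the toy embedder with a hosted one changes a single constructor call.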
VS
Post-training a large LLM
Heavy process. Costly each time data changes.
Big one-off cost · GPU/IO heavy · Repeats when data updates
Curate data
label/clean/dedup
Preprocess
tokenise/pack
Fine-/post-train
SFT/LoRA/QLoRA
Evaluate
benchmarks/red-team
Package & ship
weights/artifacts
Serve & monitor
autoscale, logs
- Any data change → re-run the training loop
- High recurring GPU/storage cost
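The recurring cost of that loop can be roughed out in a few lines. Every figure below (GPU hours per run, power draw, electricity price, eval time) is an assumed placeholder, not a measurement:

```python
# Back-of-envelope recurring cost of one retrain cycle (all numbers assumed).
GPU_HOURS_PER_RETRAIN = 40   # SFT/LoRA run on local GPUs (assumption)
EVAL_HOURS = 8               # benchmark + red-team pass (assumption)
KW_PER_GPU = 0.7             # draw per GPU under load (assumption)
PRICE_PER_KWH = 0.25         # local electricity price (assumption)

def cost_per_data_update(gpus: int = 2) -> float:
    """Every data change re-runs the WHOLE loop: train AND evaluate."""
    burn_hours = (GPU_HOURS_PER_RETRAIN + EVAL_HOURS) * gpus
    return burn_hours * KW_PER_GPU * PRICE_PER_KWH

def monthly_training_cost(updates_per_month: int, gpus: int = 2) -> float:
    """Recurring cost scales linearly with how often the data changes."""
    return updates_per_month * cost_per_data_update(gpus)
```

With these placeholder inputs, two GPUs cost about $17 in electricity per data update; the point is the multiplier, not the figure: the cost repeats for every data change.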
Total cost over time
[Chart: cumulative cost of RAG vs repeated retrain cycles]
RAG shifts cost to retrieval and storage, so you avoid repeated training runs. For fast-changing code and docs, the savings compound.
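A minimal sketch of those cost curves, using the one-off and monthly totals from the example tables further down this page (assumed, editable figures):

```python
# Toy cumulative-cost model: CapEx is paid once, OpEx recurs every month.
# Dollar inputs mirror the example tables on this page and are assumptions.
def cumulative_cost(months: int, one_off: float, monthly: float) -> float:
    return one_off + months * monthly

rag  = dict(one_off=16_000, monthly=450)  # modular pipeline, low OpEx
post = dict(one_off=60_000, monthly=968)  # retrain cycles folded into OpEx

# With these inputs RAG is cheaper at every horizon, and the gap widens:
assert cumulative_cost(12, **rag) < cumulative_cost(12, **post)
```

At 12 months the example inputs give $21,400 (RAG) vs $71,616 (post-training); swap in your own figures to see where the curves sit for your stack.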
Change velocity
- RAG: update index or swap embedder/vector DB → live in minutes.
- Post-train: re-curate data → schedule GPUs → re-train → re-eval → redeploy.
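The "live in minutes" claim for RAG rests on incremental re-indexing. A minimal sketch, assuming content hashes decide which files are dirty (the helper name is illustrative):

```python
# Sketch of "re-index only changed files": hash each file's bytes and skip
# anything whose hash matches the last indexed run. Illustrative only.
import hashlib

def changed_files(files: dict[str, bytes], seen: dict[str, str]) -> list[str]:
    """files: path -> content; seen: path -> last indexed sha256 (mutated)."""
    dirty = []
    for path, blob in files.items():
        digest = hashlib.sha256(blob).hexdigest()
        if seen.get(path) != digest:
            seen[path] = digest
            dirty.append(path)
    return dirty

seen: dict[str, str] = {}
repo = {"a.py": b"print(1)", "b.md": b"# docs"}
assert changed_files(repo, seen) == ["a.py", "b.md"]  # first run: index everything
repo["a.py"] = b"print(2)"
assert changed_files(repo, seen) == ["a.py"]          # later: only the edit re-embeds
```

Only the dirty paths go back through chunk → embed → upsert; the rest of the index is untouched, which is why an update ships in minutes rather than a training cycle.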
Risk & blast radius
- RAG: issues are isolated per module (chunking, embedder, reranker).
- Post-train: model weights change → system-wide behavioural shift.
Team Cost Comparison — 30 developers
Rough, adjustable numbers to compare RAG, Post-Training, and GitHub Copilot across Local vs Cloud operation. Edit assumptions on the left; totals update live.
These are directional; change inputs to match your stack and usage.
Team size
$/kWh
$/dev-day
RAG — Assumptions
Hardware is a ONE-OFF cost
Dev time
Post-Training — Assumptions
Retraining (local)
Hardware is a ONE-OFF cost
Dev time
RAG — Costs (One-off vs Monthly)
ONE-OFF (CapEx)
Hardware purchase: $12,000
Dev setup: $4,000
ONE-OFF TOTAL: $16,000
One-off cost (not monthly)
MONTHLY (OpEx)
Electricity: $49.50
Cloud infra (tokens+DB): $0
Training: $0
Dev maintenance: $400
MONTHLY TOTAL: $450
Lowest monthly recurring
Post-Training — Costs (One-off vs Monthly)
ONE-OFF (CapEx)
Hardware purchase: $48,000
Dev setup: $12,000
ONE-OFF TOTAL: $60,000
One-off cost (not monthly)
MONTHLY (OpEx)
Electricity: $154
Cloud inference: $0
Training: $14.00
Dev maintenance: $800
MONTHLY TOTAL: $968
GitHub Copilot — Costs (One-off vs Monthly)
ONE-OFF (CapEx)
Hardware purchase: $0
Dev setup: $0
ONE-OFF TOTAL: $0
MONTHLY (OpEx)
Seats: $570
Electricity: $0
Training: $0
Dev maintenance: $0
MONTHLY TOTAL: $570
- Assumes 22 work days/month. Update tokens, prices, and power to your real figures.
- Hardware costs are ONE-OFF purchases (CapEx). They do not recur monthly; electricity and cloud usage do.
- Dev time is split into Setup (one-off) and Maintenance (monthly) for each approach.
- The green badge highlights the option with the lowest monthly recurring cost (OpEx).
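The totals above are straightforward sums, and can be sketched so you can plug in your own figures. Every number mirrors the example tables; the $19/seat price is an assumption to edit like the rest:

```python
# Sketch of the arithmetic behind the cost tables. All figures are the
# example assumptions shown on this page, not benchmarks.
def totals(one_off: dict[str, float], monthly: dict[str, float]) -> tuple[float, float]:
    """Return (ONE-OFF TOTAL / CapEx, MONTHLY TOTAL / OpEx) for one approach."""
    return sum(one_off.values()), sum(monthly.values())

rag_capex, rag_opex = totals(
    {"hardware": 12_000, "dev_setup": 4_000},
    {"electricity": 49.50, "cloud_infra": 0, "training": 0, "dev_maintenance": 400},
)
post_capex, post_opex = totals(
    {"hardware": 48_000, "dev_setup": 12_000},
    {"electricity": 154, "cloud_inference": 0, "training": 14, "dev_maintenance": 800},
)
copilot_capex, copilot_opex = totals(
    {},  # no CapEx: seats only
    {"seats": 30 * 19, "electricity": 0, "training": 0, "dev_maintenance": 0},
)
# rag_opex comes out at $449.50, displayed as $450 after rounding.
```

Changing team size, power price, or dev-day rate only means editing the dictionaries; the badge logic is then just `min(...)` over the monthly totals.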