RAG vs Post-Training — visualised
Flexible RAG pipeline vs retraining a big LLM
RAG adds more moving parts, but each one is swappable. Over time that modularity means lower cost, faster iteration, and no retraining run every time the data changes.
RAG pipeline
Modular. Swap parts without touching the model.
Swappable modules · Fast to iterate · Lower ongoing cost
Swappable
Ingest & chunk
repo/docs → chunks
Swappable
Embed
text → vectors
Swappable
Vector store
HNSW / IVF / disk
Swappable
Rerank (optional)
cross-encoder
Swappable
Tools / Guards
PII, policy, fmt
LLM generate
grounded answer
Swappable
Telemetry & eval
closed loop
- Swap embedder (e.g., open-source ↔ hosted)
- Swap chunking rules per repo
- Swap vector DB without touching prompts
- Re-index only changed files
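The swap points above can be sketched as plain interfaces: any object with the right method can replace a module without touching the rest. Everything here (`HashEmbedder`, `InMemoryStore`, the `generate` lambda) is an illustrative toy, not a real library:

```python
# Toy modular RAG pipeline. Each module is replaceable by anything with the
# same method signature; none of these classes are a real library.
from dataclasses import dataclass

class HashEmbedder:
    """Toy, non-semantic embedder: swap for an open-source or hosted model."""
    def embed(self, text: str) -> list[float]:
        return [float(sum(map(ord, text)) % 97), float(len(text))]

@dataclass
class InMemoryStore:
    """Toy vector store: swap for HNSW / IVF / disk without touching prompts."""
    docs: list[tuple[list[float], str]]

    def add(self, vec: list[float], chunk: str) -> None:
        self.docs.append((vec, chunk))

    def nearest(self, vec: list[float]) -> str:
        dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
        return min(self.docs, key=lambda d: dist(d[0], vec))[1]

def rag_answer(question, chunks, embedder, store, generate):
    for c in chunks:                       # ingest (already chunked here)
        store.add(embedder.embed(c), c)    # embed -> index
    context = store.nearest(embedder.embed(question))  # retrieve
    return generate(question, context)     # grounded generation

# Swapping the embedder or store is a one-line change at the call site:
answer = rag_answer(
    "where is auth?",
    ["auth lives in src/auth.py", "docs are in /docs"],
    HashEmbedder(), InMemoryStore(docs=[]),
    generate=lambda q, ctx: f"Based on: {ctx}",
)
```

Because `rag_answer` only depends on `.embed`, `.add`, and `.nearest`, replacing the toy embedder with a hosted one changes a single constructor call.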
VS
Post-training a large LLM
Heavy process. Costly each time data changes.
Big one-off cost · GPU/IO heavy · Repeats when data updates
Curate data
label/clean/dedup
Preprocess
tokenise/pack
Fine-/post-train
SFT/LoRA/QLoRA
Evaluate
benchmarks/red-team
Package & ship
weights/artifacts
Serve & monitor
autoscale, logs
- Any data change → re-run the training loop
- High recurring GPU/storage cost
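The recurring cost of that loop can be roughed out in a few lines. Every figure below (GPU hours per run, power draw, electricity price, eval time) is an assumed placeholder, not a measurement:

```python
# Back-of-envelope recurring cost of one retrain cycle (all numbers assumed).
GPU_HOURS_PER_RETRAIN = 40   # SFT/LoRA run on local GPUs (assumption)
EVAL_HOURS = 8               # benchmark + red-team pass (assumption)
KW_PER_GPU = 0.7             # draw per GPU under load (assumption)
PRICE_PER_KWH = 0.25         # local electricity price (assumption)

def cost_per_data_update(gpus: int = 2) -> float:
    """Every data change re-runs the WHOLE loop: train AND evaluate."""
    burn_hours = (GPU_HOURS_PER_RETRAIN + EVAL_HOURS) * gpus
    return burn_hours * KW_PER_GPU * PRICE_PER_KWH

def monthly_training_cost(updates_per_month: int, gpus: int = 2) -> float:
    """Recurring cost scales linearly with how often the data changes."""
    return updates_per_month * cost_per_data_update(gpus)
```

With these placeholder inputs, two GPUs cost about $17 in electricity per data update; the point is the multiplier, not the figure: the cost repeats for every data change.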
Total cost over time
[Chart: cumulative cost of RAG vs repeated retrain cycles]
RAG shifts cost to retrieval and storage, so you avoid repeated training runs. For fast-changing code and docs, the savings compound.
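A minimal sketch of those cost curves, using the one-off and monthly totals from the example tables further down this page (assumed, editable figures):

```python
# Toy cumulative-cost model: CapEx is paid once, OpEx recurs every month.
# Dollar inputs mirror the example tables on this page and are assumptions.
def cumulative_cost(months: int, one_off: float, monthly: float) -> float:
    return one_off + months * monthly

rag  = dict(one_off=16_000, monthly=450)  # modular pipeline, low OpEx
post = dict(one_off=60_000, monthly=968)  # retrain cycles folded into OpEx

# With these inputs RAG is cheaper at every horizon, and the gap widens:
assert cumulative_cost(12, **rag) < cumulative_cost(12, **post)
```

At 12 months the example inputs give $21,400 (RAG) vs $71,616 (post-training); swap in your own figures to see where the curves sit for your stack.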
Change velocity
- RAG: update index or swap embedder/vector DB → live in minutes.
- Post-train: re-curate data → schedule GPUs → re-train → re-eval → redeploy.
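The "live in minutes" claim for RAG rests on incremental re-indexing. A minimal sketch, assuming content hashes decide which files are dirty (the helper name is illustrative):

```python
# Sketch of "re-index only changed files": hash each file's bytes and skip
# anything whose hash matches the last indexed run. Illustrative only.
import hashlib

def changed_files(files: dict[str, bytes], seen: dict[str, str]) -> list[str]:
    """files: path -> content; seen: path -> last indexed sha256 (mutated)."""
    dirty = []
    for path, blob in files.items():
        digest = hashlib.sha256(blob).hexdigest()
        if seen.get(path) != digest:
            seen[path] = digest
            dirty.append(path)
    return dirty

seen: dict[str, str] = {}
repo = {"a.py": b"print(1)", "b.md": b"# docs"}
assert changed_files(repo, seen) == ["a.py", "b.md"]  # first run: index everything
repo["a.py"] = b"print(2)"
assert changed_files(repo, seen) == ["a.py"]          # later: only the edit re-embeds
```

Only the dirty paths go back through chunk → embed → upsert; the rest of the index is untouched, which is why an update ships in minutes rather than a training cycle.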
Risk & blast radius
- RAG: issues are isolated per module (chunking, embedder, reranker).
- Post-train: model weights change → system-wide behavioural shift.
Team Cost Comparison — 30 developers
Rough, adjustable numbers to compare RAG, Post-Training, and GitHub Copilot across Local vs Cloud operation. Edit assumptions on the left; totals update live.
These are directional; change inputs to match your stack and usage.
Team size
$/kWh
$/dev-day
RAG — Assumptions
Hardware is a ONE-OFF cost
Dev time
Post-Training — Assumptions
Retraining (local)
Hardware is a ONE-OFF cost
Dev time
RAG — Costs (One-off vs Monthly)
ONE-OFF (CapEx)
Hardware purchase: $12,000
Dev setup: $4,000
ONE-OFF TOTAL: $16,000
One-off cost (not monthly)
MONTHLY (OpEx)
Electricity: $49.50
Cloud infra (tokens+DB): $0
Training: $0
Dev maintenance: $400
MONTHLY TOTAL: $450
Lowest monthly recurring
Post-Training — Costs (One-off vs Monthly)
ONE-OFF (CapEx)
Hardware purchase: $48,000
Dev setup: $12,000
ONE-OFF TOTAL: $60,000
One-off cost (not monthly)
MONTHLY (OpEx)
Electricity: $154
Cloud inference: $0
Training: $14.00
Dev maintenance: $800
MONTHLY TOTAL: $968
GitHub Copilot — Costs (One-off vs Monthly)
ONE-OFF (CapEx)
Hardware purchase: $0
Dev setup: $0
ONE-OFF TOTAL: $0
MONTHLY (OpEx)
Seats: $570
Electricity: $0
Training: $0
Dev maintenance: $0
MONTHLY TOTAL: $570
- Assumes 22 work days/month. Update tokens, prices, and power to your real figures.
- Hardware costs are ONE-OFF purchases (CapEx). They do not recur monthly; electricity and cloud usage do.
- Dev time is split into Setup (one-off) and Maintenance (monthly) for each approach.
- The green badge highlights the option with the lowest monthly recurring cost (OpEx).
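The totals above are straightforward sums, and can be sketched so you can plug in your own figures. Every number mirrors the example tables; the $19/seat price is an assumption to edit like the rest:

```python
# Sketch of the arithmetic behind the cost tables. All figures are the
# example assumptions shown on this page, not benchmarks.
def totals(one_off: dict[str, float], monthly: dict[str, float]) -> tuple[float, float]:
    """Return (ONE-OFF TOTAL / CapEx, MONTHLY TOTAL / OpEx) for one approach."""
    return sum(one_off.values()), sum(monthly.values())

rag_capex, rag_opex = totals(
    {"hardware": 12_000, "dev_setup": 4_000},
    {"electricity": 49.50, "cloud_infra": 0, "training": 0, "dev_maintenance": 400},
)
post_capex, post_opex = totals(
    {"hardware": 48_000, "dev_setup": 12_000},
    {"electricity": 154, "cloud_inference": 0, "training": 14, "dev_maintenance": 800},
)
copilot_capex, copilot_opex = totals(
    {},  # no CapEx: seats only
    {"seats": 30 * 19, "electricity": 0, "training": 0, "dev_maintenance": 0},
)
# rag_opex comes out at $449.50, displayed as $450 after rounding.
```

Changing team size, power price, or dev-day rate only means editing the dictionaries; the badge logic is then just `min(...)` over the monthly totals.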