Mini RAG Lab

Retrieve • Cite • Fit into the context window

RAG adds the right snippets of your notes to the model’s input so it answers with sources. You’ll see which snippets are pulled and how they fit into the model’s memory (context window).
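The retrieve-then-assemble step can be sketched in a few lines. This is a toy: keyword overlap stands in for a real embedding-based retriever, and the note titles and bodies are made up for illustration.

```python
# Toy RAG sketch: score notes against the question, then build the prompt.
# Keyword overlap is an illustrative stand-in for embedding similarity.

def score(question: str, note: str) -> int:
    """Count question words that also appear in the note."""
    return len(set(question.lower().split()) & set(note.lower().split()))

def build_prompt(question: str, notes: dict, top_k: int = 2) -> str:
    """Rank notes by score, keep the top k, and assemble the final prompt."""
    ranked = sorted(notes.items(), key=lambda kv: score(question, kv[1]), reverse=True)
    context = "\n\n".join(
        f"[{i + 1}] {title}\n{body}" for i, (title, body) in enumerate(ranked[:top_k])
    )
    return (
        "SYSTEM:\nUse the CONTEXT to answer the QUESTION.\n\n"
        f"QUESTION:\n{question}\n\nCONTEXT:\n{context}"
    )

notes = {
    "Sampling": "temperature scales logits; top-p restricts sampling to a nucleus.",
    "Latency": "streaming tokens early makes apps feel faster.",
}
prompt = build_prompt("What is the difference between temperature and top-p?", notes)
```

Only the top-scoring snippets make it into CONTEXT; everything else in your notes is left out, which is what keeps the prompt inside the context window.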

Will it fit in the model’s memory?

0 / 8,192 tokens
Included snippets
  1. [1] Sampling: Temperature vs Top-p

    temperature scales logits and controls randomness; lower values yield more deterministic outputs. top-p restricts sampling to a nucleus of the probability mass. combine low temperature with top-p around 0.9 for clear, co…

  2. [2] Latency & Streaming

    perceived speed matters. streaming tokens early makes apps feel faster even if total time is similar. use higher tokens-per-second to improve cadence and user trust. show progress as soon as possible.

  3. [3] RAG Chunking Tips

    chunk by semantic units like sections or paragraphs. use overlaps to preserve context at boundaries. good chunk titles boost retrieval quality. keep chunks compact to fit within the model's context window.
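Snippet [3]'s overlap advice can be sketched as a simple splitter. The character-based sizes here are illustrative; real pipelines usually chunk by tokens or by semantic sections.

```python
def chunk_with_overlap(text: str, size: int = 200, overlap: int = 40) -> list:
    """Split text into fixed-size chunks that share `overlap` characters at
    each boundary, so context spanning a cut survives in both chunks."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # step forward, leaving an overlapping tail
    return chunks

doc = "word " * 100  # 500 characters of toy text
pieces = chunk_with_overlap(doc, size=200, overlap=40)
```

Because each chunk's last 40 characters repeat as the next chunk's first 40, a sentence cut at a boundary is still retrievable as a whole from one of the two chunks.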

What the model will see
SYSTEM:
You are a helpful assistant. Use the CONTEXT to answer the QUESTION in one or two short sentences.

QUESTION:
In one sentence, explain the difference between temperature and top-p.

CONTEXT:
[1] Sampling: Temperature vs Top-p
temperature scales logits and controls randomness; lower values yield more deterministic outputs. top-p restricts sampling to a nucleus of the probability mass. combine low temperature with top-p around 0.9 for clear, concise answers.

[2] Latency & Streaming
perceived speed matters. streaming tokens early makes apps feel faster even if total time is similar. use higher tokens-per-second to improve cadence and user trust. show progress as soon as possible.

[3] RAG Chunking Tips
chunk by semantic units like sections or paragraphs. use overlaps to preserve context at boundaries. good chunk titles boost retrieval quality. keep chunks compact to fit within the model's context window.
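Snippet [1]'s distinction can be made concrete with a small sketch of temperature scaling followed by nucleus (top-p) filtering over a toy logit list. This is pure-Python illustration, not any library's API.

```python
import math
import random

def sample(logits, temperature: float = 0.7, top_p: float = 0.9) -> int:
    """Pick a token index: scale logits by temperature, softmax, keep the
    smallest top-p nucleus, then sample from the renormalized nucleus."""
    # 1. Temperature scales the logits: lower T sharpens the distribution.
    scaled = [l / temperature for l in logits]
    # 2. Softmax (shifted by the max for numerical stability).
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    probs = [e / sum(exps) for e in exps]
    # 3. Nucleus: smallest set of tokens whose cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, mass = [], 0.0
    for i in order:
        nucleus.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # 4. Sample from the nucleus, renormalized to its own mass.
    r = random.random() * sum(probs[i] for i in nucleus)
    for i in nucleus:
        r -= probs[i]
        if r <= 0:
            return i
    return nucleus[-1]

token = sample([2.0, 1.0, 0.1, -1.0])  # an index into the logits list
```

Note the two knobs act differently: temperature reshapes the whole distribution, while top-p truncates its tail; the demo's "low temperature with top-p around 0.9" combines both.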

Why this matters
  • Good answers come from good snippets, so tidy notes help.
  • The model has a memory limit (the context window). Stay within it.
  • We cite a few top snippets instead of your entire notes.
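The memory-limit point can be sketched with a rough fit check. The 4-characters-per-token estimate is a common heuristic, not an exact count; production code should count with the model's actual tokenizer.

```python
CONTEXT_WINDOW = 8_192  # token budget shown in the meter above

def estimate_tokens(text: str) -> int:
    """Very rough heuristic: about 4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits(snippets, reserved_for_answer: int = 512) -> bool:
    """Check whether the snippets plus room for the answer stay in budget."""
    used = sum(estimate_tokens(s) for s in snippets)
    return used + reserved_for_answer <= CONTEXT_WINDOW

print(fits(["short note"] * 3))  # a few short snippets fit easily
```

Reserving tokens for the answer matters: a prompt that exactly fills the window leaves the model no room to respond.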