Retrieve • Cite • Fit into the context window
RAG (retrieval-augmented generation) adds the most relevant snippets of your notes to the model’s input so it can answer with sources. You’ll see which snippets are retrieved and how they fit into the model’s memory (its context window).
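The retrieval step can be sketched as ranking notes against the question and keeping the top few. This is a minimal sketch that assumes a plain word-overlap score; real RAG systems usually rank with embeddings or BM25. The names `tokenize`, `retrieve`, and `k` are hypothetical, chosen only for illustration.

```python
import re


def tokenize(text: str) -> set[str]:
    # Lowercased word set: a crude stand-in (assumption) for embeddings or BM25
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, snippets: dict[int, str], k: int = 2) -> list[int]:
    """Return the ids of the k snippets sharing the most words with the question."""
    q = tokenize(question)
    scores = {sid: len(q & tokenize(text)) for sid, text in snippets.items()}
    # Highest score first; ties broken by snippet id so results are stable
    ranked = sorted(scores, key=lambda sid: (-scores[sid], sid))
    return ranked[:k]


notes = {
    1: "Sampling: Temperature vs Top-p. temperature scales logits and controls randomness.",
    2: "Latency & Streaming. perceived speed matters.",
    3: "RAG Chunking Tips. chunk by semantic units like sections or paragraphs.",
}
print(retrieve("explain the difference between temperature and top-p", notes, k=1))
```

Only the returned ids are cited and placed in the prompt, which is why the ranking quality matters more than the size of your notes.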
Will it fit in the model’s memory?
- [1] Sampling: Temperature vs Top-p
temperature scales logits and controls randomness; lower values yield more deterministic outputs. top-p restricts sampling to a nucleus of the probability mass. combine low temperature with top-p around 0.9 for clear, co…
- [2] Latency & Streaming
perceived speed matters. streaming tokens early makes apps feel faster even if total time is similar. use higher tokens-per-second to improve cadence and user trust. show progress as soon as possible.
- [3] RAG Chunking Tips
chunk by semantic units like sections or paragraphs. use overlaps to preserve context at boundaries. good chunk titles boost retrieval quality. keep chunks compact to fit within the model's context window.
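The two knobs described in snippet [1] can be sketched in plain Python over a toy list of raw logits. `softmax`, `top_p_filter`, and `sample` are illustrative helpers, not any particular library's API, and the defaults (temperature 0.7, top-p 0.9) are assumptions that follow the note's "low temperature with top-p around 0.9" advice.

```python
import math
import random


def softmax(logits, temperature=1.0):
    # Divide logits by temperature (must be > 0): lower T sharpens, higher T flattens
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p=0.9):
    # Keep the smallest set of most likely tokens whose total mass reaches top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Renormalize the nucleus so its probabilities sum to 1
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


def sample(logits, temperature=0.7, top_p=0.9, rng=random):
    # Temperature reshapes the distribution; top-p then truncates it
    nucleus = top_p_filter(softmax(logits, temperature), top_p)
    ids = list(nucleus)
    return rng.choices(ids, weights=[nucleus[i] for i in ids], k=1)[0]


logits = [2.0, 1.0, 0.1]  # toy scores for three candidate tokens
print(softmax(logits, temperature=1.0))
print(softmax(logits, temperature=0.5))  # sharper: more mass on token 0
print(sample(logits, temperature=0.7, top_p=0.9))
```

The difference shows up in the code: temperature changes every probability, while top-p only decides which tokens remain eligible.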
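Snippet [3]'s advice (chunk by semantic units, overlap at boundaries, keep chunks compact) might look like the minimal sketch below. It assumes paragraphs are separated by blank lines and measures size in words rather than model tokens. `chunk_paragraphs`, `max_words`, and `overlap` are hypothetical names.

```python
def chunk_paragraphs(text: str, max_words: int = 120, overlap: int = 1) -> list[str]:
    """Group whole paragraphs into chunks of about max_words words, repeating the
    last `overlap` paragraphs at the start of the next chunk."""
    paras = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    for para in paras:
        candidate = current + [para]
        if current and sum(len(p.split()) for p in candidate) > max_words:
            chunks.append("\n\n".join(current))
            # Carry the tail paragraphs forward so boundary context is preserved
            current = (current[-overlap:] if overlap else []) + [para]
        else:
            current = candidate
    if current:
        # A single long paragraph is kept whole, so a chunk may exceed max_words
        chunks.append("\n\n".join(current))
    return chunks


doc = "Intro to sampling.\n\nTemperature scales logits.\n\nTop-p keeps a nucleus."
print(chunk_paragraphs(doc, max_words=6, overlap=1))
```

Splitting only at paragraph breaks keeps each chunk a coherent unit, and the overlap means a fact near a boundary appears in both neighboring chunks.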
SYSTEM: You are a helpful assistant. Use the CONTEXT to answer the QUESTION in one or two short sentences.
QUESTION: In one sentence, explain the difference between temperature and top-p.
CONTEXT:
[1] Sampling: Temperature vs Top-p temperature scales logits and controls randomness; lower values yield more deterministic outputs. top-p restricts sampling to a nucleus of the probability mass. combine low temperature with top-p around 0.9 for clear, concise answers.
[2] Latency & Streaming perceived speed matters. streaming tokens early makes apps feel faster even if total time is similar. use higher tokens-per-second to improve cadence and user trust. show progress as soon as possible.
[3] RAG Chunking Tips chunk by semantic units like sections or paragraphs. use overlaps to preserve context at boundaries. good chunk titles boost retrieval quality. keep chunks compact to fit within the model's context window.
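A prompt with the same SYSTEM / QUESTION / CONTEXT sections can be assembled from numbered snippets, as in this minimal sketch. `build_prompt` and the `(number, title, body)` tuple shape are assumptions for illustration, and putting each section on its own line is a layout choice, not a requirement.

```python
def build_prompt(question: str, snippets: list[tuple[int, str, str]]) -> str:
    """Assemble SYSTEM / QUESTION / CONTEXT sections; each snippet is (number, title, body)."""
    system = ("SYSTEM: You are a helpful assistant. Use the CONTEXT to answer "
              "the QUESTION in one or two short sentences.")
    # Prefix each snippet with its number so the answer can cite [1], [2], ...
    context = "\n".join(f"[{n}] {title} {body}" for n, title, body in snippets)
    return f"{system}\nQUESTION: {question}\nCONTEXT:\n{context}"


prompt = build_prompt(
    "In one sentence, explain the difference between temperature and top-p.",
    [(1, "Sampling: Temperature vs Top-p",
      "temperature scales logits and controls randomness; "
      "lower values yield more deterministic outputs.")],
)
print(prompt)
```

Keeping the citation numbers in the prompt is what lets the model's answer point back to specific snippets.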
- Good answers come from good snippets; tidy notes help.
- The model has a memory limit (its context window), so the prompt must stay within it.
- We include and cite only the top few snippets, not your entire notes.
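One way to answer "Will it fit?" is a greedy budget check: keep snippets in rank order until the next one would overflow. This sketch rests on stated assumptions. Token counts are estimated from word counts at roughly 4 tokens per 3 English words, which is only a rule of thumb, so the model's own tokenizer gives the real numbers. `approx_tokens`, `fit_snippets`, `window`, and `reserve_for_answer` are hypothetical names.

```python
def approx_tokens(text: str) -> int:
    """Rough token estimate (assumption): about 4 tokens per 3 words."""
    n_words = len(text.split())
    return (4 * n_words + 2) // 3  # integer ceil(n_words * 4 / 3)


def fit_snippets(ranked: list[str], fixed_prompt: str,
                 window: int, reserve_for_answer: int) -> list[str]:
    """Keep top-ranked snippets while the estimated prompt stays within the window."""
    # Space left after the fixed instructions and the room kept for the answer
    budget = window - reserve_for_answer - approx_tokens(fixed_prompt)
    kept: list[str] = []
    for snippet in ranked:
        cost = approx_tokens(snippet)
        if cost > budget:
            break  # stop here so a lower-ranked snippet never displaces a better one
        kept.append(snippet)
        budget -= cost
    return kept


ranked = ["temperature scales logits", "top-p keeps a nucleus", "stream tokens early"]
print(fit_snippets(ranked, "SYSTEM: answer briefly.", window=20, reserve_for_answer=6))
```

Stopping at the first snippet that does not fit keeps citations in rank order. A variant that skips oversize snippets and keeps going would pack more text but could cite weaker matches.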