FAQ

Frequently Asked Questions

Quick answers to common questions about the demos, local models, and how everything runs in your browser.

Private by default

Prompts stay on your device; local inference never touches a server.

WebGPU-ready

Local models use WebGPU when available for real-time interaction.

Hands-on learning

Explore tokens, throughput, sampling, and more—visually.

What is this site?

A collection of interactive, in-browser demos that help you understand how large language models (LLMs) work through touchable visuals.

Does everything run in my browser?

Yes. Everything runs locally in your browser. No server round-trips are required for the demos.

Do I need an account?

No account or sign-up required—just open a demo and play.

Which browsers are supported?

Modern Chromium-based browsers (Chrome/Edge) work best. For local models, WebGPU support is recommended. Safari (macOS) can work with recent versions; Firefox support is improving.
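
If you are unsure whether your browser qualifies, a minimal check using the standard navigator.gpu API looks roughly like the sketch below; it is an illustration, not the demos' own detection code.

```ts
// Minimal WebGPU availability check (a sketch; the demos' detection logic may differ).
async function hasWebGPU(): Promise<boolean> {
  // navigator.gpu is only defined in WebGPU-capable browsers.
  const gpu = (navigator as any).gpu;
  if (!gpu) return false;
  try {
    // requestAdapter() resolves to null when no suitable GPU adapter is available.
    const adapter = await gpu.requestAdapter();
    return adapter !== null;
  } catch {
    return false;
  }
}

hasWebGPU().then((ok) => console.log(ok ? "WebGPU available" : "WebGPU not available"));
```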

Is my data private?

Yes. Prompts stay on your device. When a local model is used, the weights are cached in your browser; nothing is uploaded.

How do local models load?

The browser downloads quantized model shards on first use and caches them. Subsequent loads are much faster.
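
For a sense of what that looks like in code, here is a sketch using the public @mlc-ai/web-llm package; the model ID is a placeholder and the demos may wire this up differently.

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// The first call downloads quantized weight shards and caches them in the browser;
// subsequent calls reuse the cache and start much faster.
const engine = await CreateMLCEngine(
  "Llama-3.2-1B-Instruct-q4f16_1-MLC", // placeholder model ID from WebLLM's prebuilt list
  {
    initProgressCallback: (report) => {
      // report.text describes the current download/compile step
      console.log(report.text);
    },
  },
);

// Once loaded, the engine exposes an OpenAI-style chat completions API.
const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Explain a token in one sentence." }],
});
console.log(reply.choices[0].message.content);
```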

How can I get better answers from the local models?

Tiny local models trade fluency for speed. Lower Temperature for more stable output, keep Top-p around 0.8–0.95, and raise Max Tokens for longer answers.
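
To make those knobs concrete, here is a toy TypeScript sketch (not the demos' code) of how temperature and top-p reshape a next-token distribution before one token is drawn.

```ts
// Toy next-token sampler: temperature scales the logits, top-p (nucleus) trims the
// distribution to the smallest set of tokens covering `topP` probability mass.
function sampleToken(logits: Map<string, number>, temperature: number, topP: number): string {
  // 1. Temperature: <1 sharpens the distribution (more stable), >1 flattens it (more varied).
  const scaled = [...logits].map(([tok, l]) => [tok, l / temperature] as const);

  // 2. Softmax to probabilities.
  const maxL = Math.max(...scaled.map(([, l]) => l));
  const exps = scaled.map(([tok, l]) => [tok, Math.exp(l - maxL)] as const);
  const z = exps.reduce((sum, [, e]) => sum + e, 0);
  const probs = exps.map(([tok, e]) => [tok, e / z] as const);

  // 3. Nucleus: keep the most likely tokens until their cumulative mass reaches topP.
  probs.sort((a, b) => b[1] - a[1]);
  const nucleus: (readonly [string, number])[] = [];
  let cumulative = 0;
  for (const entry of probs) {
    nucleus.push(entry);
    cumulative += entry[1];
    if (cumulative >= topP) break;
  }

  // 4. Renormalize the nucleus and draw one token at random.
  const mass = nucleus.reduce((sum, [, p]) => sum + p, 0);
  let r = Math.random() * mass;
  for (const [tok, p] of nucleus) {
    r -= p;
    if (r <= 0) return tok;
  }
  return nucleus[nucleus.length - 1][0];
}

// With a low temperature and topP = 0.9, the high-probability token wins most of the time.
console.log(sampleToken(new Map([["the", 2.0], ["a", 1.0], ["banana", -1.0]]), 0.7, 0.9));
```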

What does the TPS (tokens-per-second) control do?

It mimics streaming pace. Faster TPS feels more responsive even when total time is similar, which is useful when designing for perceived latency.
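
As an illustration of that pacing (names and numbers here are made up, not taken from the demos), this sketch replays a list of tokens at a fixed tokens-per-second rate.

```ts
// Replay tokens at a fixed tokens-per-second (TPS) rate to feel the pacing difference.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function streamAtTps(tokens: string[], tps: number, onToken: (t: string) => void) {
  const msPerToken = 1000 / tps;
  for (const token of tokens) {
    onToken(token);          // render the token immediately...
    await sleep(msPerToken); // ...then wait so the stream averages `tps` tokens per second
  }
}

// Same text, different perceived latency: compare tps = 5 with tps = 50.
await streamAtTps("Faster TPS feels more responsive".split(" "), 20, (t) => console.log(t));
```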

Can I use my own model?

Yes—if you host compatible WebLLM model files with proper CORS and range requests enabled. Then point the demo to that model record.
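
Before pointing a demo at self-hosted weights, a quick check like the one below (the shard URL is a placeholder) can confirm that the host answers cross-origin requests and honors HTTP byte ranges.

```ts
// Placeholder URL for one of your hosted model shards.
const shardUrl = "https://example.com/models/my-model/params_shard_0.bin";

try {
  // A cross-origin fetch only succeeds if the host sends CORS headers.
  const res = await fetch(shardUrl, { headers: { Range: "bytes=0-0" } });
  // 206 Partial Content means the host honored the byte range; a plain 200 means
  // ranges are ignored and each shard would be downloaded in one piece.
  console.log("CORS ok, range support:", res.status === 206);
} catch {
  // A network error here usually means missing or misconfigured CORS headers.
  console.log("request blocked; check the host's CORS configuration");
}
```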

Do the demos work offline?

Once the page and (optionally) model weights are cached, most demos continue to work offline.
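
If you want to see what is actually cached for this origin, the standard Cache Storage API can list it; cache names depend on the app and on the model runtime, so treat this as a generic inspection sketch.

```ts
// List this origin's Cache Storage entries (names vary by app and model runtime).
const cacheNames = await caches.keys();
for (const name of cacheNames) {
  const cache = await caches.open(name);
  const requests = await cache.keys();
  console.log(`${name}: ${requests.length} cached request(s)`);
}
```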

Still have a question?

Check out the demos for hands-on answers—or open an issue on GitHub.