About This Project
Langsplain is an interactive educational tool that helps you understand how modern Large Language Models (LLMs) work under the hood.
What You'll Learn
- Architecture: tokenization, embeddings, attention, FFN/MoE, and output projection
- Training: data preparation, loss optimization, backpropagation, and post-training alignment
- Inference: prefill, KV cache, sampling, decode loops, and stop conditions
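To give a flavor of the attention step listed above, here is a minimal sketch of scaled dot-product attention in plain JavaScript. The dimensions and helper names (`softmax`, `attention`) are illustrative assumptions, not this project's actual code:

```javascript
// Numerically stable softmax over an array of scores.
function softmax(xs) {
  const m = Math.max(...xs);
  const exps = xs.map(x => Math.exp(x - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / sum);
}

// Scaled dot-product attention: each query row is a weighted
// mix of the value rows, weighted by softmax(q . k / sqrt(d)).
function attention(Q, K, V) {
  const d = K[0].length;
  return Q.map(q => {
    const scores = K.map(k =>
      k.reduce((acc, kj, j) => acc + q[j] * kj, 0) / Math.sqrt(d)
    );
    const weights = softmax(scores);
    // Weighted sum of value vectors.
    return V[0].map((_, j) =>
      weights.reduce((acc, w, i) => acc + w * V[i][j], 0)
    );
  });
}
```

A real transformer first projects the input through learned Q/K/V weight matrices and runs many such heads in parallel; this sketch shows only the core mixing step.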
Interactive Features
- Guided Tour: A step-by-step walkthrough of the Architecture, Training, and Inference sections
- Section-Specific Diagrams: Click any component to open detailed explanations
- Attention Demo: Visualize how tokens attend to each other
- MoE Demo: See how routing works in Mixture-of-Experts models
- Gradient Demo: Step through optimization on a 2D loss surface
- Loss Demo: Watch cross-entropy and perplexity change during training
- Sampling Demo: Explore temperature, top-k, and top-p effects
- KV Cache Demo: Compare cached vs uncached generation cost
- Generation Demo: Step through prefill + autoregressive decode
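As a rough illustration of what the sampling demo explores, the sketch below applies temperature scaling and top-k filtering to raw logits. The function name and option shapes are assumptions for this example, not the demo's actual implementation:

```javascript
// Turn raw logits into a sampling distribution.
// temperature < 1 sharpens the distribution, > 1 flattens it;
// topK keeps only the k largest logits and masks out the rest.
function sampleDistribution(logits, { temperature = 1.0, topK = logits.length } = {}) {
  const scaled = logits.map(l => l / temperature);
  const kth = [...scaled].sort((a, b) => b - a)[topK - 1];
  const masked = scaled.map(l => (l >= kth ? l : -Infinity));
  // Numerically stable softmax over the surviving logits.
  const m = Math.max(...masked);
  const exps = masked.map(l => Math.exp(l - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / sum); // probabilities over the vocabulary
}
```

Top-p (nucleus) sampling works similarly, except the cutoff is the smallest set of tokens whose cumulative probability exceeds p rather than a fixed count k.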
Further Learning
- Attention Is All You Need - The original transformer paper
- The Illustrated Transformer - Visual guide by Jay Alammar
- Switch Transformers - MoE at scale
- Neural Networks: Zero to Hero - Andrej Karpathy's course
Technical Notes
This visualization uses simplified, toy-sized models for demonstration purposes. Real LLMs have much larger hidden dimensions (e.g., 4096-8192 vs. our 64) and many more layers (32-96 vs. our 3). The attention patterns shown are computed on actual (tiny) weights, but they won't match production model behavior.
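To see how large that size gap is, a common back-of-envelope estimate puts roughly 12 * layers * d^2 weights in a transformer's attention and FFN blocks (embeddings excluded). The helper below is hypothetical, written just to compare the two scales mentioned above:

```javascript
// Rough transformer parameter count: ~12 * layers * d^2
// (attention + FFN weights only; embeddings and norms excluded).
// approxParams is a hypothetical helper, not part of this project.
const approxParams = (layers, d) => 12 * layers * d * d;

console.log(approxParams(3, 64));    // 147456 (the toy model, ~147K)
console.log(approxParams(32, 4096)); // 6442450944 (production scale, ~6.4B)
```

So the toy model here is tens of thousands of times smaller than even a modest production LLM, which is why its attention patterns shouldn't be read as representative.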
Credits
Built with vanilla JavaScript, D3.js for visualizations, and Anime.js for animations. No framework dependencies - just clean, educational code.