Field Guide

Inside the Forward Pass

Type a sentence and watch a language model read it — the way it actually does, one token at a time. The tokenizer below is the real GPT-2, running entirely in your browser.

STEP 01

Tokens (real GPT-2)

A model never sees words. It sees tokens: chunks of bytes drawn from a fixed vocabulary of 50,257 entries. Common words are one token; rare ones shatter into pieces. A leading space is part of the token, which is why ·is (where · marks the space) is a different token from is. Type anything:


This is the actual GPT-2 byte-pair encoding. The 1.5 MB vocabulary loads once, then the tokenizer runs locally: no server, nothing sent anywhere. Try emoji, code, numbers, or another language and watch the text fragment.
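If you want to see the merging idea without the widget, here is a minimal sketch in Python. It is not the real GPT-2 tokenizer: the merge table `MERGES` is hypothetical and has five entries, the function name `bpe` is made up for this example, and the sketch starts from characters, whereas the real encoder starts from bytes and applies tens of thousands of learned merges.

```python
# Toy byte-pair encoding. MERGES is a hypothetical, hand-written table;
# the real GPT-2 table is learned from data and is far larger.
MERGES = [("t", "h"), ("th", "e"), (" ", "the"), ("i", "s"), (" ", "is")]
RANKS = {pair: rank for rank, pair in enumerate(MERGES)}


def bpe(text: str) -> list[str]:
    """Repeatedly merge the best-ranked adjacent pair until none applies."""
    parts = list(text)  # simplification: characters instead of bytes
    while len(parts) > 1:
        pairs = [(parts[i], parts[i + 1]) for i in range(len(parts) - 1)]
        best = min(pairs, key=lambda p: RANKS.get(p, float("inf")))
        if best not in RANKS:
            break  # no known merge left
        merged, i = [], 0
        while i < len(parts):
            if i + 1 < len(parts) and (parts[i], parts[i + 1]) == best:
                merged.append(parts[i] + parts[i + 1])
                i += 2
            else:
                merged.append(parts[i])
                i += 1
        parts = merged
    return parts


print(bpe("is"))      # ['is']
print(bpe(" is"))     # [' is']  (the leading space belongs to the token)
print(bpe("the is"))  # ['the', ' is']
print(bpe("zq"))      # ['z', 'q']  (no merges known, so it shatters)
```

The toy table is enough to reproduce both effects described above: a word the table knows collapses into one piece, and the same word with a space in front becomes a different piece.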

STEP 02

Embeddings (schematic)

Each token id is swapped for a learned vector of 768 numbers — its meaning, written as coordinates. Here are the first 24 dimensions of your first few tokens.

Illustrative values, derived from each token id so they react to your input. Real embeddings are learned during training; the shape — one dense vector per token — is exact.
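The lookup itself is nothing more than indexing into a table, which a short Python sketch makes concrete. Everything here is a stand-in: `TOY_VOCAB`, `embedding_table`, and `embed` are names invented for the example, the table has 16 rows instead of 50,257, and the numbers are random rather than learned.

```python
import random

D_MODEL = 768   # vector length, as stated in the text above
TOY_VOCAB = 16  # hypothetical; the real table has one row per vocabulary entry

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# One row per token id. Random values stand in for learned weights.
embedding_table = [
    [rng.gauss(0.0, 0.02) for _ in range(D_MODEL)] for _ in range(TOY_VOCAB)
]


def embed(token_ids: list[int]) -> list[list[float]]:
    """Swap each token id for its row in the table."""
    return [embedding_table[t] for t in token_ids]


vectors = embed([3, 7, 3])
print(len(vectors), len(vectors[0]))  # 3 768
print(vectors[0] == vectors[2])       # True: the same id always gets the same vector
```

Note that the lookup ignores position entirely: token 3 gets the same vector whether it comes first or last.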

STEP 03

Attention (schematic)

Every token looks back at the tokens before it and decides how much each one matters. The grid is read row by row: row i shows how token i distributes its attention over tokens 1…i.

The weights are illustrative — but the lower-triangular shape is exactly right: a token can attend to the past, never the future. That single rule is what makes generation work left to right.
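The lower-triangular rule can be sketched in a few lines of Python. This is an illustration under assumptions, not the model's code: the function name `causal_attention_weights` is invented, the raw scores are made up by hand (in a real model they come from comparing learned query and key vectors), and rows are numbered from 0 here rather than from 1.

```python
import math


def causal_attention_weights(scores: list[list[float]]) -> list[list[float]]:
    """Softmax each row over positions 0..i only; future positions get 0."""
    n = len(scores)
    weights = []
    for i in range(n):
        visible = scores[i][: i + 1]              # token i sees itself and the past
        m = max(visible)                          # subtract max for numerical stability
        exps = [math.exp(s - m) for s in visible]
        total = sum(exps)
        row = [e / total for e in exps] + [0.0] * (n - i - 1)
        weights.append(row)
    return weights


# Hand-picked scores. The large values above the diagonal are ignored.
scores = [
    [0.0, 9.0, 9.0],
    [1.0, 1.0, 9.0],
    [0.5, 2.0, 1.0],
]
for row in causal_attention_weights(scores):
    print([round(w, 3) for w in row])
```

The first row comes out as all of its weight on the first token, and the second row splits evenly between the first two tokens, however large the scores for later positions are. Each row still sums to 1, so every token spends its full attention budget on the past.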

The rest of the forward pass (position signals, the MLP, twelve stacked layers, and the projection onto 50,257 candidate next-tokens) is walked through in full in the essay this guide belongs to: How Machines Think →
