Explain It Like I'm Five: AI tokens and context windows

What are AI tokens?

Tokens are the smallest units of information that an AI breaks words and sentences into to make them easier to process. The number of tokens an AI can handle at once is called its “context window,” and it can span multiple prompts and requests, letting a system weigh several things you’ve told it at the same time.

How big is a token?

Each model “tokenizes” data differently, but OpenAI has a tool that shows how ChatGPT turns words into tokens. Token size also varies from word to word: “It” and “information” are each one token, but “weird” is two and “encyclopedia” is three. Words aren’t split neatly by character or syllable, either: ChatGPT breaks “encyclopedia” into “ency-c-lopedia.”
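To make the idea concrete, here’s a toy tokenizer in Python that greedily matches the longest word-piece it knows, left to right. The vocabulary is invented for this sketch (real models like ChatGPT learn tens of thousands of pieces from data), but it reproduces the kinds of splits described above.

```python
# Toy subword tokenizer: greedily match the longest known piece from
# the left. This vocabulary is made up for illustration; it is NOT
# ChatGPT's real vocabulary.
VOCAB = {"it", "information", "ency", "c", "lopedia", "we", "ird"}

def tokenize(word):
    """Split a word into the longest vocabulary pieces, left to right."""
    pieces = []
    while word:
        # Try the longest prefix first, shrinking until one matches.
        for end in range(len(word), 0, -1):
            if word[:end] in VOCAB:
                pieces.append(word[:end])
                word = word[end:]
                break
        else:
            # No known piece: fall back to a single character.
            pieces.append(word[0])
            word = word[1:]
    return pieces

print(tokenize("information"))   # ['information']  (one token)
print(tokenize("weird"))         # ['we', 'ird']    (two tokens)
print(tokenize("encyclopedia"))  # ['ency', 'c', 'lopedia']
```

Notice the splits follow whatever pieces happen to be in the vocabulary, not syllables, which is why real tokenizers produce breaks that look odd to human readers.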

Why break info down like this?

Remember when you first learned to read? You would sound out words, then put them together into something you recognize. Eventually, you would recognize some words just by looking at them, but you would still sound out new ones. AI does something similar, but since it uses math to understand language, words get split up in ways that seem odd to us.

How big are the context windows I might have seen?

The context window on the current version of GPT-3.5 (ChatGPT’s free model) is 16,385 tokens, while GPT-4 can handle up to 128,000 tokens. Google’s Gemini Pro has a 32,000-token context window, but the company is working on a version it says can handle a million tokens.
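For a rough sense of scale, a common rule of thumb is that one token is about four characters of English text. Using that approximation (the character count for a novel below is a ballpark figure, and real token counts depend on the model’s tokenizer), you can sketch in Python whether a long document would fit in the windows above:

```python
# Rough capacity check using the ~4-characters-per-token rule of thumb.
# Real token counts depend on each model's tokenizer; this is an estimate.
WINDOWS = {
    "GPT-3.5": 16_385,
    "GPT-4": 128_000,
    "Gemini Pro": 32_000,
}

def estimated_tokens(num_chars):
    """Estimate a token count from a character count (~4 chars/token)."""
    return num_chars // 4

# A 300-page novel is very roughly 500,000 characters.
novel_tokens = estimated_tokens(500_000)  # ~125,000 tokens
for model, window in WINDOWS.items():
    verdict = "fits" if novel_tokens <= window else "does not fit"
    print(f"{model}: ~{novel_tokens:,} tokens {verdict} in {window:,}")
```

By this estimate, a whole novel squeezes into GPT-4’s 128,000-token window but overflows the smaller ones, which is why bigger windows get marketed so heavily.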

So a bigger context window means a better AI?

Not exactly. A bigger context window means an AI can process more data at once, but that doesn’t mean it will do it well.