Encoders and Decoders in LLMs

Basics again.

I have heard it so many times: Claude, ChatGPT, and co are decoder-only models.

Cool. But really, what does that mean?

Decoder: Reads only what came before the current position, never what comes after. It predicts one token at a time. After each prediction, it appends its own output to the input and feeds the longer sequence back in to predict the next token.
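To make that concrete, here is a minimal sketch of the two ideas: a causal mask (each position sees only itself and what came before) and the loop that feeds output back in. The names `causal_mask`, `TOY_NEXT`, `toy_next_token`, and `generate` are made up for illustration, and the "model" is just a lookup table standing in for a real network.

```python
def causal_mask(n):
    """mask[i][j] is True when position i may look at position j (only j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


# Hypothetical toy "model": a lookup table, NOT a real language model.
TOY_NEXT = {"the": "cat", "cat": "sat", "sat": "down", "down": "<eos>"}


def toy_next_token(tokens):
    # A real decoder conditions on the whole prefix; this toy only
    # looks at the last token to keep the example small.
    return TOY_NEXT.get(tokens[-1], "<eos>")


def generate(prompt, max_new_tokens=10):
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        nxt = toy_next_token(tokens)  # predict one token from the prefix
        if nxt == "<eos>":
            break
        tokens.append(nxt)  # feed the model's own output back in
    return tokens


if __name__ == "__main__":
    for row in causal_mask(3):
        print(row)
    print(generate(["the"]))
```

The mask is lower-triangular: row i has True up to column i and False after it. That is the "never after" part. The loop is the "one token at a time" part.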

Encoder: Reads the whole source at once, in both directions, and builds a representation of it. In an encoder-decoder model, a decoder then writes the target token by token from that representation. Great for translation, and used when input and output are distinct sequences: read it all in English, then write it in French. Anyone who speaks more than one language knows that word-by-word translation does not work.
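A rough sketch of the read-everything-then-write flow, under heavy assumptions: `encode`, `decode_step`, `translate`, and `TOY_TRANSLATIONS` are hypothetical names, and the phrase table is a stand-in for learned weights. The point is only the shape: the source is read in full first, then the target is produced one token at a time while looking at the whole source.

```python
def full_mask(n):
    """Encoder-style mask: every position may look at every other position."""
    return [[True] * n for _ in range(n)]


def encode(source_tokens):
    # Stand-in for an encoder. A real one returns one vector per token,
    # each computed with access to the entire source sentence.
    return tuple(source_tokens)


# Hypothetical phrase table standing in for what a trained model learns.
TOY_TRANSLATIONS = {("the", "red", "car"): ["la", "voiture", "rouge"]}


def decode_step(memory, target_so_far):
    # Picks the next target token from the full source representation
    # plus the target tokens written so far.
    target = TOY_TRANSLATIONS.get(memory, [])
    i = len(target_so_far)
    return target[i] if i < len(target) else "<eos>"


def translate(source_tokens, max_len=20):
    memory = encode(source_tokens)  # read the whole source first
    out = []
    for _ in range(max_len):
        nxt = decode_step(memory, out)  # then write token by token
        if nxt == "<eos>":
            break
        out.append(nxt)
    return out


if __name__ == "__main__":
    print(translate(["the", "red", "car"]))
```

Note the word order: "red" comes second in English but "rouge" comes last in French. A word-by-word pass could not do that without having seen the full source first.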
