[Concept] Inside Transformer Attention

Attention Layer

Attention blocks are the backbone of the Transformer architecture, enabling the model to capture dependencies across the input sequence. An attention layer takes as input:

- A query vector \(q \in \mathbb{R}^d\)
- A matrix of keys \(K \in \mathbb{R}^{n \times d}\) (rows are \(k_i^\top\))
- A matrix of values \(V \in \mathbb{R}^{n \times d_v}\)

In the vanilla Transformer setup, the query, key, and value come from the same token embedding \(x\), but the model is free to learn different subspaces for “asking” (queries), “addressing” (keys), and “answering” (values):...

August 22, 2025 · 2 min · 418 words
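For concreteness, here is a minimal sketch of the attention computation the excerpt describes, assuming NumPy and the standard scaled dot-product formulation \(\mathrm{softmax}(Kq/\sqrt{d})^\top V\); the projection matrices `W_q`, `W_k`, `W_v` in the usage example are illustrative names, not taken from the post:

```python
import numpy as np

def attention(q, K, V):
    """Scaled dot-product attention for a single query.

    q: (d,)   query vector
    K: (n, d) key matrix (rows are k_i^T)
    V: (n, d_v) value matrix
    Returns a (d_v,) vector: a softmax-weighted mix of the value rows.
    """
    d = q.shape[-1]
    scores = K @ q / np.sqrt(d)              # (n,) similarity of q to each key
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ V                       # (d_v,) weighted sum of values

# Toy usage: in the vanilla setup, q, K, V all come from the same token
# embeddings x via learned projections (W_q, W_k, W_v are illustrative names).
rng = np.random.default_rng(0)
n, d, d_v = 4, 8, 6
x = rng.normal(size=(n, d))                  # token embeddings
W_q = rng.normal(size=(d, d))
W_k = rng.normal(size=(d, d))
W_v = rng.normal(size=(d, d_v))
out = attention(x[0] @ W_q, x @ W_k, x @ W_v)
print(out.shape)  # (6,)
```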

[Concept] Reinforcement learning from human feedback (RLHF)

TL;DR Machine learning models require a loss function to tune their parameters. Designing a loss function that reflects ambiguous human values is challenging, e.g., it is not clear how to formulate a loss function that captures what is funny or ethical. To this end, a reward model is trained from human feedback. This reward model takes the model’s output and predicts a reward score, which the model then uses to optimize its parameters....

December 9, 2023 · 2 min · 350 words
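As a rough illustration of that loop, here is a toy REINFORCE-style sketch, not from the post: the bandit setup, the `reward_model` stand-in, and all names are illustrative assumptions. The point is only the structure, i.e., a learned reward score replaces a hand-designed loss as the signal that updates the generating model's parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions = 3                      # stand-in for "possible model outputs"
logits = np.zeros(n_actions)       # parameters of the toy policy

def reward_model(action):
    # Stand-in for a reward model trained on human feedback;
    # here it simply prefers action 2 the most.
    return [0.1, 0.5, 1.0][action]

def softmax(z):
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()

lr = 0.5
for _ in range(200):
    probs = softmax(logits)
    a = rng.choice(n_actions, p=probs)   # sample an "output"
    r = reward_model(a)                  # reward score for that output
    grad = -probs                        # REINFORCE: grad of log pi(a) w.r.t. logits
    grad[a] += 1.0
    logits += lr * r * grad              # scale the update by the reward

print(softmax(logits))  # probability mass concentrates on the preferred action
```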