[Lecture notes] Algorithms and Hardness for Attention and Kernel Density Estimation

TL;DR Kernel Density Estimation (KDE) is a statistical technique with applications across various fields, such as estimating the distribution of a random variable and computing the attention layer in Transformers. While the standard algorithm for KDE has quadratic time complexity, this presentation introduces two advanced techniques (the polynomial method and the Fast Multipole Method) that reduce the computation time to nearly linear in certain cases. KDE problem formulation Inputs....

August 24, 2024 · 3 min · 514 words

[Lecture notes] Let's build the GPT Tokenizer

Andrej Karpathy has released a great series of in-depth, hands-on videos on building GPT models. Here are my notes taken while watching the “Let’s build the GPT Tokenizer” video. What are Tokens? Large Language Models (LLMs) don’t process raw text directly. They operate on tokens, the output of the tokenization process, which translates text into a sequence of tokens. Many issues with LLMs are mainly due to tokenization: LLMs are bad at simple arithmetic....

June 8, 2024 · 5 min · 926 words