[Summary] LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

TL;DR State-of-the-art language models are primarily decoder-only, focusing on next-token prediction rather than producing rich contextualized embeddings for downstream tasks. LLM2Vec introduces an unsupervised method to transform decoder-only models into strong text encoders. The approach involves: (i) enabling bidirectional attention, (ii) training on masked next token prediction, and (iii) applying unsupervised contrastive learning. The resulting converted models outperform traditional encoder-only models. Background Until recently, large language models (LLMs) were predominantly based on bidirectional encoders or encoder-decoder frameworks like BERT and T5.... A minimal sketch of the contrastive step follows this entry.

October 18, 2024 · 2 min · 335 words
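As a rough illustration of step (iii), here is a SimCSE-style unsupervised contrastive loss in PyTorch. This is a sketch of the general technique rather than the paper's training code; the function name, temperature value, and the use of two dropout-perturbed forward passes as positive pairs are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def simcse_loss(emb_a: torch.Tensor, emb_b: torch.Tensor, temperature: float = 0.05) -> torch.Tensor:
    """Unsupervised contrastive loss over two views of the same batch of sentences.

    emb_a and emb_b (shape [batch, dim]) would come from two stochastic forward
    passes of the model; positives sit on the diagonal of the similarity matrix,
    and every other in-batch sentence acts as a negative.
    """
    emb_a = F.normalize(emb_a, dim=-1)
    emb_b = F.normalize(emb_b, dim=-1)
    sim = emb_a @ emb_b.T / temperature          # [batch, batch] cosine similarities
    labels = torch.arange(sim.size(0), device=sim.device)
    return F.cross_entropy(sim, labels)

# Toy usage: random tensors stand in for the two embedding views.
emb_a, emb_b = torch.randn(8, 256), torch.randn(8, 256)
print(simcse_loss(emb_a, emb_b))
```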

CVPR 2024 Summary

Last week I attended the CVPR conference, a gathering of computer vision researchers and professionals showcasing the latest advancements in the field. Some interesting recent trends: multimodal models and datasets (Large Language Models (LLMs) are being used to train vision models; images are used to ground LLMs, reducing their hallucinations; models are fed both images and videos to achieve better results); foundation models are a commodity (they are becoming more accessible and less expensive to create, and are trained on multiple modalities and tasks, even for very niche tasks like hand pose estimation); transformers are everywhere: while not a new trend, it’s still notable that attention mechanisms are incorporated into almost every model....

June 29, 2024 · 8 min · 1572 words

[Summary] Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

TL;DR Generative Large Language Models (LLMs) are limited to generating text based on their training data, so extending them to additional knowledge sources normally requires additional training. Retrieval-Augmented Generation (RAG) combines an external document database with an LLM, making it possible to update the model's knowledge and make it more precise for specific applications. Method Building blocks The method consists of three building blocks. Document index: a pre-trained model is used to encode documents into embeddings to create the index.... A minimal retrieval sketch follows this entry.

April 29, 2024 · 2 min · 335 words
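Here is a minimal sketch of the retrieval side of RAG, under assumptions of my own: the `embed` function below is a hashing stand-in for a real pre-trained encoder, and the toy documents and prompt template are invented for illustration.

```python
import hashlib
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Hashing stand-in for a pre-trained document/query encoder (returns a unit vector)."""
    seed = int(hashlib.md5(text.encode()).hexdigest(), 16) % (2**32)
    v = np.random.default_rng(seed).standard_normal(dim)
    return v / np.linalg.norm(v)

documents = [
    "RAG combines a retriever over a document index with a generator LLM.",
    "The document index stores embeddings of text passages.",
    "DPO aligns models with human preferences without reinforcement learning.",
]
index = np.stack([embed(d) for d in documents])   # the "document index"

def retrieve(query: str, k: int = 2) -> list[str]:
    scores = index @ embed(query)                 # cosine similarity (vectors are unit-norm)
    return [documents[i] for i in np.argsort(-scores)[:k]]

query = "What does the document index contain?"
context = "\n".join(retrieve(query))
# The retrieved passages are prepended to the prompt handed to the generator LLM.
print(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")
```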

[Summary] Direct Preference Optimization (DPO)

TL;DR Direct Preference Optimization is a method for fine-tuning Large Language Models (LLMs) to better align their outputs with human preferences. It is a simpler alternative to RLHF, since it can be applied directly to the model without an explicit reward model or reinforcement-learning optimization. Method The authors propose re-parameterizing the RLHF reward model so that the optimal policy can be obtained in closed form. This makes it possible to solve the standard RLHF problem with a simple classification loss.... A sketch of this loss follows this entry.

December 23, 2023 · 2 min · 236 words
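A minimal sketch of that classification-style loss in PyTorch. The inputs would normally be per-response log-probabilities from the fine-tuned policy and a frozen reference model; here the β value and the random tensors in the usage example are placeholders.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta: float = 0.1):
    """DPO loss over a batch of preference pairs.

    Each argument is the summed log-probability of the chosen / rejected response
    under the policy being fine-tuned or under the frozen reference model.
    """
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Maximize the margin between chosen and rejected responses, scaled by beta.
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()

# Toy usage with random log-probabilities standing in for real model outputs.
lp = {k: torch.randn(4) for k in ("pc", "pr", "rc", "rr")}
print(dpo_loss(lp["pc"], lp["pr"], lp["rc"], lp["rr"]))
```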

[Concept] Reinforcement learning from human feedback (RLHF)

TL;DR Machine learning models require a loss function to tune their parameters. Designing a loss function that reflects ambiguous human values is challenging; for example, it is not clear how to formulate a loss that captures what is funny or ethical. To this end, a reward model is trained from human feedback: it takes the model's output and predicts a reward score, which the model then uses to optimize its parameters.... A minimal reward-model sketch follows this entry.

December 9, 2023 · 2 min · 350 words
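A minimal sketch of the reward-model side, assuming a Bradley-Terry-style pairwise objective over human comparisons; the tiny linear `RewardModel` and the random embeddings are stand-ins for a real model and its inputs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Tiny stand-in reward model: maps a response embedding to a scalar score."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)

def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # The human-preferred response should receive the higher reward score.
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy usage: embeddings of the preferred and rejected responses for 4 comparisons.
rm = RewardModel()
chosen, rejected = torch.randn(4, 128), torch.randn(4, 128)
loss = preference_loss(rm(chosen), rm(rejected))
loss.backward()  # gradients update the reward model; the tuned model later optimizes against its scores
print(loss.item())
```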