[Summary] Training Vision Transformers with Only 2040 Images

TL;DR Vision Transformers (ViTs) outperform Convolutional Neural Networks (CNNs) given sufficient data, but they are data-hungry, which limits their use on small datasets. The authors propose a method to train ViTs with limited data by pre-training with label smoothing, lower-resolution images, and parametric instance discrimination, followed by fine-tuning on the target task. Method: Training a Vision Transformer on small datasets involves two steps. Self-supervised pre-training via parametric instance discrimination: classify each image as its own class....
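Below is a minimal PyTorch sketch of the parametric instance-discrimination pre-training step, assuming a simple stand-in encoder in place of the actual ViT backbone; names such as `NUM_IMAGES`, `LOW_RES`, and `pretrain_step` are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

# Hypothetical sizes; the method pre-trains a ViT on ~2040 images.
NUM_IMAGES = 2040          # each image is treated as its own "class"
EMBED_DIM = 384
LOW_RES = 112              # pre-training at reduced resolution

# Stand-in for a small ViT encoder (swap in a real ViT backbone here).
encoder = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * LOW_RES * LOW_RES, EMBED_DIM),
    nn.GELU(),
)

# Parametric instance discrimination: one output unit per training image.
instance_head = nn.Linear(EMBED_DIM, NUM_IMAGES)

# Label smoothing on the instance-level cross-entropy, as in the summary.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

def pretrain_step(images, instance_ids, optimizer):
    """One self-supervised step: predict each image's own index."""
    logits = instance_head(encoder(images))
    loss = criterion(logits, instance_ids)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example usage with random data; image indices act as labels.
optimizer = torch.optim.AdamW(
    list(encoder.parameters()) + list(instance_head.parameters()), lr=3e-4
)
images = torch.randn(8, 3, LOW_RES, LOW_RES)
instance_ids = torch.randint(0, NUM_IMAGES, (8,))
print(pretrain_step(images, instance_ids, optimizer))
```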

February 15, 2025 · 2 min · 217 words

[Summary] LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

TL;DR State-of-the-art language models are primarily decoder-only, focusing on next-token prediction rather than producing rich contextualized embeddings for downstream tasks. LLM2Vec introduces an unsupervised method to transform decoder-only models into encoders. The approach involves: (i) enabling bidirectional attention, (ii) training on masked next token prediction, and (iii) applying unsupervised contrastive learning. The resulting converted models outperform traditional encoder-only models. Background: Until recently, large language models (LLMs) were predominantly based on bidirectional encoders or encoder-decoder frameworks like BERT and T5....
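As a rough illustration of step (iii), here is a PyTorch sketch of a SimCSE-style unsupervised contrastive objective over mean-pooled embeddings; the pooling choice, temperature, and function names are assumptions, and the random tensors stand in for two dropout views produced by a decoder-only model whose causal mask has been replaced by full bidirectional attention.

```python
import torch
import torch.nn.functional as F

def mean_pool(hidden_states, attention_mask):
    """Average token embeddings, ignoring padded positions."""
    mask = attention_mask.unsqueeze(-1).float()
    return (hidden_states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-6)

def unsupervised_contrastive_loss(emb_a, emb_b, temperature=0.05):
    """SimCSE-style InfoNCE: two dropout views of the same batch are positives."""
    emb_a = F.normalize(emb_a, dim=-1)
    emb_b = F.normalize(emb_b, dim=-1)
    logits = emb_a @ emb_b.t() / temperature       # (B, B) similarity matrix
    targets = torch.arange(emb_a.size(0))          # diagonal pairs are positives
    return F.cross_entropy(logits, targets)

# Example with random "hidden states" standing in for bidirectional LLM outputs.
B, T, D = 4, 16, 32
hidden_a = torch.randn(B, T, D)    # view 1 (first dropout pass)
hidden_b = torch.randn(B, T, D)    # view 2 (second dropout pass)
attn_mask = torch.ones(B, T)
loss = unsupervised_contrastive_loss(mean_pool(hidden_a, attn_mask),
                                     mean_pool(hidden_b, attn_mask))
print(loss.item())
```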

October 18, 2024 · 2 min · 335 words

[Summary] Semi-supervised Learning Made Simple with Self-supervised Clustering

TL;DR In self-supervised learning there is no guarantee that representations will be organized into clusters that align with their semantic classes. When labels are partially available, the authors propose to replace the cluster centroids with class prototypes learned with supervision. In this way, unlabeled samples are clustered around the class prototypes, guided by the self-supervised clustering-based objective. Method: The method trains a model by jointly optimizing a supervised loss on labeled data and a self-supervised loss on unlabeled data, using the same loss function (cross-entropy)....
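The sketch below illustrates this joint objective in PyTorch under simplifying assumptions: `prototypes`, the temperature, and the soft pseudo-assignment used for the unlabeled term are illustrative stand-ins for the authors' clustering objective, not their implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES, EMBED_DIM = 10, 128

# Class prototypes replace the usual cluster centroids: one prototype per class.
prototypes = nn.Linear(EMBED_DIM, NUM_CLASSES, bias=False)

def joint_loss(feats_labeled, labels, feats_view1, feats_view2, temp=0.1):
    """Supervised CE on labeled data + self-supervised CE on unlabeled data,
    both computed against the same class prototypes (cross-entropy throughout)."""
    # Supervised term: labeled features are classified by the prototypes.
    sup = F.cross_entropy(
        prototypes(F.normalize(feats_labeled, dim=-1)) / temp, labels
    )

    # Self-supervised term: one view's soft assignment supervises the other
    # (a simplified stand-in for the clustering/swapped-prediction objective).
    logits1 = prototypes(F.normalize(feats_view1, dim=-1)) / temp
    logits2 = prototypes(F.normalize(feats_view2, dim=-1)) / temp
    targets2 = F.softmax(logits2.detach(), dim=-1)   # pseudo-assignment from view 2
    unsup = torch.sum(-targets2 * F.log_softmax(logits1, dim=-1), dim=-1).mean()

    return sup + unsup

# Example usage with random features standing in for encoder outputs.
loss = joint_loss(torch.randn(8, EMBED_DIM), torch.randint(0, NUM_CLASSES, (8,)),
                  torch.randn(16, EMBED_DIM), torch.randn(16, EMBED_DIM))
print(loss.item())
```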

May 14, 2024 · 2 min · 407 words