Archives

2026⁴

April¹

[Summary] A Decoder-Only Foundation Model for Time-Series Forecasting

April 4, 2026 · 2 min · 400 words

March¹

[Summary] AutoResearch: Hard Constraints Enable Reliable Agentic ML

March 28, 2026 · 3 min · 498 words

February²

[Summary] Mitigating Hallucinations in Multimodal LLMs With Attention Causal Decoding

February 21, 2026 · 3 min · 521 words

[Summary] Mini-vec2vec: Scaling Universal Geometry Alignment with Linear Transformation

February 1, 2026 · 3 min · 532 words

2025²¹

December¹

[Summary] Perception Encoder: The best visual embeddings are not at the output of the network

December 28, 2025 · 4 min · 749 words

November²

[Summary] Why Less is More (Sometimes): A Theory of Data Curation

November 30, 2025 · 3 min · 554 words

How Transformers Learn Order: Absolute, Relative, and Rotary Positions

November 15, 2025 · 5 min · 1063 words

October³

From DETR to RF-DETR: The Evolution of End-to-End Object Detection

October 31, 2025 · 4 min · 756 words

[Summary] Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

October 13, 2025 · 3 min · 584 words

[Summary] DINOv3: Self-Supervised Vision Transformers at Scale

October 11, 2025 · 5 min · 871 words

September¹

[Summary] Bag of Tricks for Multimodal AutoML with Image, Text, and Tabular Data

September 9, 2025 · 3 min · 626 words

August⁴

[Concept] Inside Transformer Attention

August 22, 2025 · 2 min · 418 words

[Summary] Adding Error Bars to Evals: A Statistical Approach to Language Model Evaluations

August 20, 2025 · 5 min · 907 words

[Summary] MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings

August 11, 2025 · 3 min · 543 words

[Summary] From Reasoning to Super-Intelligence: A Search-Theoretic Perspective

August 7, 2025 · 5 min · 875 words

May¹

[Summary] Ada-R1: Hybrid CoT via Bi-Level Adaptive Reasoning Optimization

May 1, 2025 · 2 min · 372 words

April³

[Summary] LettuceDetect: A Hallucination Detection Framework for RAG Applications

April 25, 2025 · 2 min · 220 words

[Summary] On the Biology of a Large Language Model

April 12, 2025 · 2 min · 367 words

[Summary] VGGT: Visual Geometry Grounded Transformer

April 5, 2025 · 3 min · 479 words

March¹

[Summary] Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

March 15, 2025 · 2 min · 333 words

February³

[Summary] Relightable Gaussian Codec Avatars

February 28, 2025 · 3 min · 606 words

[Summary] Training Vision Transformers with Only 2040 Images

February 15, 2025 · 2 min · 217 words

[Summary] ContraNorm: A Contrastive Learning Perspective on Oversmoothing and Beyond

February 1, 2025 · 2 min · 400 words

January²

[Summary] ReAct: Synergizing Reasoning and Acting in Language Models

January 17, 2025 · 1 min · 203 words

[Summary] Unifying Generative and Dense Retrieval for Sequential Recommendation

January 4, 2025 · 2 min · 367 words

2024¹⁴

November¹

[Summary] The Evolution of Multimodal Model Architectures

November 1, 2024 · 3 min · 427 words

October²

[Summary] LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

October 18, 2024 · 2 min · 335 words

[Summary] Fine-Grained Fashion Similarity Prediction by Attribute-Specific Embedding Learning

October 4, 2024 · 3 min · 457 words

August²

[Lecture notes] Algorithms and Hardness for Attention and Kernel Density Estimation

August 24, 2024 · 3 min · 514 words

[Summary] Vision Language Models Are Blind

August 17, 2024 · 2 min · 404 words

July¹

[Summary] Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion

July 21, 2024 · 2 min · 387 words

June²

CVPR 2024 Summary

June 29, 2024 · 8 min · 1572 words

[Lecture notes] Let’s build the GPT Tokenizer

June 8, 2024 · 5 min · 926 words

May¹

[Summary] Semi-supervised Learning Made Simple with Self-supervised Clustering

May 14, 2024 · 2 min · 407 words

April²

[Summary] Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

April 29, 2024 · 2 min · 335 words

[Summary] Object Recognition as Next Token Prediction

April 23, 2024 · 2 min · 267 words

March²

[Summary] Learning to Prompt for Vision-Language Models

March 22, 2024 · 2 min · 327 words

[Summary] ControlNet: Adding Conditional Control to Text-to-Image Diffusion Models

March 2, 2024 · 2 min · 316 words

January¹

[Summary] RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models

January 6, 2024 · 2 min · 422 words

2023⁵

December²

[Summary] Direct Preference Optimization (DPO)

December 23, 2023 · 2 min · 236 words

[Concept] Reinforcement learning from human feedback (RLHF)

December 9, 2023 · 2 min · 350 words

November¹

[Proof-of-Concept] DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion

November 18, 2023 · 3 min · 437 words

October²

[Summary] CoDeF: Content Deformation Fields for Temporally Consistent Video Processing

October 27, 2023 · 2 min · 361 words

[Summary] Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold

October 14, 2023 · 1 min · 206 words
