2025  12

August  2

[Summary] MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings

August 11, 2025 · 3 min · 543 words

[Summary] From Reasoning to Super-Intelligence: A Search-Theoretic Perspective

August 7, 2025 · 5 min · 875 words

May  1

[Summary] Ada-R1: Hybrid CoT via Bi-Level Adaptive Reasoning Optimization

May 1, 2025 · 2 min · 372 words

April  3

[Summary] LettuceDetect: A Hallucination Detection Framework for RAG Applications

April 25, 2025 · 2 min · 220 words

[Summary] On the Biology of a Large Language Model

April 12, 2025 · 2 min · 367 words

[Summary] VGGT: Visual Geometry Grounded Transformer

April 5, 2025 · 3 min · 479 words

March  1

[Summary] Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

March 15, 2025 · 2 min · 333 words

February  3

[Summary] Relightable Gaussian Codec Avatars

February 28, 2025 · 3 min · 606 words

[Summary] Training Vision Transformers with Only 2040 Images

February 15, 2025 · 2 min · 217 words

[Summary] ContraNorm: A Contrastive Learning Perspective on Oversmoothing and Beyond

February 1, 2025 · 2 min · 400 words

January  2

[Summary] ReAct: Synergizing Reasoning and Acting in Language Models

January 17, 2025 · 1 min · 203 words

[Summary] Unifying Generative and Dense Retrieval for Sequential Recommendation

January 4, 2025 · 2 min · 367 words

2024  14

November  1

[Summary] The Evolution of Multimodal Model Architectures

November 1, 2024 · 3 min · 427 words

October  2

[Summary] LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

October 18, 2024 · 2 min · 335 words

[Summary] Fine-Grained Fashion Similarity Prediction by Attribute-Specific Embedding Learning

October 4, 2024 · 3 min · 457 words

August  2

[Lecture notes] Algorithms and Hardness for Attention and Kernel Density Estimation

August 24, 2024 · 3 min · 514 words

[Summary] Vision Language Models Are Blind

August 17, 2024 · 2 min · 404 words

July  1

[Summary] Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion

July 21, 2024 · 2 min · 387 words

June  2

CVPR 2024 Summary

June 29, 2024 · 8 min · 1572 words

[Lecture notes] Let’s build the GPT Tokenizer

June 8, 2024 · 5 min · 926 words

May  1

[Summary] Semi-supervised Learning Made Simple with Self-supervised Clustering

May 14, 2024 · 2 min · 407 words

April  2

[Summary] Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

April 29, 2024 · 2 min · 335 words

[Summary] Object Recognition as Next Token Prediction

April 23, 2024 · 2 min · 267 words

March  2

[Summary] Learning to Prompt for Vision-Language Models

March 22, 2024 · 2 min · 327 words

[Summary] ControlNet: Adding Conditional Control to Text-to-Image Diffusion Models

March 2, 2024 · 2 min · 316 words

January  1

[Summary] RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models

January 6, 2024 · 2 min · 422 words

2023  7

December  2

[Summary] Direct Preference Optimization (DPO)

December 23, 2023 · 2 min · 236 words

[Concept] Reinforcement learning from human feedback (RLHF)

December 9, 2023 · 2 min · 350 words

November  1

[Proof-of-Concept] DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion

November 18, 2023 · 3 min · 437 words

October  2

[Summary] CoDeF: Content Deformation Fields for Temporally Consistent Video Processing

October 27, 2023 · 2 min · 361 words

[Summary] Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold

October 14, 2023 · 1 min · 206 words

July  1

[Summary] Break-A-Scene: Extracting Multiple Concepts from a Single Image

July 21, 2023 · 2 min · 340 words

May  1

[Summary] MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation

May 19, 2023 · 1 min · 125 words