[Summary] From Reasoning to Super-Intelligence: A Search-Theoretic Perspective

TL;DR Popular methods for chain‑of‑thought (CoT) reasoning (e.g., supervised fine‑tuning, Tree‑of‑Thoughts) face three challenges: (i) distribution drift, where small mistakes spiral with no recovery mechanism; (ii) missing search structure, with no built-in exploration or backtracking; and (iii) explosive computational cost. The proposed Diligent Learner models reasoning as depth-first search guided by a validator. It is trained by building reasoning paths step by step, checking each one for correctness, and backtracking when needed....

August 7, 2025 · 5 min · 875 words

[Summary] ReAct: Synergizing Reasoning and Acting in Language Models

TL;DR Large Language Models (LLMs) often suffer from hallucinations. Two common mitigation strategies are Chain of Thought (CoT), where the LLM is prompted to show its step-by-step reasoning, and Act, where the LLM uses external tools to ground its answers in reliable databases. However, CoT relies on the model’s internal representations, limiting its ability to reason reactively or update its knowledge. ReAct is a prompting method that combines CoT with action plan generation using external tools....

January 17, 2025 · 1 min · 203 words