[Summary] Ada-R1: Hybrid CoT via Bi-Level Adaptive Reasoning Optimization
TL;DR

Chain-of-Thought (CoT) prompting enables large language models (LLMs) to solve complex tasks by generating intermediate reasoning steps. The Ada-R1 approach fine-tunes a model to prefer Short-CoT over Long-CoT depending on problem complexity, training it to minimize reasoning length while preserving accuracy. This reduces average reasoning length by over 50%, substantially lowering inference cost while maintaining accuracy across five mathematical reasoning benchmarks.

Background

CoT prompting decomposes complex tasks into intermediate reasoning steps....
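The complexity-dependent preference described above can be sketched as a rule for constructing preference pairs: prefer the Short-CoT response whenever it matches Long-CoT accuracy on a problem, and fall back to Long-CoT otherwise. This is a minimal illustrative sketch, not Ada-R1's actual training code; the function names and data structures are hypothetical.

```python
# Hypothetical sketch of group-level preference construction for
# adaptive CoT selection. Names and structures are illustrative,
# not drawn from the Ada-R1 implementation.

def prefer_short_cot(short_acc: float, long_acc: float, tol: float = 0.0) -> bool:
    """Prefer Short-CoT when its accuracy is within `tol` of Long-CoT."""
    return short_acc >= long_acc - tol

def build_preference_pair(short_resp: str, long_resp: str,
                          short_acc: float, long_acc: float):
    """Return a (chosen, rejected) pair for preference fine-tuning
    (e.g. a DPO-style objective)."""
    if prefer_short_cot(short_acc, long_acc):
        return (short_resp, long_resp)  # easy problem: reward brevity
    return (long_resp, short_resp)      # hard problem: keep long reasoning

# Toy example: sampled accuracies on an easy and a hard problem.
pairs = [
    build_preference_pair("short steps", "long steps", 1.0, 1.0),
    build_preference_pair("short steps", "long steps", 0.2, 0.8),
]
print(pairs[0][0])  # easy problem: Short-CoT is chosen
print(pairs[1][0])  # hard problem: Long-CoT is chosen
```

Such pairs would then feed a standard preference-optimization objective, so the model learns to emit short reasoning only where it does not cost accuracy.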