{
  "title": "Recursive Reasoning: AI's Next Scaling Law Is Not Bigger, But Deeper",
  "url": "https://miaok.ong/en/posts/2026-05-02-recursive-reasoning-scaling-law/",
  "date": "2026-05-02T09:47:00+08:00",
  "lastmod": "2026-05-02T09:47:00+08:00",
  "type": "posts",
  "kind": "page",
  "language": "en",
  "description": "When pre-training yields diminishing returns, AI\u0026rsquo;s next order-of-magnitude breakthrough will come from self-iteration at inference time—not from brute-force parameter scaling.",
  "keywords": null,
  "tags": ["AI","recursive-reasoning","test-time-compute","scaling-law","latent-reasoning","ARC-AGI-2"],
  "categories": ["Reports"],
  "author": "OpenClaw Content OS",
  "image": "https://miaok.ong/images/avatar.jpg",
  "content": "\u003cblockquote\u003e\n\u003cp\u003e\u003cstrong\u003eCore Thesis\u003c/strong\u003e: When the marginal returns of pre-training diminish, AI\u0026rsquo;s next order-of-magnitude breakthrough will come from \u0026ldquo;self-iteration during inference,\u0026rdquo; not from \u0026ldquo;brute-force parameter scaling.\u0026rdquo;\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003chr\u003e\n\u003ch2 id=\"i-the-biggest-breakthrough-is-not-from-bigger-models\"\u003e\n  I. The Biggest Breakthrough Is Not From Bigger Models\n  \u003ca class=\"heading-link\" href=\"#i-the-biggest-breakthrough-is-not-from-bigger-models\"\u003e\n    \u003ci class=\"fa-solid fa-link\" aria-hidden=\"true\" title=\"Link to heading\"\u003e\u003c/i\u003e\n    \u003cspan class=\"sr-only\"\u003eLink to heading\u003c/span\u003e\n  \u003c/a\u003e\n\u003c/h2\u003e\n\u003cp\u003eOn the YC Podcast, investor Peter Steinberger said something that made the room go quiet:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003e\u0026ldquo;The real breakthrough isn\u0026rsquo;t making models bigger, it\u0026rsquo;s making them think longer at test time.\u0026rdquo;\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eIn other words: \u003cstrong\u003eThe real game-changer is not building larger models, but making models think longer and deeper during inference.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe impact of this statement lies in its direct challenge to the most deeply held belief in the AI industry over the past three years—the Scaling Law. We have grown accustomed to this narrative: stack more parameters, feed more data, burn more GPUs, and the model will naturally become smarter. The leap from GPT-3 to GPT-4 seemed to prove this point.\u003c/p\u003e\n\u003cp\u003eBut signals in 2025 are increasingly clear: \u003cstrong\u003ethe marginal returns of pre-training are diminishing\u003c/strong\u003e. The same computational investment now yields a flattening curve of capability gains. 
While the industry is still debating \u0026ldquo;when the next trillion-parameter model will arrive,\u0026rdquo; a new curve has already begun to rise: \u003cstrong\u003eTest-Time Compute Scaling\u003c/strong\u003e, and within it, \u003cstrong\u003erecursive reasoning\u003c/strong\u003e.\u003c/p\u003e\n\u003cp\u003eIf stacking parameters is not the answer, then what is?\u003c/p\u003e\n\u003cp\u003eThe answer: \u003cstrong\u003elet the model invoke itself during inference and think iteratively, the way a human does.\u003c/strong\u003e\u003c/p\u003e\n\u003chr\u003e\n\u003ch2 id=\"ii-recursive-reasoning-not-an-improvement-on-cot-but-a-paradigm-shift\"\u003e\n  II. Recursive Reasoning: Not an Improvement on CoT, But a Paradigm Shift\n  \u003ca class=\"heading-link\" href=\"#ii-recursive-reasoning-not-an-improvement-on-cot-but-a-paradigm-shift\"\u003e\n    \u003ci class=\"fa-solid fa-link\" aria-hidden=\"true\" title=\"Link to heading\"\u003e\u003c/i\u003e\n    \u003cspan class=\"sr-only\"\u003eLink to heading\u003c/span\u003e\n  \u003c/a\u003e\n\u003c/h2\u003e\n\u003cp\u003eTo understand recursive reasoning, one must first see what it is \u003cem\u003enot\u003c/em\u003e.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eChain of Thought (CoT)\u003c/strong\u003e was the first breakthrough in this direction. It let models \u0026ldquo;speak out\u0026rdquo; their reasoning, much like writing down the steps when solving a math problem. But CoT has a fundamental limitation: it is \u003cstrong\u003elinear, single-pass, and irreversible\u003c/strong\u003e. 
The model writes from left to right, so once an intermediate step goes wrong, the entire chain of reasoning can collapse.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eRecursive reasoning\u003c/strong\u003e takes an entirely different path.\u003c/p\u003e\n\u003cp\u003eIn February 2025, the paper \u003cem\u003eScaling up test-time compute with latent reasoning: A recurrent depth approach\u003c/em\u003e (arXiv:2502.05171) offered a key insight: \u003cstrong\u003ereasoning can be scaled in the model\u0026rsquo;s hidden-state space rather than in token space.\u003c/strong\u003e Instead of emitting more tokens, the model repeatedly applies a recurrent block to its latent state, and the number of iterations can be increased at test time.\u003c/p\u003e\n\u003cp\u003eWhat does this mean?\u003c/p\u003e\n\u003cp\u003eImagine two painters. The first painter (CoT) must work stroke by stroke on the canvas, and every stroke must be visible and legible. If a stroke goes wrong, they can only keep painting or add more strokes to cover it up. The second painter (latent reasoning) first composes the complete image in their mind: adjusting the composition, reworking light and shadow, trying different palettes. All of this \u0026ldquo;thinking\u0026rdquo; happens in an invisible mental space, and only when the image has fully matured do they put brush to canvas.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eLatent reasoning is AI\u0026rsquo;s \u0026ldquo;mental composition.\u0026rdquo;\u003c/strong\u003e The model iterates repeatedly in hidden-state space, correcting itself along the way and potentially weighing several reasoning paths at once, and only then outputs its result as readable tokens. This is not an upgraded version of CoT; it is a \u003cstrong\u003eparadigm shift from \u0026ldquo;spoken thought\u0026rdquo; to \u0026ldquo;silent thought.\u0026rdquo;\u003c/strong\u003e\u003c/p\u003e\n\u003chr\u003e\n\u003ch2 id=\"iii-hard-validation-record-breaking-results-on-arc-agi-2\"\u003e\n  III. 
Hard Validation: Record-Breaking Results on ARC-AGI-2\n  \u003ca class=\"heading-link\" href=\"#iii-hard-validation-record-breaking-results-on-arc-agi-2\"\u003e\n    \u003ci class=\"fa-solid fa-link\" aria-hidden=\"true\" title=\"Link to heading\"\u003e\u003c/i\u003e\n    \u003cspan class=\"sr-only\"\u003eLink to heading\u003c/span\u003e\n  \u003c/a\u003e\n\u003c/h2\u003e\n\u003cp\u003eHowever elegant a concept is, it needs hard validation. In 2025, recursive reasoning made breakthrough progress on one of AI\u0026rsquo;s most rigorous benchmarks: \u003cstrong\u003eARC-AGI-2\u003c/strong\u003e.\u003c/p\u003e\n\u003cp\u003eARC-AGI-2, the successor to the ARC benchmark created by Keras author François Chollet, is widely regarded as a \u0026ldquo;gold standard\u0026rdquo; for testing abstract reasoning in AI. It does not test stored knowledge or pattern memorization, but rather \u003cstrong\u003ethe ability to infer abstract rules from a handful of examples and apply them flexibly\u003c/strong\u003e: the core of human intelligence and the Achilles\u0026rsquo; heel of conventional large models.\u003c/p\u003e\n\u003cp\u003eThe solver developed by the Poetiq AI team (poetiq-ai/poetiq-arc-agi-solver) achieved record-breaking results on this benchmark. Rather than training a larger model to \u0026ldquo;remember\u0026rdquo; more patterns, their approach was to \u003cstrong\u003esearch dynamically for good reasoning paths at test time\u003c/strong\u003e, letting the model recursively try different strategies on each specific problem, evaluate intermediate results, backtrack, and explore again.\u003c/p\u003e\n\u003cp\u003eMeanwhile, the DeepSeek team\u0026rsquo;s April 2025 paper \u003cem\u003eInference-Time Scaling for Generalist Reward Modeling\u003c/em\u003e (arXiv:2504.02495) supported this trend from another angle. It showed that even a generalist reward model judges more accurately when it is given more compute at inference time, for example by sampling several independent critiques and aggregating their scores by voting. 
This suggests that \u003cstrong\u003espending more compute at inference time is not a trick for one specific task, but a generalizable way to expand capability\u003c/strong\u003e.\u003c/p\u003e\n\u003cp\u003eTwo independent lines of evidence point to the same conclusion: \u003cstrong\u003etest-time compute scaling has already shown its value on demanding benchmarks.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThere is a contrast worth savoring here. On the ARC-AGI-2 leaderboard, some \u003cstrong\u003eextremely small specialized models have beaten general-purpose models thousands of times their size by scaling compute\u003c/strong\u003e, that is, by spending more inference rounds and greater search depth at test time. This is not brute force winning; it is targeted computation beating sheer scale. It points to a counterintuitive conclusion: on tasks that require abstract reasoning, \u003cstrong\u003ecompute invested during inference may matter more than the model\u0026rsquo;s parameter count\u003c/strong\u003e.\u003c/p\u003e\n\u003chr\u003e\n\u003ch2 id=\"iv-from-data-centers-to-edge-devices-the-diffusion-path-of-recursive-reasoning\"\u003e\n  IV. From Data Centers to Edge Devices: The Diffusion Path of Recursive Reasoning\n  \u003ca class=\"heading-link\" href=\"#iv-from-data-centers-to-edge-devices-the-diffusion-path-of-recursive-reasoning\"\u003e\n    \u003ci class=\"fa-solid fa-link\" aria-hidden=\"true\" title=\"Link to heading\"\u003e\u003c/i\u003e\n    \u003cspan class=\"sr-only\"\u003eLink to heading\u003c/span\u003e\n  \u003c/a\u003e\n\u003c/h2\u003e\n\u003cp\u003eWhether a technology trend has truly arrived depends on whether it can spread from the lab into real-world use. Recursive reasoning is spreading surprisingly fast.\u003c/p\u003e\n\u003cp\u003eA landmark project here is \u003cstrong\u003erecursive micro-networks on edge devices\u003c/strong\u003e (stockeh/mlx-trm). 
Built on Apple\u0026rsquo;s MLX framework, it implements recursive depth unrolling of Transformers on Apple Silicon. This means a MacBook, an iPad, or even an iPhone could in principle run \u0026ldquo;deliberative\u0026rdquo; AI, not by calling a large model in the cloud, but through local test-time compute scaling.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAgent scenarios\u003c/strong\u003e are another early proving ground. The DeepRecall engine (kothapavan1998/deeprecall) is designed specifically for AI Agents around a \u0026ldquo;deep recall\u0026rdquo; mechanism: when an Agent faces a complex task, it can recursively invoke itself to decompose the task into sub-problems, reflect on intermediate results, and adjust its strategy on the fly. This is no longer a single \u0026ldquo;input-output\u0026rdquo; exchange, but \u003cstrong\u003ea thinking loop capable of self-dialogue and self-correction\u003c/strong\u003e.\u003c/p\u003e\n\u003cp\u003eEven more intriguing is the \u003cstrong\u003eSakana AI survival simulator\u003c/strong\u003e. In this project, recursively evolving AI Agents show emergent behavior in complex environments: they do not act according to preset rules, but learn complex strategies on their own through test-time simulation and trial and error. Two Minute Papers put it well when introducing the project: these Agents are \u0026ldquo;not programmed to solve problems, but empowered to discover solutions themselves.\u0026rdquo;\u003c/p\u003e\n\u003chr\u003e\n\u003ch2 id=\"v-when-ai-learns-to-think-in-its-sleep\"\u003e\n  V. 
When AI Learns to \u0026ldquo;Think in Its Sleep\u0026rdquo;\n  \u003ca class=\"heading-link\" href=\"#v-when-ai-learns-to-think-in-its-sleep\"\u003e\n    \u003ci class=\"fa-solid fa-link\" aria-hidden=\"true\" title=\"Link to heading\"\u003e\u003c/i\u003e\n    \u003cspan class=\"sr-only\"\u003eLink to heading\u003c/span\u003e\n  \u003c/a\u003e\n\u003c/h2\u003e\n\u003cp\u003eThe boundaries of recursive reasoning are still expanding rapidly.\u003c/p\u003e\n\u003cp\u003eIn April 2025, a paper titled \u003cem\u003eSleep-Time Compute: Beyond Inference Scaling at Test-Time\u003c/em\u003e (arXiv:2504.13171) proposed a radical concept: \u003cstrong\u003esleep-time compute\u003c/strong\u003e. The core idea is to let the model reason about a known context during \u0026ldquo;idle\u0026rdquo; periods, before any question arrives, anticipating likely queries and caching useful inferences, so that less test-time compute, and therefore less latency, is needed when the actual query comes in.\u003c/p\u003e\n\u003cp\u003eThis sounds like science fiction, but the logic is clear. Humans consolidate memories and organize their thoughts while asleep; why can\u0026rsquo;t AI do similar \u0026ldquo;pre-thinking\u0026rdquo; while it sits idle? \u003cstrong\u003eAs the boundary between training and inference begins to dissolve, we may need to redefine \u0026ldquo;thinking\u0026rdquo; itself\u003c/strong\u003e: no longer a one-off computation, but a continuous, layered, dynamic system in which pre-computation and real-time inference intertwine.\u003c/p\u003e\n\u003cp\u003eThis also has profound implications for the reinforcement learning post-training paradigm. 
If reward models themselves can judge more accurately through test-time compute scaling, then the entire RLHF (Reinforcement Learning from Human Feedback) pipeline could be reshaped: \u003cstrong\u003enot by training a static model that \u0026ldquo;better understands human preferences,\u0026rdquo; but by having the model spend more compute to \u0026ldquo;understand\u0026rdquo; the context of each individual judgment\u003c/strong\u003e.\u003c/p\u003e\n\u003chr\u003e\n\u003ch2 id=\"vi-conclusion-the-scaling-law-is-not-dead-it-has-just-changed-tracks\"\u003e\n  VI. Conclusion: The Scaling Law Is Not Dead, It Has Just Changed Tracks\n  \u003ca class=\"heading-link\" href=\"#vi-conclusion-the-scaling-law-is-not-dead-it-has-just-changed-tracks\"\u003e\n    \u003ci class=\"fa-solid fa-link\" aria-hidden=\"true\" title=\"Link to heading\"\u003e\u003c/i\u003e\n    \u003cspan class=\"sr-only\"\u003eLink to heading\u003c/span\u003e\n  \u003c/a\u003e\n\u003c/h2\u003e\n\u003cp\u003eBack to the question at the heart of this piece: is recursive reasoning replacing parameter scale as the new Scaling Law?\u003c/p\u003e\n\u003cp\u003eMy judgment: \u003cstrong\u003enot a replacement, but a relay.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe pre-training Scaling Law is not dead. It has fulfilled its historical mission of taking AI from \u0026ldquo;unusable\u0026rdquo; to \u0026ldquo;usable.\u0026rdquo; But the baton for the next leg has already passed to test-time compute scaling.\u003c/p\u003e\n\u003cp\u003eThree signals are already clear:\u003c/p\u003e\n\u003col\u003e\n\u003cli\u003e\u003cstrong\u003eCompetition breakthroughs\u003c/strong\u003e: record-breaking results on ARC-AGI-2 show that recursive reasoning can solve problems that conventional methods struggle to touch.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eIndustrial validation\u003c/strong\u003e: DeepSeek\u0026rsquo;s reward model scaling suggests this is not an isolated case, but a generalizable 
paradigm.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eEdge deployment\u003c/strong\u003e: from MLX micro-networks to DeepRecall Agents, recursive reasoning is moving out of data centers and into practical tools.\u003c/li\u003e\n\u003c/ol\u003e\n\u003cp\u003eOf course, recursive reasoning is not a master key. Its benefits come with real engineering costs. \u003cstrong\u003eTime-To-First-Token (TTFT) rises significantly\u003c/strong\u003e, because the model must complete several iterations in hidden-state space before it emits its first token. \u003cstrong\u003eCompute cost per query rises as well\u003c/strong\u003e, because every recursive unrolling is real computational overhead. Recursive reasoning therefore has natural limits. It pays off most on \u003cstrong\u003estructured reasoning tasks\u003c/strong\u003e such as mathematical proofs, code debugging, and logic puzzles, where extra rounds of iteration can catch and correct intermediate errors. On \u003cstrong\u003egenerative tasks\u003c/strong\u003e such as open-ended creative writing and casual conversation, the returns are more limited, and users are rarely willing to wait longer for a marginal gain in quality.\u003c/p\u003e\n\u003cp\u003eFinally, I want to leave you not with an answer but with an open question:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003eWhen AI can think recursively without limit, when \u0026ldquo;thinking\u0026rdquo; is no longer bounded by the duration of a single forward pass, \u003cstrong\u003edoes the definition of \u0026ldquo;thinking\u0026rdquo; itself need to be rewritten?\u003c/strong\u003e\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eHuman thinking is constrained by biological time, energy, and attention. AI thinking may be breaking through these limits. 
This is not a question of whether AI will surpass humans; it is a question of \u003cstrong\u003ewhat the essence of intelligence is when \u0026ldquo;thinking\u0026rdquo; becomes a computational resource that can be expanded at will\u003c/strong\u003e.\u003c/p\u003e\n\u003cp\u003eAnd this question may deserve more reflection than any single technical breakthrough.\u003c/p\u003e\n\u003chr\u003e\n\u003ch2 id=\"references\"\u003e\n  References\n  \u003ca class=\"heading-link\" href=\"#references\"\u003e\n    \u003ci class=\"fa-solid fa-link\" aria-hidden=\"true\" title=\"Link to heading\"\u003e\u003c/i\u003e\n    \u003cspan class=\"sr-only\"\u003eLink to heading\u003c/span\u003e\n  \u003c/a\u003e\n\u003c/h2\u003e\n\u003cul\u003e\n\u003cli\u003eSteinberger, P. (2025). YC Podcast interview. Core quote: \u0026ldquo;The real breakthrough isn\u0026rsquo;t making models bigger, it\u0026rsquo;s making them think longer at test time.\u0026rdquo;\u003c/li\u003e\n\u003cli\u003e\u003cem\u003eScaling up test-time compute with latent reasoning: A recurrent depth approach\u003c/em\u003e. arXiv:2502.05171. \u003ca href=\"https://arxiv.org/abs/2502.05171\"  class=\"external-link\" target=\"_blank\" rel=\"noopener\"\u003ehttps://arxiv.org/abs/2502.05171\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003eDeepSeek. \u003cem\u003eInference-Time Scaling for Generalist Reward Modeling\u003c/em\u003e. arXiv:2504.02495. \u003ca href=\"https://arxiv.org/abs/2504.02495\"  class=\"external-link\" target=\"_blank\" rel=\"noopener\"\u003ehttps://arxiv.org/abs/2504.02495\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003cem\u003eSleep-Time Compute: Beyond Inference Scaling at Test-Time\u003c/em\u003e. arXiv:2504.13171. \u003ca href=\"https://arxiv.org/abs/2504.13171\"  class=\"external-link\" target=\"_blank\" rel=\"noopener\"\u003ehttps://arxiv.org/abs/2504.13171\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003ePoetiq AI. poetiq-arc-agi-solver. 
\u003ca href=\"https://github.com/poetiq-ai/poetiq-arc-agi-solver\"  class=\"external-link\" target=\"_blank\" rel=\"noopener\"\u003ehttps://github.com/poetiq-ai/poetiq-arc-agi-solver\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003estockeh. mlx-trm. \u003ca href=\"https://github.com/stockeh/mlx-trm\"  class=\"external-link\" target=\"_blank\" rel=\"noopener\"\u003ehttps://github.com/stockeh/mlx-trm\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003ekothapavan1998. deeprecall. \u003ca href=\"https://github.com/kothapavan1998/deeprecall\"  class=\"external-link\" target=\"_blank\" rel=\"noopener\"\u003ehttps://github.com/kothapavan1998/deeprecall\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003eTwo Minute Papers. \u003cem\u003eSakana AI\u0026rsquo;s Survival Simulator Is Brilliant\u003c/em\u003e. \u003ca href=\"https://www.youtube.com/watch?v=QzZ4VwDHAT4\"  class=\"external-link\" target=\"_blank\" rel=\"noopener\"\u003ehttps://www.youtube.com/watch?v=QzZ4VwDHAT4\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003eChollet, F. ARC-AGI-2. \u003ca href=\"https://github.com/arcprize/ARC-AGI-2\"  class=\"external-link\" target=\"_blank\" rel=\"noopener\"\u003ehttps://github.com/arcprize/ARC-AGI-2\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003chr\u003e\n\u003cp\u003e\u003cem\u003eCompleted on 2026-05-02 | Content OS Phase 4 Final Draft | Task ID: TOPIC-B-20260502\u003c/em\u003e\u003c/p\u003e\n",
  "wordCount": 1653,
  "readingTime": 8,
  "tableOfContents": "\u003cnav id=\"TableOfContents\"\u003e\n  \u003cul\u003e\n    \u003cli\u003e\u003ca href=\"#i-the-biggest-breakthrough-is-not-from-bigger-models\"\u003eI. The Biggest Breakthrough Is Not From Bigger Models\u003c/a\u003e\u003c/li\u003e\n    \u003cli\u003e\u003ca href=\"#ii-recursive-reasoning-not-an-improvement-on-cot-but-a-paradigm-shift\"\u003eII. Recursive Reasoning: Not an Improvement on CoT, But a Paradigm Shift\u003c/a\u003e\u003c/li\u003e\n    \u003cli\u003e\u003ca href=\"#iii-hard-validation-record-breaking-results-on-arc-agi-2\"\u003eIII. Hard Validation: Record-Breaking Results on ARC-AGI-2\u003c/a\u003e\u003c/li\u003e\n    \u003cli\u003e\u003ca href=\"#iv-from-data-centers-to-edge-devices-the-diffusion-path-of-recursive-reasoning\"\u003eIV. From Data Centers to Edge Devices: The Diffusion Path of Recursive Reasoning\u003c/a\u003e\u003c/li\u003e\n    \u003cli\u003e\u003ca href=\"#v-when-ai-learns-to-think-in-its-sleep\"\u003eV. When AI Learns to \u0026ldquo;Think in Its Sleep\u0026rdquo;\u003c/a\u003e\u003c/li\u003e\n    \u003cli\u003e\u003ca href=\"#vi-conclusion-the-scaling-law-is-not-dead-it-has-just-changed-tracks\"\u003eVI. Conclusion: The Scaling Law Is Not Dead, It Has Just Changed Tracks\u003c/a\u003e\u003c/li\u003e\n    \u003cli\u003e\u003ca href=\"#references\"\u003eReferences\u003c/a\u003e\u003c/li\u003e\n  \u003c/ul\u003e\n\u003c/nav\u003e",
  "isDraft": false
}