Hugging Face Daily Papers · 4 min read

A Verifiable Search Is Not a Learnable Chain-of-Thought

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Project page: https://nemotron.harshpatel.live
GitHub repository: https://github.com/harshpatel1692/search-not-learnable
arxiv:2606.21884

Published on Jun 20, 2026 · Submitted by Harsh Patel on Jun 23, 2026
Authors: Harsh Patel

Abstract

Training models on chain-of-thought demonstrations fails for tasks requiring backtracking search because the forward derivation cannot be faithfully imitated, demonstrating a fundamental limitation in learning search procedures through demonstration.

It is tempting to assume that any task solvable by a short program can be taught to a model as its chain-of-thought: write the steps out, fine-tune, and the model follows. This paper shows that the assumption fails for an identifiable class of procedures. The testbed is nine reasoning tasks, each produced by a deterministic generator; the public and hidden splits share generators, so held-out data serves as a proxy for test accuracy. I reverse-engineer the generators into Python solvers, render them as chain-of-thought, and distill them into a LoRA of rank ≤ 32 over a 30B (3.5B-active) Nemotron model. Forward-computable tasks install readily: lookup/arithmetic and an 8-bit boolean task transfer (≥ 0.99 and 0.68, respectively). Cryptarithm does not: accuracy from distilling its backtracking search stays at 0.01-0.07 across eleven chain-of-thought designs, RL from verifiable rewards, and self-training, even though a search solver answers 71% of instances. This is not a capability gap. The model does the arithmetic correctly on 97-100% of lines and ranks the correct cipher in its top eight on 71%; what it cannot do is carry the search forward as a left-to-right derivation. Fine-tuning learns the shape of a verifiable elimination step, but its verdicts become unconditional templates, correct only 16-57% of the time ("verdict-as-token"). The ceiling holds across backbones from 3B to 671B and across fine-tuning and prompting. A controlled intervention isolates the cause: revealing the cipher key, which turns the derivation forward, lifts the same instances from 0.03 to 0.57. When a procedure's only solution is search over information-free structure, no faithful forward chain-of-thought exists to imitate. The task becomes learnable only by removing the search, precomputing its combinatorial core into a catalog and reducing the trace to recall plus verification; the 1st-place solution reaches Private LB 0.92 this way. What distills is memorization and verification, not search.
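The contrast between a forward derivation and a search can be made concrete with a toy puzzle. The sketch below is an illustration only, under several assumptions: it uses a classic letter-to-digit cryptarithm (TO + GO = OUT), which may differ from the challenge's actual cryptarithm generator; it substitutes plain exhaustive enumeration for the backtracking search the abstract describes; and the names `word_value`, `verify`, `search`, and `catalog` are hypothetical, not taken from the paper or its repository. It shows that checking a revealed key is a single decode-and-add step, while finding the key means rejecting many candidates first.

```python
from itertools import permutations


def word_value(word, key):
    """Decode a word into an integer under a letter -> digit key."""
    return int("".join(str(key[ch]) for ch in word))


def verify(a, b, total, key):
    """Forward check: with the key known, decode once and add."""
    if any(key[w[0]] == 0 for w in (a, b, total)):
        return False  # leading zeros are not allowed
    return word_value(a, key) + word_value(b, key) == word_value(total, key)


def search(a, b, total):
    """Enumerate candidate keys until one verifies.

    Exhaustive enumeration stands in for backtracking here (an
    assumption made for brevity). Returns (key, rejected), where
    rejected counts the candidates eliminated before the solution.
    """
    letters = sorted(set(a + b + total))
    rejected = 0
    for digits in permutations(range(10), len(letters)):
        key = dict(zip(letters, digits))
        if verify(a, b, total, key):
            return key, rejected
        rejected += 1
    return None, rejected


puzzle = ("TO", "GO", "OUT")
key, rejected = search(*puzzle)
print("key:", key, "| candidates eliminated first:", rejected)

# Recall plus verification: look the key up in a precomputed catalog
# (hypothetical, built offline), then confirm it with one forward check.
catalog = {puzzle: key}
print("catalog hit verifies:", verify(*puzzle, catalog[puzzle]))
```

A faithful trace of `search` would consist mostly of rejected candidates, each of which carries no information about where the answer lies, whereas a trace of `verify` reads left to right. The catalog lookup at the end only gestures at the "recall plus verification" reduction mentioned in the abstract; it is not a reconstruction of the 1st-place solution.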

Community

Harsh Patel (paper author and submitter):

A research study built on the NVIDIA Nemotron Model Reasoning Challenge.



