Hugging Face Daily Papers · June 12, 2026 · 5 min read

Getting Better at Working With You: Compiling User Corrections into Runtime Enforcement for Coding Agents

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Authors: Yujun Zhou, Kehan Guo, Haomin Zhuang, Xiangqi Wang, Yue Huang, Zhenwen Liang, Pin-Yu Chen, Tian Gao, Nuno Moniz, Nitesh V. Chawla, Xiangliang Zhang

arxiv:2606.13174

Published on Jun 11

Submitted by Yujun Zhou on Jun 12

Abstract

TRACE is a skill-layer pipeline that mines user corrections to create runtime checks, significantly reducing preference violations in interactive LLM agents.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Interactive LLM agents are becoming part of daily work, but they do not reliably become easier to work with over time: a correction remembered in one session may still be violated in the next. We study this gap between preference access and preference compliance. In tasks derived from anonymized real-user friction cases, Mem0 memory still leaves 57.5% of applicable preference checks violated. We introduce Test-time Rule Acquisition and Compiled Enforcement (TRACE), a drop-in skill-layer pipeline for coding-agent runtimes that mines user corrections, rewrites them as atomic rules, and compiles them into runtime checks that must pass before an agent completes future tasks. Unlike runtime checks written ahead of time by developers, TRACE skills come from the user's own chat corrections. We evaluate TRACE with simulated user-in-the-loop experiments on ClawArena coding-agent tasks and MemoryArena-derived memory-intensive tasks. On ClawArena, TRACE reduces held-out preference violation from 100.0% to 37.6% on in-distribution tasks and from 100.0% to 2.0% on out-of-distribution tasks. On MemoryArena-derived tasks, TRACE reduces in-distribution violation from 100.0% to 60.5% while matching or exceeding the strongest memory baseline on task pass. These results suggest that compiling corrections into runtime enforcement can address a repeated-friction failure mode that memory alone does not reliably solve, reducing the need for users to restate the same correction across future sessions. Experiment code is available at https://github.com/YujunZhou/TRACE_exp, and the deployable skill is available at https://github.com/YujunZhou/tellonce.
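The abstract describes the pipeline only at a high level, so the following is a minimal sketch of the general idea, not TRACE's actual implementation. It assumes a toy keyword lookup in place of the correction-mining and rule-rewriting steps, and every name in it (`Rule`, `compile_correction`, `violations`, `finish_task`) is hypothetical. What it illustrates is the distinction the paper draws: a remembered preference is text the agent may ignore, while a compiled check is a gate the output has to pass before the task can finish.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Rule:
    """One atomic preference rule plus an executable check (hypothetical schema)."""
    name: str
    description: str
    check: Callable[[str], bool]  # returns True when the output complies


def _no_print(output: str) -> bool:
    # Complies when no print( call appears anywhere in the output.
    return re.search(r"\bprint\s*\(", output) is None


def _return_annotations(output: str) -> bool:
    # Complies when every line starting with "def " contains a "->" annotation.
    defs = [ln for ln in output.splitlines() if ln.lstrip().startswith("def ")]
    return all("->" in ln for ln in defs)


def compile_correction(correction: str) -> List[Rule]:
    """Toy stand-in for mining and rewriting: map known phrasings to atomic rules.

    Assumption: a real system would use something far richer than keyword
    matching; this lookup exists only to keep the example runnable.
    """
    text = correction.lower()
    rules: List[Rule] = []
    if "don't use print" in text or "no print" in text:
        rules.append(Rule("no-print", "Do not call print()", _no_print))
    if "type hints" in text:
        rules.append(Rule("type-hints", "Annotate return types", _return_annotations))
    return rules


def violations(output: str, rules: List[Rule]) -> List[str]:
    """Names of all rules whose check fails on this output."""
    return [r.name for r in rules if not r.check(output)]


def finish_task(output: str, rules: List[Rule]) -> str:
    """Completion gate: refuse to finish while any compiled check fails."""
    failed = violations(output, rules)
    if failed:
        raise RuntimeError("blocked by rules: " + ", ".join(failed))
    return output


if __name__ == "__main__":
    rules = compile_correction("Please don't use print for logging, and add type hints.")
    draft = "def add(a, b):\n    print(a + b)\n    return a + b\n"
    fixed = "def add(a: int, b: int) -> int:\n    return a + b\n"
    print(violations(draft, rules))  # ['no-print', 'type-hints']
    print(violations(fixed, rules))  # []
```

In this sketch the gate raises instead of returning a warning, which is the point of the design: the agent cannot report completion until it revises the output, whereas a memory entry with the same wording would only be advice.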

Community

yujunzhou (paper submitter) · about 3 hours ago · edited about 2 hours ago

Coding agents remember your corrections but still violate them — with Mem0 memory, 57.5% of preference checks still fail. The problem isn't forgetting; it's that memory is advisory, not binding.

TRACE fixes this by changing the representation: it mines your own chat corrections, rewrites each as an atomic rule, and compiles it into a runtime check that must pass before the agent can finish the task. A remembered preference becomes an execution constraint, not a suggestion the agent can quietly skip.

The payoff: held-out preference violations drop from 100% to 37.6% in-distribution without hurting task success or adding runtime cost, and with users having to repeat themselves far less often.

💻 Code: https://github.com/YujunZhou/TRACE_exp
🔧 Skill: https://github.com/YujunZhou/tellonce

Get this paper in your agent:

hf papers read 2606.13174

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash
