<strong>TL;DR — A machine-checked consistency hierarchy for multi-agent LLM runtimes, with two reproduced in-the-wild bugs and a deployed Rust runtime.</strong></p>\n<p>Multi-agent LLM systems share state through memory stores, vector indices, and tool registries. This paper models that sharing as long-running read–generate–write operations and formalizes four concurrency anomalies in TLA+ — stale-generation, phantom-tool, causal-cascade, and tool-effect reordering — as structural analogues of classical database-isolation anomalies, each with a TLC counter-example.</p>\n<p>What is actually verified and built:</p>\n<ul>\n<li><strong>A strictly-separated chain L0 ⊊ L1 ⊊ L2 ⊊ L3 ⊊ L4</strong> over those anomalies. 274 Verus obligations (0 <code>assume</code>, 0 <code>admit</code>; trust base = two structural axioms + a mutex correspondence) prove the detectors sound <em>and</em> complete and each runtime its avoidance set, plus a TLAPS-checked chain coherence (15 obligations) and an A1 generation-phase lower bound (28 obligations).</li>\n<li><strong>Three deployed Rust runtimes</strong> at L0–L1 (pessimistic locking, serializable snapshot isolation, default-SI), each verified against stale-generation. L2 is run <strong>live across three model families</strong> — A3 prevented in all 120 supervisor-retracted sessions (0/120).</li>\n<li><strong>Two reproduced in-the-wild findings:</strong> a silent lost update in ByteDance's <code>deer-flow</code> (formalized as a verified L0→L1 fix), and tool-effect reordering in LangGraph's <code>ToolNode</code> on unmodified output (removed by an L3 commit-order sequencer).</li>\n<li><strong>Prevention costs are bounded, not catastrophic:</strong> SSI within sampling noise on one workload; pessimistic locking 1.6–2.3× on the contended cell.</li>\n</ul>\n<p>Scope, stated plainly: the anomalies and the lattice are classical. The contribution is the <em>mechanized realizability and strict separation</em>, the executable runtimes, and a fully reproducible artifact (Verus + TLA+ + GenMC, the Rust runtime, and the Python harnesses). The supplementary appendix (§§ A–F, Tables S1–S6) is attached as an arXiv ancillary file.</p>\n<p>Paper: <a href=\"https://arxiv.org/abs/2606.17182\" rel=\"nofollow\">https://arxiv.org/abs/2606.17182</a><br>Code & specs: <a href=\"https://github.com/sajjadanwar0/mac-consistency\" rel=\"nofollow\">https://github.com/sajjadanwar0/mac-consistency</a></p>\n","updatedAt":"2026-06-17T16:28:44.295Z","author":{"_id":"648622c80e6657a2a9400677","avatarUrl":"/avatars/0b56f575f0556366b34f0157a6a613bf.svg","fullname":"Sajjad Khan","name":"sajjadanwar0","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.8416483402252197},"editors":["sajjadanwar0"],"editorAvatarUrls":["/avatars/0b56f575f0556366b34f0157a6a613bf.svg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2606.17182","authors":[{"_id":"6a32c5ff59127a45e2c1c418","name":"Sajjad Khan","hidden":false}],"publishedAt":"2026-06-15T00:00:00.000Z","submittedOnDailyAt":"2026-06-17T00:00:00.000Z","title":"Verified Detection and Prevention of Concurrency Anomalies in Multi-Agent Large Language Model Systems","submittedOnDailyBy":{"_id":"648622c80e6657a2a9400677","avatarUrl":"/avatars/0b56f575f0556366b34f0157a6a613bf.svg","isPro":false,"fullname":"Sajjad Khan","user":"sajjadanwar0","type":"user","name":"sajjadanwar0"},"summary":"Multi-agent LLM systems share state through memory stores, vector indices, and tool registries. We model such sharing as long-running read-generate-write operations under deterministic-generation semantics -- the regime durable-execution engines enforce by deterministic replay -- and formalize four concurrency anomalies in TLA+: stale-generation, phantom-tool, causal-cascade, and tool-effect reordering, structural analogues of classical isolation anomalies, each with a TLC counter-example. The exclusion lattice over these anomalies is trivial; the contribution is the mechanically verified realizability and strict separation of one maximal chain within it, L_0 subsetneq cdots subsetneq L_4, to our knowledge the first machine-checked consistency hierarchy for such runtimes. A development of 274 Verus obligations (zero assume, zero admit; trust base: two structural axioms and a mutex correspondence) proves the detectors sound and complete against the specifications and each runtime its avoidance set. Three deployed Rust runtimes realize L0-L1 (pessimistic locking, serializable snapshot isolation, default-SI), each verified against stale-generation and refined to its state machine; L2-L4 are exec-mode-verified with dependency-free prevention twins (A3, A6, A2: 0/1000 versus 1000/1000), and L2 is run live across three model families (A3 prevented in all 120 retracted sessions). We reproduce a silent lost update in ByteDance's deer-flow, formalizing its fix as a verified L_0 to L_1 refinement, and exhibit tool-effect reordering in LangGraph's ToolNode on unmodified output, removed by an L3 commit-order sequencer. The verified detector, refinements, and realizability artifacts are the contribution; the phenomena and lattice are classical.","upvotes":0,"discussionId":"6a32c5ff59127a45e2c1c419","githubRepo":"https://github.com/sajjadanwar0/mac-consistency","githubRepoAddedBy":"user","ai_summary":"Multi-agent LLM systems with shared state are analyzed through formal methods identifying concurrency anomalies and establishing a verified consistency hierarchy with mechanized proofs of soundness and completeness.","ai_keywords":["multi-agent LLM systems","memory stores","vector indices","tool registries","deterministic-generation semantics","durable-execution engines","TLA+","concurrency anomalies","stale-generation","phantom-tool","causal-cascade","tool-effect reordering","exclusion lattice","Verus obligations","mutex correspondence","serializable snapshot isolation","dependency-free prevention twins","refinement","state machine"],"ai_summary_model":"Qwen/Qwen2.5-Coder-32B-Instruct","githubStars":0},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[],"acceptLanguages":["en"],"markdownContentUrl":"https://huggingface.co/buckets/huggingchat/papers-content/resolve/2606/2606.17182.md","query":{}}">
Verified Detection and Prevention of Concurrency Anomalies in Multi-Agent Large Language Model Systems
Abstract
Multi-agent LLM systems with shared state are analyzed through formal methods identifying concurrency anomalies and establishing a verified consistency hierarchy with mechanized proofs of soundness and completeness.
Multi-agent LLM systems share state through memory stores, vector indices, and tool registries. We model such sharing as long-running read-generate-write operations under deterministic-generation semantics -- the regime durable-execution engines enforce by deterministic replay -- and formalize four concurrency anomalies in TLA+: stale-generation, phantom-tool, causal-cascade, and tool-effect reordering, structural analogues of classical isolation anomalies, each with a TLC counter-example. The exclusion lattice over these anomalies is trivial; the contribution is the mechanically verified realizability and strict separation of one maximal chain within it, L_0 subsetneq cdots subsetneq L_4, to our knowledge the first machine-checked consistency hierarchy for such runtimes. A development of 274 Verus obligations (zero assume, zero admit; trust base: two structural axioms and a mutex correspondence) proves the detectors sound and complete against the specifications and each runtime its avoidance set. Three deployed Rust runtimes realize L0-L1 (pessimistic locking, serializable snapshot isolation, default-SI), each verified against stale-generation and refined to its state machine; L2-L4 are exec-mode-verified with dependency-free prevention twins (A3, A6, A2: 0/1000 versus 1000/1000), and L2 is run live across three model families (A3 prevented in all 120 retracted sessions). We reproduce a silent lost update in ByteDance's deer-flow, formalizing its fix as a verified L_0 to L_1 refinement, and exhibit tool-effect reordering in LangGraph's ToolNode on unmodified output, removed by an L3 commit-order sequencer. The verified detector, refinements, and realizability artifacts are the contribution; the phenomena and lattice are classical.
Community
TL;DR — A machine-checked consistency hierarchy for multi-agent LLM runtimes, with two reproduced in-the-wild bugs and a deployed Rust runtime.
Multi-agent LLM systems share state through memory stores, vector indices, and tool registries. This paper models that sharing as long-running read–generate–write operations and formalizes four concurrency anomalies in TLA+ — stale-generation, phantom-tool, causal-cascade, and tool-effect reordering — as structural analogues of classical database-isolation anomalies, each with a TLC counter-example.
What is actually verified and built:
- A strictly-separated chain L0 ⊊ L1 ⊊ L2 ⊊ L3 ⊊ L4 over those anomalies. 274 Verus obligations (0
assume, 0 admit; trust base = two structural axioms + a mutex correspondence) prove the detectors sound and complete and each runtime its avoidance set, plus a TLAPS-checked chain coherence (15 obligations) and an A1 generation-phase lower bound (28 obligations).
- Three deployed Rust runtimes at L0–L1 (pessimistic locking, serializable snapshot isolation, default-SI), each verified against stale-generation. L2 is run live across three model families — A3 prevented in all 120 supervisor-retracted sessions (0/120).
- Two reproduced in-the-wild findings: a silent lost update in ByteDance's
deer-flow (formalized as a verified L0→L1 fix), and tool-effect reordering in LangGraph's ToolNode on unmodified output (removed by an L3 commit-order sequencer).
- Prevention costs are bounded, not catastrophic: SSI within sampling noise on one workload; pessimistic locking 1.6–2.3× on the contended cell.
Scope, stated plainly: the anomalies and the lattice are classical. The contribution is the mechanized realizability and strict separation, the executable runtimes, and a fully reproducible artifact (Verus + TLA+ + GenMC, the Rust runtime, and the Python harnesses). The supplementary appendix (§§ A–F, Tables S1–S6) is attached as an arXiv ancillary file.
Paper: https://arxiv.org/abs/2606.17182
Code & specs: https://github.com/sajjadanwar0/mac-consistency
Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.
Tap or paste here to upload images
Cite arxiv.org/abs/2606.17182 in a model README.md to link it from this page.
Cite arxiv.org/abs/2606.17182 in a dataset README.md to link it from this page.
Cite arxiv.org/abs/2606.17182 in a Space README.md to link it from this page.
Discussion (0)
Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.
Sign in →No comments yet. Sign in and be the first to say something.