GateMem: Benchmarking Memory Governance in Multi-Principal Shared-Memory Agents
Authors: Zhe Ren, Yibo Yang, Yimeng Chen, Zijun Zhao, Benshuo Fu, Zhihao Shu, Bingjie Zhang, Yangyang Xu, Dandan Guo, Shuicheng Yan
Published: 2026-06-17 · arXiv: 2606.18829
Abstract
Current memory agents are not yet reliable enough for shared institutional deployment: they struggle to balance utility, access control, and forgetting across multiple principals with different authorization contexts.
Memory benchmarks for LLM agents largely assume single-user settings, leaving shared assistants for hospitals, workplaces, campuses, and households understudied. In these deployments, multiple principals write to a common memory pool and query it under different roles, scopes, and relationships, so memory quality requires governance as well as recall. We introduce GateMem, a benchmark for multi-principal shared-memory agents. GateMem jointly evaluates utility for legitimate long-horizon requests with state updates, access control across contextual authorization boundaries, and agent-facing active forgetting after explicit deletion requests. It spans medical, office, education, and household domains, with long-form multi-party episodes, incremental memory injection, hidden checkpoints, structured judging, and leak-target annotations. Across diverse baselines and backbone models, no method simultaneously achieves strong utility, robust access control, and reliable forgetting. Long-context prompting often yields the best governance score at high token cost, while retrieval-based and external-memory methods reduce cost yet still leak unauthorized or deleted information. These results show current memory agents remain far from reliable shared institutional deployment.
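To make the three evaluated properties concrete, here is a minimal toy sketch of a shared memory pool that gates reads by requester and honors deletion requests, followed by one check per property (utility, access control, forgetting). It is not GateMem's evaluation code or data format: `MemoryItem`, `SharedMemory`, `leaks`, the principal names, and the stored facts are all hypothetical, and a real agent would answer in free text through an LLM rather than return stored strings.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryItem:
    """One fact written into the shared pool (hypothetical schema)."""
    owner: str                                  # principal who wrote the fact
    text: str                                   # the remembered content
    readers: set = field(default_factory=set)   # principals allowed to read it
    deleted: bool = False                       # set once the owner asks for deletion


class SharedMemory:
    """Toy shared memory pool with per-requester gating (illustrative only)."""

    def __init__(self):
        self.items = []

    def write(self, owner, text, readers=()):
        # The owner can always read what they wrote.
        self.items.append(MemoryItem(owner, text, set(readers) | {owner}))

    def forget(self, owner, keyword):
        # Honor a deletion request: only the owner may delete their own facts.
        for item in self.items:
            if item.owner == owner and keyword in item.text:
                item.deleted = True

    def read(self, requester, keyword):
        # Return only live facts that this requester is authorized to see.
        return [
            item.text
            for item in self.items
            if keyword in item.text
            and not item.deleted
            and requester in item.readers
        ]


def leaks(answer_texts, leak_targets):
    """True if any protected string shows up in the returned texts."""
    return any(target in text for text in answer_texts for target in leak_targets)


memory = SharedMemory()
memory.write("patient_a", "patient_a allergy: penicillin", readers={"dr_lee"})
memory.write("patient_a", "patient_a phone: 555-0100", readers={"dr_lee"})

# Utility: an authorized requester still gets the fact.
utility_ok = memory.read("dr_lee", "allergy") == ["patient_a allergy: penicillin"]

# Access control: an unauthorized requester receives nothing protected.
access_ok = not leaks(memory.read("visitor_b", "allergy"), ["penicillin"])

# Forgetting: after deletion, even an authorized requester no longer sees it.
memory.forget("patient_a", "phone")
forget_ok = not leaks(memory.read("dr_lee", "phone"), ["555-0100"])

print(utility_ok, access_ok, forget_ok)
```

The sketch keeps the gate inside `read`, so an unauthorized or deleted fact never reaches the caller. The abstract's finding is that real memory agents, which mix retrieval with free-form generation, do not get this guarantee for free.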
Community
Thanks for checking out GateMem!
GateMem is a benchmark for memory governance in multi-principal shared-memory agents. Instead of only asking whether an agent can remember information, GateMem evaluates whether a persistent-memory agent can remain useful, enforce requester-specific access control, and honor deletion requests.
We release:
- 91 long-form multi-party episodes
- 2,218 hidden evaluation checkpoints
- 4 shared-memory domains: medical, office, education, and household
- 7 memory-agent baselines across 6 backbone LLMs
- official evaluation code
- a public leaderboard and online submission interface
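Resources:
- 📄 Paper: https://arxiv.org/abs/2606.18829
- 🌐 Project page: https://rzhub.github.io/GateMem/project.html
- 💻 Code: https://github.com/rzhub/GateMem
- 🤗 Dataset: https://huggingface.co/datasets/Ray368/GateMem
- 🏆 Leaderboard: https://rzhub.github.io/GateMem/
- 🚀 Submit results: https://huggingface.co/spaces/Ray368/GateMem-Submit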
We welcome new submissions to the leaderboard and feedback from the community.