Hugging Face Daily Papers · 5 min read

GateMem: Benchmarking Memory Governance in Multi-Principal Shared-Memory Agents

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2606.18829

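Published on Jun 17, 2026 · Submitted by Zhe Ren on Jun 22, 2026 · #3 Paper of the day

Authors: Zhe Ren, Yibo Yang, Yimeng Chen, Zijun Zhao, Benshuo Fu, Zhihao Shu, Bingjie Zhang, Yangyang Xu, Dandan Guo, Shuicheng Yan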

Abstract

Current memory agents are not yet reliable enough for shared institutional deployment because they struggle to balance utility, access control, and forgetting across multiple principals with different authorization contexts.

Memory benchmarks for LLM agents largely assume single-user settings, leaving shared assistants for hospitals, workplaces, campuses, and households understudied. In these deployments, multiple principals write to a common memory pool and query it under different roles, scopes, and relationships, so memory quality requires governance as well as recall. We introduce GateMem, a benchmark for multi-principal shared-memory agents. GateMem jointly evaluates utility for legitimate long-horizon requests with state updates, access control across contextual authorization boundaries, and agent-facing active forgetting after explicit deletion requests. It spans medical, office, education, and household domains, with long-form multi-party episodes, incremental memory injection, hidden checkpoints, structured judging, and leak-target annotations. Across diverse baselines and backbone models, no method simultaneously achieves strong utility, robust access control, and reliable forgetting. Long-context prompting often yields the best governance score at high token cost, while retrieval-based and external-memory methods reduce cost yet still leak unauthorized or deleted information. These results show current memory agents remain far from reliable shared institutional deployment.
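To make the three evaluated properties concrete, here is a minimal toy sketch of a shared memory pool with requester-specific reads, owner-initiated deletion, and a naive leak check. It is not GateMem's code or API: the names `SharedMemory`, `MemoryItem`, and `leaks`, the principal names, and the substring-based leak test are all hypothetical stand-ins chosen for illustration (the benchmark itself uses structured judging and leak-target annotations).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MemoryItem:
    owner: str                                        # principal who wrote the item
    text: str                                         # stored content
    readers: set[str] = field(default_factory=set)    # principals allowed to read it
    deleted: bool = False                             # set once the owner asks to forget it


class SharedMemory:
    """Toy shared memory pool (hypothetical, not the GateMem implementation)."""

    def __init__(self) -> None:
        self._items: list[MemoryItem] = []

    def write(self, owner: str, text: str, readers: set[str]) -> int:
        # The owner can always read their own item.
        self._items.append(MemoryItem(owner, text, readers | {owner}))
        return len(self._items) - 1

    def forget(self, requester: str, item_id: int) -> bool:
        # Assumption: only the owner may request deletion.
        item = self._items[item_id]
        if requester != item.owner:
            return False
        item.text = ""          # redact the content, not just hide it
        item.deleted = True
        return True

    def read(self, requester: str) -> list[str]:
        # Access control: return only live items the requester is allowed to see.
        return [
            item.text
            for item in self._items
            if not item.deleted and requester in item.readers
        ]


def leaks(response: str, leak_targets: list[str]) -> bool:
    """Naive stand-in for leak detection: case-insensitive substring match."""
    lowered = response.lower()
    return any(target.lower() in lowered for target in leak_targets)


# Hypothetical medical-domain episode.
memory = SharedMemory()
allergy_id = memory.write("dr_lee", "Patient A: penicillin allergy", {"nurse_kim"})
memory.write("nurse_kim", "Ward 3 rounds start at 09:00", {"dr_lee", "visitor_bo"})

print(memory.read("visitor_bo"))   # the visitor sees only the rounds schedule
memory.forget("dr_lee", allergy_id)
print(memory.read("nurse_kim"))    # the deleted allergy note is no longer returned
```

In this framing, utility corresponds to `read` returning what a legitimate requester needs, access control to filtering by requester, and forgetting to deleted items never resurfacing. A real agent is harder to test because information can leak through paraphrase or inference, which a substring check like `leaks` would miss.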

Community

Zhe Ren (paper author, paper submitter) · edited

Thanks for checking out GateMem!

GateMem is a benchmark for memory governance in multi-principal shared-memory agents. Instead of only asking whether an agent can remember information, GateMem evaluates whether a persistent-memory agent can remain useful, enforce requester-specific access control, and honor deletion requests.

We release:

  • 91 long-form multi-party episodes
  • 2,218 hidden evaluation checkpoints
  • 4 shared-memory domains: medical, office, education, and household
  • 7 memory-agent baselines across 6 backbone LLMs
  • official evaluation code
  • a public leaderboard and online submission interface

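Resources:

  • Paper: https://arxiv.org/abs/2606.18829
  • Project page: https://rzhub.github.io/GateMem/project.html
  • Code: https://github.com/rzhub/GateMem
  • Dataset: https://huggingface.co/datasets/Ray368/GateMem
  • Leaderboard: https://rzhub.github.io/GateMem/
  • Submit results: https://huggingface.co/spaces/Ray368/GateMem-Submit
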
We welcome new submissions to the leaderboard and feedback from the community.
