Hugging Face Daily Papers · June 22, 2026 · 5 min read

GateMem: Benchmarking Memory Governance in Multi-Principal Shared-Memory Agents

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2606.18829

Published on Jun 17 · Submitted by Zhe Ren on Jun 22

#3 Paper of the day

Authors:

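Zhe Ren, Yibo Yang, Yimeng Chen, Zijun Zhao, Benshuo Fu, Zhihao Shu, Bingjie Zhang, Yangyang Xu, Dandan Guo, Shuicheng Yan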

Abstract

Current memory agents are not yet reliable enough for shared institutional deployment: they struggle to balance utility, access control, and forgetting across multiple principals with different authorization contexts.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Memory benchmarks for LLM agents largely assume single-user settings, leaving shared assistants for hospitals, workplaces, campuses, and households understudied. In these deployments, multiple principals write to a common memory pool and query it under different roles, scopes, and relationships, so memory quality requires governance as well as recall. We introduce GateMem, a benchmark for multi-principal shared-memory agents. GateMem jointly evaluates utility for legitimate long-horizon requests with state updates, access control across contextual authorization boundaries, and agent-facing active forgetting after explicit deletion requests. It spans medical, office, education, and household domains, with long-form multi-party episodes, incremental memory injection, hidden checkpoints, structured judging, and leak-target annotations. Across diverse baselines and backbone models, no method simultaneously achieves strong utility, robust access control, and reliable forgetting. Long-context prompting often yields the best governance score at high token cost, while retrieval-based and external-memory methods reduce cost yet still leak unauthorized or deleted information. These results show current memory agents remain far from reliable shared institutional deployment.
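
To make the three evaluated behaviors concrete, here is a minimal sketch of a shared memory pool with requester-specific reads and owner-initiated deletion. It is an illustration only, not GateMem's code or any baseline from the paper: the `SharedMemory` and `MemoryRecord` classes, the keyword-match query, the per-record reader list, and the principal names are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryRecord:
    record_id: int
    owner: str
    text: str
    readers: frozenset  # principals allowed to read this record (hypothetical policy)


@dataclass
class SharedMemory:
    """Toy multi-principal memory pool; not the benchmark's implementation."""
    records: dict = field(default_factory=dict)
    next_id: int = 0

    def write(self, owner, text, readers=()):
        """Store a record; the owner can always read their own record."""
        record_id = self.next_id
        self.next_id += 1
        allowed = frozenset(readers) | {owner}
        self.records[record_id] = MemoryRecord(record_id, owner, text, allowed)
        return record_id

    def query(self, requester, keyword):
        """Return only matching records the requester is authorized to read."""
        return [
            r.text
            for r in self.records.values()
            if requester in r.readers and keyword.lower() in r.text.lower()
        ]

    def delete(self, requester, record_id):
        """Honor a deletion request, but only when it comes from the record's owner."""
        record = self.records.get(record_id)
        if record is None or record.owner != requester:
            return False
        del self.records[record_id]
        return True


memory = SharedMemory()
rid = memory.write("patient_a", "Patient A is allergic to penicillin", readers=["dr_lee"])
memory.write("patient_b", "Patient B prefers morning appointments")

authorized = memory.query("dr_lee", "allergic")       # utility: legitimate request
unauthorized = memory.query("patient_b", "allergic")  # access control: other patient
deleted = memory.delete("patient_a", rid)             # forgetting: owner asks to delete
after_delete = memory.query("dr_lee", "allergic")     # deleted data must not resurface
```

In this toy setting the three properties are easy to satisfy because authorization is an explicit list attached to each record. The abstract describes a harder case, where authorization depends on roles, scopes, and relationships expressed in long multi-party episodes.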

Community

Ray368 (paper author, paper submitter)

Thanks for checking out GateMem!

GateMem is a benchmark for memory governance in multi-principal shared-memory agents. Instead of only asking whether an agent can remember information, GateMem evaluates whether a persistent-memory agent can remain useful, enforce requester-specific access control, and honor deletion requests.
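
As a rough illustration of what checking access control or deletion at a checkpoint could look like, the sketch below flags responses that contain annotated leak targets. This is a simplified stand-in based on substring matching, not the benchmark's structured judging; the `find_leaks` and `leak_free_rate` helpers, the dictionary fields, and the sample responses are hypothetical.

```python
def find_leaks(response, leak_targets):
    """Return the leak targets that appear (case-insensitively) in a response."""
    lowered = response.lower()
    return [target for target in leak_targets if target.lower() in lowered]


def leak_free_rate(checkpoints):
    """Fraction of checkpoints whose response contains no annotated leak target."""
    if not checkpoints:
        return 0.0
    clean = sum(
        1 for c in checkpoints if not find_leaks(c["response"], c["leak_targets"])
    )
    return clean / len(checkpoints)


# Hypothetical checkpoints: field names and texts are made up for this example.
checkpoints = [
    {"response": "I can't share another patient's records.",
     "leak_targets": ["penicillin"]},
    {"response": "Patient A is allergic to penicillin.",
     "leak_targets": ["penicillin"]},
]
rate = leak_free_rate(checkpoints)
```

Substring matching misses paraphrased leaks and can flag harmless mentions, which is one reason a real evaluation would likely need something stronger than this check.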

We release:

91 long-form multi-party episodes
2,218 hidden evaluation checkpoints
4 shared-memory domains: medical, office, education, and household
7 memory-agent baselines across 6 backbone LLMs
official evaluation code
a public leaderboard and online submission interface

Resources:

📄 Paper: https://arxiv.org/abs/2606.18829
🌐 Project page: https://rzhub.github.io/GateMem/project.html
💻 Code: https://github.com/rzhub/GateMem
🤗 Dataset: https://huggingface.co/datasets/Ray368/GateMem
🏆 Leaderboard: https://rzhub.github.io/GateMem/
🚀 Submit results: https://huggingface.co/spaces/Ray368/GateMem-Submit

We welcome new submissions to the leaderboard and feedback from the community.


