Hugging Face Daily Papers · 3 min read

MEME: Multi-entity & Evolving Memory Evaluation

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Project page: https://seokwonjung-jay.github.io/meme-eval/
Code: https://github.com/SeokwonJung-Jay/MEME-public
arXiv: 2605.12477


Published on May 12
Submitted by Arnas Uselis on May 13
Authors: Seokwon Jung, Alexander Rubinstein, Arnas Uselis, Sangdoo Yun, Seong Joon Oh
AI-generated summary

MEME benchmark evaluates memory systems across multiple entities and evolving conditions, revealing persistent challenges in dependency reasoning despite advanced retrieval and prompting techniques.

Abstract

LLM-based agents increasingly operate in persistent environments where they must store, update, and reason over information across many sessions. While prior benchmarks evaluate only single-entity updates, MEME defines six tasks spanning the full space defined by the multi-entity and evolving axes, including three not scored by prior work: Cascade and Absence (dependency reasoning) and Deletion (post-removal state). Evaluating six memory systems spanning three memory paradigms on 100 controlled episodes, we find that all systems collapse on dependency reasoning under the default configuration (Cascade: 3%, Absence: 1% in average accuracy) despite adequate static retrieval performance. Prompt optimization, deeper retrieval, reduced filler noise, and most stronger LLMs fail to close this gap. Only a file-based agent paired with Claude Opus 4.7 as its internal LLM partially closes the gap, but at ~70x the baseline cost, indicating closure currently depends on configurations that are not practical at scale. Code and data are available on the project page: https://seokwonjung-jay.github.io/meme-eval/.
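The Cascade, Absence, and Deletion behaviors described in the abstract can be illustrated with a toy dependency-aware memory. This is a hypothetical sketch, not the benchmark's actual task format or any system evaluated in the paper: one entity's fact is derived from another's, so an update to the source must propagate (Cascade), and removing the source must surface as a missing answer (Deletion/Absence).

```python
# Hypothetical sketch: a toy memory where one entity's fact depends on
# another's. Entity names and the API are illustrative only.

class ToyMemory:
    def __init__(self):
        self.facts = {}        # entity -> stored fact value
        self.depends_on = {}   # entity -> (source entity, derivation fn)

    def write(self, entity, value):
        """Store or update a fact for an entity (an 'evolving' update)."""
        self.facts[entity] = value

    def derive(self, entity, source, fn):
        """Declare that entity's fact is computed from source's current fact."""
        self.depends_on[entity] = (source, fn)

    def read(self, entity):
        # Cascade: resolve through the dependency at query time, so an
        # update (or deletion) of the source propagates to dependents.
        if entity in self.depends_on:
            source, fn = self.depends_on[entity]
            src_val = self.read(source)
            return None if src_val is None else fn(src_val)
        # Absence/Deletion: a removed or never-written fact reads as None.
        return self.facts.get(entity)


mem = ToyMemory()
mem.write("alice_city", "Berlin")
mem.derive("alice_timezone", "alice_city",
           lambda city: {"Berlin": "CET", "Seoul": "KST"}[city])
print(mem.read("alice_timezone"))   # CET
mem.write("alice_city", "Seoul")    # update one entity...
print(mem.read("alice_timezone"))   # ...cascades to the dependent: KST
del mem.facts["alice_city"]         # deletion
print(mem.read("alice_timezone"))   # None: post-removal state
```

A retrieval-based memory that stores "Alice's timezone is CET" as a standalone string has no such dependency link, which is one intuition for why the Cascade and Absence tasks are hard even when static retrieval works.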


Get this paper in your agent:

hf papers read 2605.12477
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash


