Hugging Face Daily Papers · June 5, 2026 · 5 min read

LLMs Can Leak Training Data But Do They Want To? A Propensity-Aware Evaluation of Memorization in LLMs

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Authors: Gianluca Barmina, Peter Schneider-Kamp, Lukas Galke Poech

Code: https://github.com/N-essuno/PropMe

arxiv:2606.06286

Published on Jun 4

Submitted by Gianluca Barmina on Jun 5

University of Southern Denmark (SDU)

Abstract

The PropMe framework evaluates language model memorization by distinguishing forced reproduction capability from natural propensity, using SimpleTrace for deterministic attribution and propensity-transformed metrics across open models and datasets.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Large language models can reproduce training data, but existing memorization evaluations mostly measure whether models can be forced to do so, rather than whether they do so under ordinary use. We introduce PropMe, a propensity-aware framework for memorization evaluation that contrasts prefix-based capability attacks with non-adversarial evaluations. We propose a metric transformation that, applied to existing functions, yields propensity metrics. We further introduce SimpleTrace, a lightweight tracing pipeline built on infini-gram that deterministically attributes model generations to large-scale training corpora and computes verbatim, near-verbatim, and propensity-transformed memorization metrics. Evaluating two fully open models (Comma and DFM Decoder) on two datasets (Common Pile and Dynaword) in two languages, we find a consistent gap between capability and propensity: prefix attacks elicit substantially stronger memorization signals than generic or dataset-specific prompts, while propensity scores remain low overall. Thus, the models can reveal training data when directly elicited, but rarely do so in more common non-adversarial settings. We also find that DFM Decoder, which is continually pre-trained from Comma, exhibits reduced memorization and memorization propensity for Common Pile, confirming that memorization capability can decrease when later training emphasizes partially different data. Based on these results, we encourage memorization audits to report both worst-case extractability and ordinary leakage propensity, which together give a more comprehensive view of this phenomenon.
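To make the capability-versus-propensity contrast concrete, here is a minimal Python sketch. It is not the paper's PropMe transformation or the SimpleTrace pipeline, neither of which is specified in the abstract. It assumes a toy in-memory n-gram set in place of an infini-gram index, whitespace tokenization, and a simple coverage score. The function names (`build_index`, `verbatim_fraction`, `mean_score`) are hypothetical. The same score is reported under two prompting conditions: outputs elicited with training-data prefixes and outputs from ordinary prompts.

```python
from typing import Iterable, List, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(tokens: List[str], n: int) -> Iterable[NGram]:
    """Yield every contiguous n-token window of `tokens`."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def build_index(corpus: List[str], n: int) -> Set[NGram]:
    """Toy stand-in for a corpus index (assumption: NOT infini-gram)."""
    index: Set[NGram] = set()
    for doc in corpus:
        index.update(ngrams(doc.split(), n))
    return index


def verbatim_fraction(generation: str, index: Set[NGram], n: int) -> float:
    """Fraction of generated tokens covered by at least one corpus n-gram."""
    tokens = generation.split()
    covered = [False] * len(tokens)
    for i, gram in enumerate(ngrams(tokens, n)):
        if gram in index:
            for j in range(i, i + n):
                covered[j] = True
    return sum(covered) / len(tokens) if tokens else 0.0


def mean_score(generations: List[str], index: Set[NGram], n: int) -> float:
    """Average verbatim coverage over a set of generations."""
    scores = [verbatim_fraction(g, index, n) for g in generations]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical data: one training document and two prompting conditions.
corpus = ["the quick brown fox jumps over the lazy dog"]
index = build_index(corpus, n=3)

prefix_outputs = ["jumps over the lazy dog"]           # elicited by a prefix attack
generic_outputs = ["a fox is a small wild animal",     # ordinary prompts
                   "he said the quick brown fox ran away"]

capability = mean_score(prefix_outputs, index, n=3)
propensity = mean_score(generic_outputs, index, n=3)
print(f"capability={capability:.2f} propensity={propensity:.2f}")
```

Reporting the two numbers side by side, rather than only the prefix-attack score, is the kind of audit the abstract argues for. A real pipeline would replace the in-memory set with a corpus-scale index and the coverage score with the paper's own metrics.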

Community

giannor

Paper submitter about 4 hours ago

We introduced PropMe and SimpleTrace to show that while AI can be forced to leak training data under attack, its natural propensity to do so during everyday use is remarkably low. We also found that continual pre-training helps models dilute and forget these old memories over time, confirming previous work. Ultimately, we argue that AI safety audits must evolve to measure real-world leakage propensity, not just worst-case hacks, to give us a true, comprehensive picture of this phenomenon.

Get this paper in your agent:

hf papers read 2606.06286

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash
