LLM Anonymization Against Agentic Re-Identification
Authors: Ziwen Li, Jianing Wen, Tianshi Li
Organization: Northeastern PEACH Lab
Published: 2026-06-01
Project page: https://peach-research-lab.github.io/AURA/
Code: https://github.com/PEACH-Research-Lab/AURA
Abstract
AURA is an LLM-powered anonymization framework that balances privacy protection against agentic web-search re-identification while preserving contextual utility through adaptive privacy scopes and mask-reconstruct methods.
Agentic LLMs with web search change the threat model for text anonymization: weak contextual cues can become cross-referenceable evidence for re-identification, yet those same details also carry much of the text's downstream analytic value. Existing defenses either remove explicit identifiers, perturb text for formal privacy, or test rewritten text against non-web inference models, which leaves the operating region between resistance to agentic web-search re-identification and utility retention underexplored. We introduce AURA (Anonymization with Utility-Retention Adaptation), an LLM-powered mask-reconstruct framework that decouples privacy localization from utility-preserving reconstruction and selects candidates with adversarial privacy and utility-retention checks. We evaluate AURA on real-user interview transcripts using re-identification attacks carried out by web-search agents, along with a utility evaluation based on interviewee-profile facts, codebook facts, and the joint contextual utility grid. Our results show that AURA improves the privacy-utility frontier in two ways: an adaptive privacy scope strengthens resistance to agentic re-identification, and mask-reconstruct anonymization better preserves contextual utility under a fixed privacy scope.
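The abstract describes a mask-reconstruct loop followed by candidate selection under privacy and utility checks. The sketch below is one possible reading of that control flow, not the authors' implementation. Every name in it (`mask_spans`, `select_candidate`, `anonymize`, the `reconstruct`, `attack`, and `utility` callables, and the `max_risk` threshold) is hypothetical. In AURA the reconstruction and both checks are LLM-driven; here they are plain callables so the sketch stays runnable without any model or web access.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional

MASK = "[MASK]"  # hypothetical placeholder token, not taken from the paper


@dataclass(frozen=True)
class Candidate:
    text: str
    reid_risk: float  # assumed adversary score in [0, 1]; lower is safer
    utility: float    # assumed share of utility facts retained, in [0, 1]


def mask_spans(text: str, sensitive_spans: Iterable[str]) -> str:
    """Privacy localization: replace each flagged span with a mask token."""
    # Longest spans first, so a short span cannot split a longer one.
    for span in sorted(sensitive_spans, key=len, reverse=True):
        text = text.replace(span, MASK)
    return text


def select_candidate(
    candidates: List[Candidate], max_risk: float
) -> Optional[Candidate]:
    """Keep candidates that pass the privacy check, then maximize utility."""
    safe = [c for c in candidates if c.reid_risk <= max_risk]
    if not safe:
        return None  # nothing passed; a caller might widen the privacy scope
    return max(safe, key=lambda c: c.utility)


def anonymize(
    text: str,
    sensitive_spans: Iterable[str],
    reconstruct: Callable[[str], List[str]],
    attack: Callable[[str], float],
    utility: Callable[[str], float],
    max_risk: float = 0.2,
) -> Optional[Candidate]:
    """Mask, reconstruct several rewrites, score each, and pick one."""
    masked = mask_spans(text, sensitive_spans)
    scored = [Candidate(t, attack(t), utility(t)) for t in reconstruct(masked)]
    return select_candidate(scored, max_risk)
```

Separating `mask_spans` from `reconstruct` mirrors the decoupling the abstract mentions: the set of masked spans (the privacy scope) can change without touching how masked text is rewritten. Returning `None` when no candidate passes is only a placeholder for whatever adaptation step the actual framework uses.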
Paper: arxiv.org/abs/2605.30848