Hugging Face Daily Papers · June 2, 2026 · 4 min read

Linear Ensembles Wash Away Watermarks: On the Fragility of Distributional Perturbations in LLMs

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.


We show that AI text watermarks can be surprisingly fragile in multi-model settings. By averaging output probabilities across several models, our method WASH cancels out watermark perturbations, substantially reducing detection scores while maintaining generation quality and speeding up generation.


arxiv:2605.30501


Published on May 28 · Submitted by Zhihao Wu on Jun 2

King's College London


Authors: Zhihao Wu, Gracia Gong, Qinglin Zhu, Yudong Chen, Runcong Zhao

Abstract

Watermarking AI-generated text for detection fails when multiple models are used, as averaging outputs cancels perturbations and suppresses detection while improving quality and speed.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Watermarking embeds statistical signatures in AI-generated text for detection and attribution. We reveal a fundamental vulnerability: when users can access multiple models (today's reality), watermarks trivially fail. Watermarks perturb output distributions away from the original, and in competitive markets these perturbations are typically independent across providers. We prove theoretically that averaging output probability distributions recovers the unwatermarked distribution up to a second-order error term. Empirically, simply averaging 3-5 models cancels out these perturbations. We introduce WASH (Watermark Attenuation via Statistical Hybridisation), which solves the practical challenges of ensemble generation: vocabulary misalignment and tokenisation differences across heterogeneous models. Experiments across six watermarking schemes and three LLMs show that averaging across 3 models suppresses detection z-scores from 5-300 to below 2 (under the detection threshold of 4) and reduces TPR at 5% FPR to below 50%, while improving quality by 27.5% and running 6 times faster than the best baseline on long-sequence generation. Our results suggest that robust AI-text detection via watermarking requires either accepting this fundamental vulnerability or unprecedented coordination among model providers.
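
The averaging effect described in the abstract can be illustrated with a toy simulation. This is a minimal sketch under strong assumptions and not the paper's WASH implementation: it assumes a green-list style watermark (a logit bias `delta` on a random fraction `gamma` of the vocabulary), providers that share one vocabulary and one base distribution (so the vocabulary and tokenisation alignment that WASH addresses never arises), and a fresh independent green list per provider at every step. The function name `detector_z` and all parameter values are hypothetical.

```python
import numpy as np


def detector_z(num_models, steps=400, vocab=500, delta=2.0, gamma=0.5, seed=0):
    """z-score seen by provider 0's detector on text sampled from an
    equal-weight average of `num_models` watermarked distributions.

    Toy assumptions (not from the paper): a shared base distribution,
    a shared vocabulary, and independent random green lists per provider.
    """
    rng = np.random.default_rng(seed)
    green_hits = 0.0
    for _ in range(steps):
        # Unwatermarked next-token distribution for this step.
        p = rng.dirichlet(np.ones(vocab))
        # One independent green list per provider, shape (num_models, vocab).
        green = (rng.random((num_models, vocab)) < gamma).astype(float)
        # Each provider adds `delta` to its green-token logits and renormalises.
        q = p * np.exp(delta * green)
        q /= q.sum(axis=1, keepdims=True)
        # Linear ensemble: average the probabilities, then sample one token.
        avg = q.mean(axis=0)
        token = rng.choice(vocab, p=avg)
        # Provider 0's detector counts tokens that fall in its own green list.
        green_hits += green[0, token]
    expected = gamma * steps
    return (green_hits - expected) / np.sqrt(steps * gamma * (1.0 - gamma))


if __name__ == "__main__":
    for k in (1, 3, 5):
        print(f"models averaged: {k}  detector z-score: {detector_z(k):.2f}")
```

In this toy setup the other providers' perturbations are uncorrelated with provider 0's green list, so the excess green fraction that its detector sees shrinks roughly in proportion to one over the number of models averaged. The printed z-scores depend on the chosen `steps`, `delta`, and `gamma` and are not meant to reproduce the paper's reported numbers.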