LaRA: Layer-wise Representation Analysis for Detecting Data Contamination in RL Post-Training
Authors: Minju Gwak, Minseo Kwak, Dongseok Lee, Guijin Son, Alan Ritter, Jaehyung Kim
Published: 2026-05-28 (arXiv: 2605.29888). Organization listed on the page: Yonsei University
Abstract
AI-generated summary: LaRA is a layer-wise representation analysis framework that detects data contamination in large language models post-trained with reinforcement learning by analyzing geometric deviations across model layers.
Reinforcement learning (RL) post-training has been shown to improve reasoning in large language models (LLMs). However, data contamination in RL post-training remains largely unexplored, even though it can undermine generalization and the reliability of evaluating the training process itself. Existing detection methods rely primarily on output-level signals such as likelihood or entropy, which become unreliable for RL-trained models because RL shapes behavior through trajectory-level rewards rather than token likelihoods. We propose LaRA, a layer-wise representation analysis framework for detecting contamination in RL post-trained LLMs. LaRA introduces three complementary metrics that measure perturbation sensitivity, directional collapse, and local representation rigidity under controlled perturbations. We find that contamination produces progressive geometric deviations across layers, including amplified perturbation sensitivity, stronger directional collapse, and enhanced local rigidity. Based on these findings, we develop a contamination detection protocol that aggregates representation-level deviations across layers and metrics. Experiments on RL-trained reasoning models show that our protocol outperforms existing output-level baselines for contamination detection.
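The abstract names three per-layer metrics and an aggregation step but does not give their formulas, so the following is only a rough sketch of the general idea, not the authors' method. Every definition here is an assumption made for illustration: `layer_metrics`, `contamination_score`, the three proxy formulas (relative displacement for sensitivity, mean pairwise cosine of displacement vectors for directional collapse, uniformity of displacement magnitudes for rigidity), and the z-score aggregation against statistics from known-clean samples are all hypothetical stand-ins. Hidden states are plain Python lists so the example runs with the standard library alone.

```python
import math
from statistics import mean, pstdev

Vector = list[float]


def _sub(a: Vector, b: Vector) -> Vector:
    return [x - y for x, y in zip(a, b)]


def _norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def _cosine(a: Vector, b: Vector) -> float:
    na, nb = _norm(a), _norm(b)
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def layer_metrics(clean: Vector, perturbed: list[Vector]) -> dict[str, float]:
    """Hypothetical proxy metrics for ONE layer.

    clean: hidden state for the original input.
    perturbed: hidden states for at least two perturbed versions of that input.
    These formulas are illustrative assumptions, not the paper's definitions.
    """
    deltas = [_sub(p, clean) for p in perturbed]
    mags = [_norm(d) for d in deltas]

    # Sensitivity proxy: average displacement relative to the clean state's norm.
    sensitivity = mean(mags) / (_norm(clean) or 1.0)

    # Directional-collapse proxy: mean pairwise cosine between displacements
    # (close to 1.0 when all perturbations push the state in the same direction).
    pairs = [(i, j) for i in range(len(deltas)) for j in range(i + 1, len(deltas))]
    collapse = mean(_cosine(deltas[i], deltas[j]) for i, j in pairs)

    # Rigidity proxy (assumption): uniform displacement magnitudes score near 1.0.
    rigidity = 1.0 / (1.0 + pstdev(mags) / (mean(mags) or 1.0))

    return {"sensitivity": sensitivity, "collapse": collapse, "rigidity": rigidity}


def contamination_score(
    sample: list[dict[str, float]],
    ref_mean: list[dict[str, float]],
    ref_std: list[dict[str, float]],
) -> float:
    """Average z-score over layers and metrics versus known-clean reference stats."""
    zs = []
    for metrics, mu, sd in zip(sample, ref_mean, ref_std):
        for name, value in metrics.items():
            zs.append((value - mu[name]) / (sd[name] or 1.0))
    return mean(zs)


# Toy usage with hand-made 2-D "hidden states" for a two-layer model.
clean_by_layer = [[1.0, 0.0], [0.5, 0.5]]
perturbed_by_layer = [
    [[1.0, 1.0], [1.0, 2.0]],
    [[0.6, 0.5], [0.5, 0.4]],
]
sample = [layer_metrics(c, p) for c, p in zip(clean_by_layer, perturbed_by_layer)]

# Made-up reference statistics standing in for values measured on clean data.
ref_mean = [{"sensitivity": 0.5, "collapse": 0.0, "rigidity": 0.5}] * 2
ref_std = [{"sensitivity": 0.5, "collapse": 0.5, "rigidity": 0.25}] * 2
print(contamination_score(sample, ref_mean, ref_std))
```

In a real setting the vectors would come from a model's per-layer hidden states for an input and its perturbed variants, and a higher aggregate score would flag a sample for closer inspection. Where to set the decision threshold is left open here because the source does not state one.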
Community
This paper introduces LaRA (Layer-wise Representation Analysis), a framework for detecting data contamination in RL post-trained LLMs by examining how internal representations change across layers rather than relying on output-level signals such as likelihood or entropy.
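This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
- Not All Tokens Learn Alike: Attention Entropy Reveals Heterogeneous Signals in RL Reasoning (2026), https://huggingface.co/papers/2605.07660
- G-Drift MIA: Membership Inference via Gradient-Induced Feature Drift in LLMs (2026), https://huggingface.co/papers/2604.00419
- Why Does Reinforcement Learning Generalize? A Feature-Level Mechanistic Study of Post-Training in Large Language Models (2026), https://huggingface.co/papers/2604.25011
- Entropy-KL Divergence-based Token Masking: A Novel Approach for Selective Fine-tuning of Large Language Models (2026), https://huggingface.co/papers/2605.29303
- MixSD: Mixed Contextual Self-Distillation for Knowledge Injection (2026), https://huggingface.co/papers/2605.16865
- Single-Rollout Hidden-State Dynamics for Training-Free RLVR Data Selection (2026), https://huggingface.co/papers/2605.28631
- When Can LLMs Learn to Reason with Weak Supervision? (2026), https://huggingface.co/papers/2604.18574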