Hugging Face Daily Papers · May 29, 2026 · 6 min read

LaRA: Layer-wise Representation Analysis for Detecting Data Contamination in RL Post-Training

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.


This paper introduces LaRA (Layer-wise Representation Analysis), a framework for detecting data contamination in RL post-trained LLMs by examining how internal representations change across layers rather than relying on output-level signals such as likelihood or entropy.


arxiv:2605.29888


Published on May 28, 2026 · Submitted by Minju Gwak on May 29, 2026 · Yonsei University


Authors: Minju Gwak, Minseo Kwak, Dongseok Lee, Guijin Son, Alan Ritter, Jaehyung Kim

Abstract

LaRA is a layer-wise representation analysis framework that detects data contamination in reinforcement learning-post-trained large language models by analyzing geometric deviations across model layers.

AI-generated summary

Reinforcement learning (RL) post-training has been shown to improve reasoning in large language models (LLMs). However, data contamination in RL post-training remains largely unexplored, even though it can undermine both generalization and the reliability of evaluation. Existing detection methods rely primarily on output-level signals such as likelihood or entropy, which become unreliable for RL-trained models because RL shapes behavior through trajectory-level rewards rather than token likelihoods. We propose LaRA, a layer-wise representation analysis framework for detecting contamination in RL post-trained LLMs. LaRA introduces three complementary metrics that measure perturbation sensitivity, directional collapse, and local representation rigidity under controlled perturbations. We find that contamination produces progressive geometric deviations across layers, including amplified perturbation sensitivity, stronger directional collapse, and enhanced local rigidity. Based on these findings, we develop a contamination detection protocol that aggregates representation-level deviations across layers and metrics. Experiments on RL-trained reasoning models show that the protocol outperforms existing output-level baselines for contamination detection.
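
To make the idea of layer-wise geometric metrics more concrete, here is a minimal sketch in Python with NumPy. The abstract does not give the paper's formulas, so everything in it is an assumption made for illustration: the function names `layer_metrics` and `contamination_score`, the three proxy definitions, and the z-score aggregation against a set of known-clean examples are hypothetical stand-ins, not LaRA's actual metrics or protocol.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def layer_metrics(clean, perturbed):
    """Illustrative proxies for one layer (NOT the paper's definitions).

    clean:     (d,) hidden state of the original input at this layer.
    perturbed: (k, d) hidden states of k >= 2 perturbed variants.
    Returns an array [sensitivity, collapse, rigidity].
    """
    delta = perturbed - clean               # displacement caused by each perturbation
    norms = np.linalg.norm(delta, axis=1)   # size of each displacement

    # Assumed proxy 1: mean displacement relative to the clean state's norm.
    sensitivity = norms.mean() / (np.linalg.norm(clean) + EPS)

    # Assumed proxy 2: mean pairwise cosine similarity between displacement
    # directions; values near 1 mean all perturbations move the state the same way.
    unit = delta / (norms[:, None] + EPS)
    cos = unit @ unit.T
    k = delta.shape[0]
    collapse = (cos.sum() - np.trace(cos)) / (k * (k - 1))

    # Assumed proxy 3: mean displacement divided by the spread of the perturbed
    # states around their own centroid; a tight cluster gives a large value.
    spread = np.linalg.norm(perturbed - perturbed.mean(axis=0), axis=1).mean()
    rigidity = norms.mean() / (spread + EPS)

    return np.array([sensitivity, collapse, rigidity])


def contamination_score(sample, reference):
    """Aggregate deviations across layers and metrics (assumed scheme).

    sample:    (L, 3) metrics for one example over L layers.
    reference: (n, L, 3) metrics for n examples assumed to be uncontaminated.
    Returns the mean z-score of the sample relative to the reference set.
    """
    mu = reference.mean(axis=0)
    sigma = reference.std(axis=0) + EPS
    return float(((sample - mu) / sigma).mean())
```

In this sketch a higher score means the example's representations deviate more from the clean reference set across layers. Any decision threshold would have to be calibrated on held-out data, and the hidden states themselves would come from whatever model is being audited.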