Debiased Model-based Representations for Sample-efficient Continuous Control
Abstract
AI-generated summary: The DR.Q algorithm improves model-based representations for Q-learning by maximizing mutual information and using faded prioritized experience replay to reduce bias and overfitting in representation learning.
Model-based representations have recently stood out as a promising framework that embeds latent dynamics information into representations for downstream off-policy actor-critic learning. This framework implicitly combines the advantages of model-free and model-based approaches while avoiding the training costs associated with model-based methods. Nevertheless, existing model-based representation methods can fail to capture sufficient information about relevant variables and can overfit to early experiences in the replay buffer. Both issues bias representation and actor-critic learning, leading to inferior performance. To address this, we propose Debiased model-based Representations for Q-learning, abbreviated as DR.Q. DR.Q explicitly maximizes the mutual information between the representations of the current state-action pair and the next state, in addition to minimizing the deviation between them, and samples transitions with faded prioritized experience replay. We evaluate DR.Q on numerous continuous control benchmarks with a single set of hyperparameters, and the results demonstrate that DR.Q can match or surpass recent strong baselines, sometimes outperforming them by a large margin. Our code is available at https://github.com/dmksjfl/DR.Q.
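For intuition, below is a minimal PyTorch sketch of the two debiasing ingredients named in the abstract: a representation loss that both minimizes the deviation between the latents of (s, a) and s' and maximizes a mutual-information lower bound between them, plus a "faded" prioritized replay buffer. The module names, network sizes, the InfoNCE estimator, and the priority-fading schedule are all illustrative assumptions, not the paper's actual implementation; see the official repository for that.

```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicsRepresentation(nn.Module):
    """Hypothetical encoders phi(s, a) and psi(s') mapping into a shared latent space."""

    def __init__(self, state_dim, action_dim, latent_dim=64):
        super().__init__()
        self.phi = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim))
        self.psi = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim))

    def representation_loss(self, s, a, s_next, mi_weight=1.0):
        z_sa = self.phi(torch.cat([s, a], dim=-1))  # latent for the (s, a) pair
        z_next = self.psi(s_next)                   # latent for the next state s'
        # Deviation-minimization term: keep phi(s, a) close to psi(s'),
        # i.e. a latent one-step dynamics-consistency loss.
        consistency = F.mse_loss(z_sa, z_next.detach())
        # MI-maximization term: InfoNCE with in-batch negatives is one standard
        # lower bound on I(z_sa; z_next); whether DR.Q uses this exact
        # estimator is an assumption of this sketch.
        logits = z_sa @ z_next.t()
        labels = torch.arange(logits.size(0), device=logits.device)
        info_nce = F.cross_entropy(logits, labels)
        return consistency + mi_weight * info_nce


class FadedPrioritizedReplay:
    """One plausible reading of 'faded' PER: proportional prioritized replay
    whose stored priorities decay at every sampling step, so transitions that
    have not been refreshed recently gradually lose influence."""

    def __init__(self, capacity, alpha=0.6, fade=0.9999):
        self.capacity, self.alpha, self.fade = capacity, alpha, fade
        self.data, self.pos = [], 0
        self.priorities = np.zeros(capacity, dtype=np.float64)

    def add(self, transition):
        # New transitions enter with the current maximum priority.
        max_p = self.priorities.max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        n = len(self.data)
        self.priorities[:n] *= self.fade  # fade all stored priorities ...
        probs = self.priorities[:n] ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(n, size=batch_size, p=probs)
        return [self.data[i] for i in idx], idx

    def update_priorities(self, idx, td_errors, eps=1e-6):
        # ... while sampled transitions receive fresh, un-faded priorities, so
        # stale early experiences are progressively down-weighted.
        self.priorities[idx] = np.abs(td_errors) + eps
```

Note the interaction between the two replay methods: because only refreshed priorities escape the fade, the sampling distribution drifts away from early, rarely revisited transitions, which is exactly the overfitting failure mode the abstract attributes to prior model-based representation methods.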
Community
dmux (Jiafei Lyu): ICML2026