Debiased Model-based Representations for Sample-efficient Continuous Control
Abstract
AI-generated summary: The DR.Q algorithm improves model-based representations for Q-learning by maximizing mutual information and using faded prioritized experience replay to reduce bias and overfitting in representation learning.
Model-based representations have recently stood out as a promising framework that embeds latent dynamics information into representations for downstream off-policy actor-critic learning. This framework implicitly combines the advantages of model-free and model-based approaches while avoiding the training costs associated with model-based methods. Nevertheless, existing model-based representation methods can fail to capture sufficient information about relevant variables and can overfit to early experiences in the replay buffer. Both issues bias representation and actor-critic learning, leading to inferior performance. To address this, we propose Debiased model-based Representations for Q-learning, abbreviated as DR.Q. DR.Q explicitly maximizes the mutual information between the representations of the current state-action pair and the next state, in addition to minimizing the deviation between them, and samples transitions with faded prioritized experience replay. We evaluate DR.Q on numerous continuous control benchmarks with a single set of hyperparameters, and the results demonstrate that DR.Q can match or surpass recent strong baselines, sometimes outperforming them by a large margin. Our code is available at https://github.com/dmksjfl/DR.Q.
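For intuition, below is a minimal PyTorch sketch of the two debiasing ingredients named in the abstract: a representation loss that both minimizes the deviation between the latents of (s, a) and s' and maximizes a mutual-information lower bound between them, plus a "faded" prioritized replay buffer. The module names, network sizes, the InfoNCE estimator, and the priority-fading schedule are all illustrative assumptions, not the paper's actual implementation; see the official repository for that.

```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicsRepresentation(nn.Module):
    """Hypothetical encoders phi(s, a) and psi(s') mapping into a shared latent space."""

    def __init__(self, state_dim, action_dim, latent_dim=64):
        super().__init__()
        self.phi = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim))
        self.psi = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim))

    def representation_loss(self, s, a, s_next, mi_weight=1.0):
        z_sa = self.phi(torch.cat([s, a], dim=-1))  # latent for the (s, a) pair
        z_next = self.psi(s_next)                   # latent for the next state s'
        # Deviation-minimization term: keep phi(s, a) close to psi(s'),
        # i.e. a latent one-step dynamics-consistency loss.
        consistency = F.mse_loss(z_sa, z_next.detach())
        # MI-maximization term: InfoNCE with in-batch negatives is one standard
        # lower bound on I(z_sa; z_next); whether DR.Q uses this exact
        # estimator is an assumption of this sketch.
        logits = z_sa @ z_next.t()
        labels = torch.arange(logits.size(0), device=logits.device)
        info_nce = F.cross_entropy(logits, labels)
        return consistency + mi_weight * info_nce


class FadedPrioritizedReplay:
    """One plausible reading of 'faded' PER: proportional prioritized replay
    whose stored priorities decay at every sampling step, so transitions that
    have not been refreshed recently gradually lose influence."""

    def __init__(self, capacity, alpha=0.6, fade=0.9999):
        self.capacity, self.alpha, self.fade = capacity, alpha, fade
        self.data, self.pos = [], 0
        self.priorities = np.zeros(capacity, dtype=np.float64)

    def add(self, transition):
        # New transitions enter with the current maximum priority.
        max_p = self.priorities.max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        n = len(self.data)
        self.priorities[:n] *= self.fade  # fade all stored priorities ...
        probs = self.priorities[:n] ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(n, size=batch_size, p=probs)
        return [self.data[i] for i in idx], idx

    def update_priorities(self, idx, td_errors, eps=1e-6):
        # ... while sampled transitions receive fresh, un-faded priorities, so
        # stale early experiences are progressively down-weighted.
        self.priorities[idx] = np.abs(td_errors) + eps
```

Note the interaction between the two replay methods: because only refreshed priorities escape the fade, the sampling distribution drifts away from early, rarely revisited transitions, which is exactly the overfitting failure mode the abstract attributes to prior model-based representation methods.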
Community
dmux (Jiafei Lyu): ICML2026