Online KL-Regularized Reinforcement Learning with Function Approximation under Misspecification
Abstract: We study KL-regularized contextual bandits and episodic reinforcement learning (RL) under general function approximation with model misspecification. Existing guarantees rely on realizability and therefore do not extend to misspecified models, where classical regret bounds may fail. This work introduces KL misspecification formulations for contextual bandits and episodic RL and analyzes regression-based algorithms with Gibbs policy updates. High-probability KL-regret guarantees with explicit misspecification terms are established, recovering the standard realizable KL-regularized setting as a special case.
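To make the phrase "Gibbs policy updates" concrete, here is a minimal sketch of the standard closed-form maximizer of a KL-regularized objective for a single context. It is not taken from the paper: the names `gibbs_policy`, `kl_regularized_value`, `ref_probs`, `est_rewards`, and the regularization parameter `eta` are illustrative assumptions, and the reward estimates stand in for whatever a regression step would produce.

```python
import math


def gibbs_policy(ref_probs, est_rewards, eta):
    """Return pi(a) proportional to ref_probs[a] * exp(eta * est_rewards[a]).

    Assumes every entry of ref_probs is strictly positive (hypothetical setup).
    """
    # Work in log space: log pi_ref(a) + eta * f_hat(a).
    logits = [math.log(p) + eta * r for p, r in zip(ref_probs, est_rewards)]
    # Subtract the maximum before exponentiating for numerical stability.
    shift = max(logits)
    weights = [math.exp(l - shift) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


def kl_regularized_value(policy, ref_probs, est_rewards, eta):
    """Expected estimated reward minus (1 / eta) * KL(policy || reference)."""
    expected = sum(p * r for p, r in zip(policy, est_rewards))
    kl = sum(p * math.log(p / q) for p, q in zip(policy, ref_probs) if p > 0)
    return expected - kl / eta


# Illustrative numbers only: three actions, a reference policy, reward estimates.
ref = [0.5, 0.3, 0.2]
rewards = [1.0, 0.0, 2.0]
pi = gibbs_policy(ref, rewards, eta=2.0)
```

Under these assumptions, the value attained by `pi` equals `(1 / eta) * log(sum(ref[a] * exp(eta * rewards[a])))`, and no other distribution over the three actions attains a higher KL-regularized value. A larger `eta` weakens the pull toward the reference policy, while `eta = 0` returns the reference policy unchanged.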
| Comments: | Accepted by RLC 2026 |
| Subjects: | Machine Learning (cs.LG) |
| Cite as: | arXiv:2606.06053 [cs.LG] (or arXiv:2606.06053v1 [cs.LG] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2606.06053 (arXiv-issued DOI via DataCite, pending registration) |