Action-Conditioned Risk Gating for Safety-Critical Control under Partial Observability
Mirrored from arXiv — Machine Learning for archival readability. Support the source by reading on the original site.
Computer Science > Machine Learning
Title:Action-Conditioned Risk Gating for Safety-Critical Control under Partial Observability
Abstract:Many safety-critical control problems are modeled as risk-sensitive partially observable Markov decision processes, where the controller must make decisions from incomplete observations while balancing task performance against safety risk. Although belief-space planning provides a principled solution, maintaining and planning over beliefs can be computationally costly and sensitive to model specification in practical domains. We propose a lightweight risk-gated reinforcement learning approximation for risk-sensitive control under partial observability. The method constructs a compact finite-history proxy state and learns an action-conditioned predictor of near-term safety violation. This predicted candidate-action risk is used in two complementary ways: as a risk penalty during value learning, and as a decision-time gate that interpolates between optimistic and conservative ensemble value estimates. As a result, low-risk actions are evaluated closer to reward-seeking estimates, while high-risk actions are evaluated more conservatively. We evaluate the approach in two safety-critical partially observable domains: automated glucose regulation and safety-constrained navigation. Across adult and adolescent glucose-control cohorts, the method improves overall glycemic tradeoffs and substantially reduces runtime relative to a belief-space planning baseline. On Safety-Gym navigation benchmarks, it achieves a more favorable reward-cost balance than unconstrained RL and several standard safe-RL baselines. These results suggest that action-conditioned near-term risk can provide an effective local signal for approximate risk-sensitive POMDP control when full belief-space planning is impractical.
| Subjects: | Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY) |
| Cite as: | arXiv:2605.14246 [cs.LG] |
| (or arXiv:2605.14246v1 [cs.LG] for this version) | |
| https://doi.org/10.48550/arXiv.2605.14246
arXiv-issued DOI via DataCite (pending registration)
|
Access Paper:
- View PDF
- HTML (experimental)
- TeX Source
Current browse context:
References & Citations
Bibliographic and Citation Tools
Code, Data and Media Associated with this Article
Demos
Recommenders and Search Tools
arXivLabs: experimental projects with community collaborators
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.
Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.
Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.
More from arXiv — Machine Learning
-
Vision-Based Runtime Monitoring under Varying Specifications using Semantic Latent Representations
May 15
-
Mechanistic Interpretability of EEG Foundation Models via Sparse Autoencoders
May 15
-
Rethinking Molecular OOD Generalization via Target-Aware Source Selection
May 15
-
Unsupervised learning of acquisition variability in structural connectomes via hybrid latent space modeling
May 15
Discussion (0)
Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.
Sign in →No comments yet. Sign in and be the first to say something.