Discrete-WAM: Unified Discrete Vision-Action Token Editing for World-Policy Learning
Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.
Abstract
Discrete-WAM introduces a unified discrete latent vision-action world policy for autonomous driving. It uses aligned discrete tokens and a shared discrete diffusion framework to enable compositional causal and counterfactual reasoning.
Autonomous driving requires reasoning about how ego actions shape the evolution of the surrounding world. However, most end-to-end methods rely on direct state-to-action mappings, capturing correlations without explicitly modeling action-conditioned dynamics. Conversely, continuous-latent world models often lack compositional structure for causal reasoning across counterfactual futures. We introduce Discrete-WAM, a unified latent vision-action world policy that represents future visual states and ego actions as aligned discrete tokens, enabling compositional causal reasoning across alternative futures. Built upon this unified discrete alignment, Discrete-WAM establishes a shared discrete diffusion framework with unified generative tasks, jointly formulating world modeling, world-action policy, and hierarchical decision-enabled policy, supporting compositional generalization across diverse driving scenarios. Experiments on large-scale autonomous-driving benchmarks show that Discrete-WAM achieves competitive performance while supporting controllable generation and counterfactual reasoning, offering a principled path toward more reliable decision-making.
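To make the idea of a shared discrete diffusion framework over aligned vision and action tokens more concrete, here is a minimal sketch. It is not the paper's implementation: the abstract does not specify the corruption process or the task definitions, so this assumes an absorbing-state (mask-based) forward process and a single sequence laid out as vision tokens followed by action tokens. The names `MASK`, `task_mask`, `corrupt`, and the task labels `"world_model"`, `"policy"`, and `"joint"` are hypothetical and chosen only for illustration.

```python
import random

MASK = -1  # hypothetical id for the absorbing [MASK] state (assumption)


def task_mask(n_vision, n_action, task):
    """Return per-position flags: True = token is generated, False = given as context.

    The sequence layout [vision tokens | action tokens] and the task names
    are illustrative assumptions, not taken from the paper.
    """
    if task == "world_model":  # generate future vision tokens given ego actions
        return [True] * n_vision + [False] * n_action
    if task == "policy":       # generate ego action tokens given vision tokens
        return [False] * n_vision + [True] * n_action
    if task == "joint":        # generate both modalities together
        return [True] * (n_vision + n_action)
    raise ValueError(f"unknown task: {task}")


def corrupt(tokens, editable, t, rng):
    """Absorbing-state forward step.

    Each editable token is replaced by MASK with probability t;
    context tokens (editable == False) are always left untouched.
    """
    return [
        MASK if flag and rng.random() < t else tok
        for tok, flag in zip(tokens, editable)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    vision = [12, 7, 33, 5]  # toy discrete vision token ids
    action = [2, 9]          # toy discrete action token ids
    sequence = vision + action

    editable = task_mask(len(vision), len(action), "world_model")
    print(corrupt(sequence, editable, t=0.5, rng=rng))
```

Under these assumptions, switching between generative tasks only changes which positions are editable, while the corruption step and token vocabulary stay shared. A counterfactual query would then amount to holding the vision context fixed, swapping in a different action token sequence, and regenerating the masked future tokens with a trained denoiser, which this sketch does not include.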
Get this paper in your agent:
hf papers read 2606.05645
curl -LsSf https://hf.co/cli/install.sh | bash