Hugging Face Daily Papers · June 27, 2026 · 4 min read

Fast LeWorldModel

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.




arxiv:2606.26217 · Code: https://github.com/Yuntian-Gao/Fast-LeWorldModel


Published on Jun 24 · Submitted by taesiri on Jun 26

Authors: Yuntian Gao, Xiangyu Xu

Abstract

Fast-LeWM accelerates visual planning by replacing autoregressive rollout with parallel action-prefix prediction, reducing planning cost and the accumulation of latent errors over long horizons.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Joint-Embedding Predictive Architectures (JEPAs), including the recent LeWorldModel (LeWM), have become a promising foundation for reconstruction-free visual world models. For visual planning, however, LeWM evaluates candidate action sequences by repeatedly applying a local one-step latent transition model. This autoregressive rollout makes planning computationally expensive and exposes the predicted trajectory to latent errors that accumulate as the horizon grows. We propose Fast LeWorldModel (Fast-LeWM), a latent world model that replaces repeated local rollout with action-prefix prediction. Given the current latent and a candidate action sequence, Fast-LeWM encodes the sequence's prefixes and predicts, in parallel, the future latent reached after executing each prefix. Because action prefixes are the basic prediction unit, Fast-LeWM directly models the cumulative effect of actions over multiple horizons. This prefix-level supervision forces the model to learn how states evolve under different action prefixes, rather than only fitting one-step transitions. During planning, the predictor can use the last prefix token of the encoded action sequence to evaluate the corresponding future latent without explicitly rolling through each intermediate imagined state. Across multiple tasks, Fast-LeWM improves average success over LeWM while substantially reducing planning time, and it achieves lower open-loop latent loss that grows significantly more slowly as the rollout horizon increases.
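
To make the contrast between step-by-step rollout and action-prefix prediction concrete, here is a minimal Python sketch. It is a toy under stated assumptions, not the authors' implementation: the latent is a 2-D point, an action is a 2-D displacement, and a running sum stands in for the learned prefix encoder. All function names (`one_step`, `rollout`, `encode_prefixes`, `predict_from_prefix`, `plan_cost`) are hypothetical and chosen only for illustration.

```python
from itertools import accumulate

# Toy stand-ins (assumptions): latents and actions are 2-D tuples,
# and the true dynamics are simple displacement. Fast-LeWM itself
# uses learned networks; none of these functions come from the paper.

def one_step(z, a):
    """Local one-step latent transition: z_{t+1} = f(z_t, a_t)."""
    return (z[0] + a[0], z[1] + a[1])

def rollout(z0, actions):
    """Autoregressive rollout: H sequential calls, each fed the previous prediction."""
    z, traj = z0, []
    for a in actions:
        z = one_step(z, a)
        traj.append(z)
    return traj

def encode_prefixes(actions):
    """One code per prefix a_{1:k}; a running sum plays the role of the prefix encoder."""
    return list(accumulate(actions, lambda s, a: (s[0] + a[0], s[1] + a[1])))

def predict_from_prefix(z0, code):
    """Predict the latent reached after a prefix directly from z0 and the prefix code."""
    return (z0[0] + code[0], z0[1] + code[1])

def prefix_predict(z0, actions):
    """All horizons at once: every prediction depends on z0 only, never on an earlier prediction."""
    return [predict_from_prefix(z0, c) for c in encode_prefixes(actions)]

def plan_cost(z0, actions, goal):
    """Score a candidate sequence using only the last prefix code (no intermediate states)."""
    z_final = predict_from_prefix(z0, encode_prefixes(actions)[-1])
    return (z_final[0] - goal[0]) ** 2 + (z_final[1] - goal[1]) ** 2

z0 = (0.0, 0.0)
goal = (2.0, 1.0)
candidates = [
    [(1.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    [(0.0, 1.0), (0.0, 1.0), (0.0, 1.0)],
]
# Pick the candidate whose predicted final latent lands closest to the goal.
best = min(candidates, key=lambda seq: plan_cost(z0, seq, goal))
```

In this toy the two predictors agree exactly because the dynamics are known and additive. The structural difference is what carries over to the learned setting: `rollout` feeds each prediction into the next call, so any per-step error would be carried forward, while `prefix_predict` computes every horizon from `z0` and a prefix code, so the per-horizon predictions are independent of one another and can be batched.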