DVD-JEPA: an open-source, fully-reproducible JEPA world model [P]
Mirrored from r/MachineLearning for archival readability. Support the source by reading on the original site.
| A paper currently trending on paperswithcode.co in the "Anomaly Detection" category is DVD-JEPA. https://i.redd.it/r6fd8n3d4f8h1.gif Here is the short summary: Most attempts to learn a world model from video try to predict the next frame pixel-by-pixel, and drown in detail that is fundamentally unpredictable. JEPA (Joint-Embedding Predictive Architecture, LeCun 2022) makes a different bet: predict the representation of the future, not the pixels, and let the encoder discard whatever it cannot predict. DVD-JEPA is the smallest honest demonstration of that idea we could build. The "world" is a DVD logo bouncing in a 16×16 box. A context encoder, an EMA target encoder, and a latent predictor are trained — with no labels and no decoder — to predict the next observation in a 32-dimensional representation space. We then show three things:
The whole thing runs client-side in your browser — the trained MLPs are re-implemented in ~40 lines of JavaScript. It is a joke, and it is also a correct, working instance of the architecture behind I-JEPA, V-JEPA, and V-JEPA 2. Find the paper, HF model, and project page here: https://paperswithcode.co/paper/98361 [link] [comments] |
More from r/MachineLearning
-
Loss functions in Instance Representation Learning [R]
Jun 29
-
Price elasticity model [R]
Jun 29
-
Rejected MICCAI paper: workshop -> journal/conference or directly journal/conference [R]
Jun 29
-
I built a demo agricultural planning system with an AI advisor for small-scale farmers in Nicaragua using NASA data [p]
Jun 29
Discussion (0)
Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.
Sign in →No comments yet. Sign in and be the first to say something.