Hugging Face Daily Papers · June 9, 2026 · 3 min read

EMMA: Extracting Multiple physical parameters from Multimodal Data

#model-release #multimodal #paper #music

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.


arxiv:2605.24047

Published on May 21, 2026 · Submitted by Ayan Banerjee on Jun 9, 2026

IMPACT Lab at Arizona State University

Authors: Farhat Shaikh, Ayan Banerjee, Sandeep Gupta

Abstract

EMMA is a physics-informed multimodal framework that directly recovers dynamical parameters from raw video, audio, and image data using a Liquid Time-Constant network and physics-constrained loss.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct
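The summary credits EMMA's latent dynamics to a Liquid Time-Constant (LTC) network. This page does not show EMMA's code, so the sketch below follows the fused semi-implicit Euler update from the original LTC formulation (Hasani et al., AAAI 2021) rather than the authors' implementation. The class name `LTCCell`, the sigmoid gate, the weight initialisation, the six solver unfolds per step and the sine-wave input are all illustrative assumptions, and the weights are random and untrained.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class LTCCell:
    """Hypothetical LTC cell using the fused (semi-implicit Euler) solver step.

    Continuous-time form:  dx/dt = -(1/tau + f) * x + f * A,
    with gate f = sigmoid(W_in @ I + W_rec @ x + b) evaluated per unit.
    Illustrative sketch only, not EMMA's implementation.
    """

    def __init__(self, n_inputs, n_units, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(0.0, 1.0, (n_units, n_inputs))
        self.W_rec = rng.normal(0.0, 1.0 / np.sqrt(n_units), (n_units, n_units))
        self.b = np.zeros(n_units)
        self.tau = np.ones(n_units)               # base time constants, must be > 0
        self.A = rng.uniform(-1.0, 1.0, n_units)  # per-unit targets the gate pulls toward

    def step(self, x, inp, dt, unfolds=6):
        """Advance state x by dt using `unfolds` fused sub-steps of size dt / unfolds."""
        h = dt / unfolds
        for _ in range(unfolds):
            f = sigmoid(self.W_in @ inp + self.W_rec @ x + self.b)  # gate in (0, 1)
            # x_new = (x + h*f*A) / (1 + h*(1/tau + f)); stays bounded for any h > 0.
            x = (x + h * f * self.A) / (1.0 + h * (1.0 / self.tau + f))
        return x


# Drive an untrained 8-unit cell with a 1-D sine input and record its state.
cell = LTCCell(n_inputs=1, n_units=8)
state = np.zeros(8)
trajectory = []
for t in np.arange(0.0, 5.0, 0.05):
    state = cell.step(state, np.array([np.sin(2.0 * t)]), dt=0.05)
    trajectory.append(state.copy())
trajectory = np.array(trajectory)
print(trajectory.shape)
```

Because the gate f lies in (0, 1) and tau is positive, each fused step is a weighted combination of the previous state, the target A and zero, so every unit stays within [min(0, A_i), max(0, A_i)]. The effective time constant tau / (1 + tau·f) changes with the input, which is the property that makes LTC cells attractive for continuous-time dynamics.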

We introduce EMMA, a physics-informed multimodal framework that recovers all identifiable dynamical parameters of a system directly from raw video, audio, and image-based time-series observations. Unlike prior video-only approaches that struggle with occluded states, hidden actuation inputs, or assumptions about known initial conditions and coordinate frames, EMMA performs joint inference of explicit parameters, implicit dynamical components, and calibration invariants within a unified continuous-time model. EMMA leverages a Liquid Time-Constant (LTC) network to learn latent dynamics from heterogeneous modalities, while a physics-constrained loss enforces consistency with the governing differential equations. A unified feature pipeline enables consistent alignment across video trajectories, acoustic signatures, and chart-derived measurements, allowing EMMA to estimate parameters under forced, implicit, and multivariate dynamics without requiring segmentation masks, differentiable rendering, or specialized sensors. Across 100+ scenarios, including five standard dynamical benchmarks (75 Delfys videos), real-world rover and quadrotor systems with hidden inputs, and simulation-chart case studies spanning biological and chaotic systems, EMMA delivers robust multi-parameter recovery and significantly outperforms existing single-modality and equation-discovery baselines. Our results establish EMMA as a general, scalable solution for physics-consistent model extraction from opportunistic multimodal data.

Code and data are available at: https://github.com/ImpactLabASU/EMMA-CVPR2026
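The abstract says a physics-constrained loss keeps the learned dynamics consistent with the governing differential equations. As a much simpler illustration of that idea (not EMMA's pipeline, which learns latent states with an LTC network from video, audio and chart data), the sketch below recovers the natural frequency omega and damping ratio zeta of a damped oscillator, x'' + 2·zeta·omega·x' + omega²·x = 0, from one clean synthetic trajectory by minimising the squared ODE residual. Finite differences stand in for a learned state estimator, and the oscillator, the function names and the sampling settings are hypothetical.

```python
import numpy as np


def simulate_damped_oscillator(omega, zeta, t):
    """One solution of x'' + 2*zeta*omega*x' + omega**2 * x = 0 (requires 0 < zeta < 1)."""
    omega_d = omega * np.sqrt(1.0 - zeta**2)  # damped angular frequency
    return np.exp(-zeta * omega * t) * np.cos(omega_d * t)


def physics_residual_loss(omega, zeta, x, v, acc):
    """Mean squared residual of the oscillator ODE given samples of x, x' and x''."""
    residual = acc + 2.0 * zeta * omega * v + omega**2 * x
    return float(np.mean(residual**2))


def fit_oscillator_params(x, dt, trim=5):
    """Recover (omega, zeta) by minimising physics_residual_loss over the trajectory."""
    v = np.gradient(x, dt)    # x' by central differences
    acc = np.gradient(v, dt)  # x'' by differentiating again
    # Drop the edges, where np.gradient uses less accurate one-sided differences.
    x, v, acc = x[trim:-trim], v[trim:-trim], acc[trim:-trim]
    # The residual is linear in c1 = 2*zeta*omega and c2 = omega**2,
    # so its minimiser solves the least-squares problem [v, x] @ [c1, c2] = -x''.
    design = np.column_stack([v, x])
    (c1, c2), *_ = np.linalg.lstsq(design, -acc, rcond=None)
    omega = float(np.sqrt(c2))
    zeta = float(c1 / (2.0 * omega))
    return omega, zeta, (x, v, acc)


# Synthetic "measurement" with known parameters, then recovery from the samples alone.
dt = 1e-3
t = np.arange(0.0, 10.0, dt)
x_obs = simulate_damped_oscillator(omega=2.0, zeta=0.1, t=t)
omega_hat, zeta_hat, (xs, vs, accs) = fit_oscillator_params(x_obs, dt)
print(f"omega ~ {omega_hat:.3f}, zeta ~ {zeta_hat:.3f}")
```

Because this residual is linear in 2·zeta·omega and omega², its minimiser is an ordinary least-squares solution. With nonlinear dynamics or a learned state model, the same residual would instead be added to a data-fitting loss and minimised by gradient descent. Finite-difference second derivatives also amplify noise, so real measurements would need smoothing or a learned state estimator before this shortcut is usable.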

arXiv: arxiv.org/abs/2605.24047 · Project page: https://impactlabasu.github.io/EMMA-CVPR2026/ · GitHub: https://github.com/ImpactLabASU/EMMA-CVPR2026

Community

abanerj3 (paper author and submitter):

To the best of our knowledge, this is the first work towards multimodal physics parameter estimation. Published in CVPR 2026.

Get this paper in your agent:

hf papers read 2605.24047

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper: 0 · Datasets citing this paper: 0 · Spaces citing this paper: 1 · Collections including this paper: 0
