AnyMo: Geometry-Aware Setup-Agnostic Modeling of Human Motion in the Wild
Authors: Baiyu Chen, Zechen Li, Wilson Wongso, Lihuan Li, Xiachong Lin, Hao Xue, Benjamin Tag, Flora Salim
Organization: CRUISE Research Group (UNSW)
Published: 2026-05-21
Code: https://github.com/Breezelled/AnyMo
Abstract
AI-generated summary: AnyMo is a geometry-aware framework that enables setup-agnostic human motion modeling using physics-grounded IMU simulation and graph encoding for cross-dataset activity recognition and cross-modal retrieval.
As wearable and mobile devices become increasingly embedded in daily life, they offer a practical way to continuously sense human motion in the wild. However, inertial signals depend heavily on the sensing setup, including body location, mounting position, sensor orientation, device hardware, and sampling protocol. This setup dependence makes it difficult to learn motion representations that transfer across devices and datasets, and limits the broader use of wearable IMUs beyond closed-set recognition. We introduce AnyMo, a geometry-aware framework for setup-agnostic human motion modeling. AnyMo uses physics-grounded IMU simulation over dense body-surface placements to generate diverse and plausible synthetic signals, pre-trains a graph encoder from paired synthetic placement views and masked partial observations, tokenizes multi-position IMU signals into full-body motion tokens, and aligns these tokens with an LLM for motion-language understanding. We evaluate AnyMo on three complementary tasks: zero-shot human activity recognition (HAR) across 14 unseen downstream datasets, cross-modal retrieval, and wearable IMU motion captioning. It improves average Accuracy/F1/R@2 by 11.7%/11.6%/22.6% on HAR, increases zero-shot IMU-to-text and text-to-IMU retrieval MRR by 15.9% and 28.6%, respectively, and improves zero-shot captioning BERT-F1 by 18.8%. These results support AnyMo as a generalist model for wearable motion understanding in the wild. Project page: https://baiyuchen.com/project/AnyMo.
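The abstract reports R@2 and MRR without defining them. As a minimal sketch of how such ranking metrics are commonly computed (this is not AnyMo's evaluation code), assume a score matrix `similarity[i, j]` between query `i` (for example an IMU embedding) and candidate `j` (for example a label or caption embedding), and an index array `targets` giving the correct candidate for each query. The function names and the toy scores below are hypothetical.

```python
import numpy as np


def ranks_of_targets(similarity: np.ndarray, targets: np.ndarray) -> np.ndarray:
    """Return the 1-based rank of the correct candidate for every query.

    similarity: (num_queries, num_candidates) score matrix, higher is better.
    targets:    (num_queries,) index of the correct candidate per query.
    """
    rows = np.arange(len(targets))
    target_scores = similarity[rows, targets][:, None]
    # Rank = 1 + number of candidates that score strictly higher than the target.
    return 1 + (similarity > target_scores).sum(axis=1)


def mean_reciprocal_rank(similarity: np.ndarray, targets: np.ndarray) -> float:
    """MRR: average of 1 / rank of the correct candidate."""
    return float(np.mean(1.0 / ranks_of_targets(similarity, targets)))


def recall_at_k(similarity: np.ndarray, targets: np.ndarray, k: int) -> float:
    """R@k: fraction of queries whose correct candidate is ranked in the top k."""
    return float(np.mean(ranks_of_targets(similarity, targets) <= k))


# Toy scores (made up for illustration): three queries, three candidates.
sim = np.array([
    [0.9, 0.1, 0.2],  # correct candidate 0 is ranked 1st
    [0.8, 0.5, 0.1],  # correct candidate 1 is ranked 2nd
    [0.7, 0.6, 0.3],  # correct candidate 2 is ranked 3rd
])
targets = np.array([0, 1, 2])

print("MRR:", mean_reciprocal_rank(sim, targets))
print("R@2:", recall_at_k(sim, targets, k=2))
```

In this sketch, zero-shot recognition and retrieval share the same ranking step; only the candidate set changes (activity labels versus captions or IMU clips).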
Community
AnyMo is a geometry-aware, setup-agnostic framework for wearable motion understanding. It learns transferable IMU motion representations across sensing setups and datasets, connects sparse wearable signals to open-vocabulary recognition, cross-modal retrieval, and motion captioning, and contributes a physics-grounded synthetic IMU simulation pipeline, the AnyMo-180 label resource, and AnyMo Bench for fine-grained in-the-wild HAR evaluation.
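To make "physics-grounded IMU simulation" concrete, here is a minimal sketch of the textbook accelerometer model, not AnyMo's actual pipeline. It assumes a sensor whose world-frame position and sensor-to-world rotation are known at every time step (for instance, a point on a body surface), a z-up world frame, and noise-free measurements. The function name `simulate_accelerometer` and all parameters are hypothetical.

```python
import numpy as np

# Gravity in the world frame (z up), in m/s^2. An assumption of this sketch.
GRAVITY = np.array([0.0, 0.0, -9.81])


def simulate_accelerometer(positions: np.ndarray,
                           rotations: np.ndarray,
                           dt: float) -> np.ndarray:
    """Simulate ideal accelerometer readings for one sensor placement.

    positions: (T, 3) sensor positions in the world frame, in metres.
    rotations: (T, 3, 3) sensor-to-world rotation matrices.
    dt:        sampling interval in seconds.
    Returns (T, 3) specific force expressed in the sensor frame.
    """
    # Finite-difference the trajectory twice to get world-frame acceleration.
    velocity = np.gradient(positions, dt, axis=0)
    accel_world = np.gradient(velocity, dt, axis=0)
    # An accelerometer measures specific force: acceleration minus gravity.
    specific_force_world = accel_world - GRAVITY
    # Rotate into the sensor frame: f_sensor = R^T f_world at every time step.
    return np.einsum("tji,tj->ti", rotations, specific_force_world)


# A stationary, upright sensor should read +9.81 m/s^2 on its z axis.
T = 50
still = np.zeros((T, 3))
upright = np.tile(np.eye(3), (T, 1, 1))
readings = simulate_accelerometer(still, upright, dt=0.01)
print(readings[0])
```

The same motion produces different readings under a different orientation or placement, which is the setup dependence the abstract describes: flipping the sensor upside down in this sketch negates the z reading.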
Paper: arxiv.org/abs/2605.22715