Hugging Face Daily Papers · May 18, 2026 · 6 min read

MobileEgo Anywhere: Open Infrastructure for long horizon egocentric data on commodity hardware

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.




arxiv:2605.05945


Published on May 7 · Submitted by Satpal Singh Rathore on May 18 · FPV Labs


Authors:

Senthil Palanisamy, Abhishek Anand, Satpal Singh Rathor, Pratyush Patnaik, Shubhanshu Khatana

Abstract

AI-generated summary: A mobile-based framework for collecting long-duration egocentric robot data using smartphone sensors, enabling large-scale training of vision-language-action models.

The recent advancement of Vision-Language-Action (VLA) models has driven a critical demand for large-scale egocentric datasets. However, existing datasets are often limited by short episode durations, typically spanning only a few minutes, and so fail to capture the long-horizon temporal dependencies necessary for complex robotic task execution. To bridge this gap, we present MobileEgo Anywhere, a framework designed to facilitate the collection of robust, hour-plus egocentric trajectories using commodity mobile hardware. We leverage the ubiquitous sensor suites of modern smartphones to provide high-fidelity, long-term camera pose tracking, effectively removing the high hardware barriers associated with traditional robotics data collection. Our contributions are threefold: (1) we release a novel dataset comprising 200 hours of diverse, long-form egocentric data with persistent state tracking; (2) we open-source a mobile application that enables any user to record egocentric data; and (3) we provide a comprehensive processing pipeline to convert raw mobile captures into standardized, training-ready formats for Vision-Language-Action model and foundation model research. By democratizing the data collection process, this work enables the massive-scale acquisition of long-horizon data across varied global environments, accelerating the development of generalizable robotic policies.

Project page: https://www.fpvlabs.ai/stera · GitHub: https://github.com/fpv-labs/stera-sdk