MobileEgo Anywhere: Open Infrastructure for long horizon egocentric data on commodity hardware
Authors: Senthil Palanisamy, Abhishek Anand, Satpal Singh Rathor, Pratyush Patnaik, Shubhanshu Khatana
Organization: FPV Labs
Published: 2026-05-07
Project page: https://www.fpvlabs.ai/stera
Code: https://github.com/fpv-labs/stera-sdk
AI-generated summary
A mobile-based framework for collecting long-duration egocentric robot data using smartphone sensors, enabling large-scale training of vision-language-action models.
Abstract
Recent advances in Vision-Language-Action (VLA) models have driven a critical demand for large-scale egocentric datasets. However, existing datasets are often limited to short episodes, typically spanning only a few minutes, and so fail to capture the long-horizon temporal dependencies necessary for complex robotic task execution. To bridge this gap, we present MobileEgo Anywhere, a framework designed to facilitate the collection of robust, hour-plus egocentric trajectories using commodity mobile hardware. We leverage the ubiquitous sensor suites of modern smartphones to provide high-fidelity, long-term camera pose tracking, effectively removing the high hardware barriers associated with traditional robotics data collection. Our contributions are threefold: (1) we release a novel dataset comprising 200 hours of diverse, long-form egocentric data with persistent state tracking; (2) we open-source a mobile application that enables any user to record egocentric data; and (3) we provide a comprehensive processing pipeline to convert raw mobile captures into standardized, training-ready formats for VLA model and foundation model research. By democratizing the data collection process, this work enables the massive-scale acquisition of long-horizon data across varied global environments, accelerating the development of generalizable robotic policies.
Community
iPhone + open-source stack to create high quality data
the future of embodied ai is open source
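This is an automated message from the Librarian Bot (https://huggingface.co/librarian-bots). I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
* EgoLive: A Large-Scale Egocentric Dataset from Real-World Human Tasks (2026), https://huggingface.co/papers/2604.23570
* SABER: A Scalable Action-Based Embodied Dataset for Real-World VLA Adaptation (2026), https://huggingface.co/papers/2605.09613
* RoSHI: A Versatile Robot-oriented Suit for Human Data In-the-Wild (2026), https://huggingface.co/papers/2604.07331
* EgoVerse: An Egocentric Human Dataset for Robot Learning from Around the World (2026), https://huggingface.co/papers/2604.07607
* Phone2Act: A Low-Cost, Hardware-Agnostic Teleoperation System for Scalable VLA Data Collection (2026), https://huggingface.co/papers/2605.01948
* UniDex: A Robot Foundation Suite for Universal Dexterous Hand Control from Egocentric Human Videos (2026), https://huggingface.co/papers/2603.22264
* HRDexDB: A Large-Scale Dataset of Dexterous Human and Robotic Hand Grasps (2026), https://huggingface.co/papers/2604.14944
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers
You can ask Librarian Bot directly for paper recommendations by tagging it in a comment: @librarian-bot recommend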
arXiv: arxiv.org/abs/2605.05945