Hugging Face Daily Papers · 4 min read

Sensor2Sensor: Cross-Embodiment Sensor Conversion for Autonomous Driving

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Authors: Jiahao Wang, Bo Sun, Yijing Bai, Vincent Casser, Songyou Peng, Zehao Zhu, Meng-Li Shih, Xander Masotto, Shih-Yang Su, Kanaad V Parvate, Tiancheng Ge, Linn Bieske, Dragomir Anguelov, Mingxing Tan, Chiyu Max Jiang

arxiv:2605.22809

Published on May 21, 2026 · Submitted by Jiahao Wang on May 22, 2026

Abstract

Sensor2Sensor generates high-fidelity multi-modal sensor data from in-the-wild dashcam videos using diffusion models and 4D Gaussian Splatting for autonomous driving system training and validation.

AI-generated summary

Robust training and validation of Autonomous Driving Systems (ADS) require massive, diverse datasets. Proprietary data collected by Autonomous Vehicle (AV) fleets, while high-fidelity, are limited in scale, diversity of sensor configurations, as well as geographic and long-tail-behavioral coverage. In contrast, in-the-wild data from sources like dashcams offers immense scale and diversity, capturing critical long-tail scenarios and novel environments. However, this unstructured, in-the-wild video data is incompatible with ADS expecting structured, multi-modal sensor inputs for validation and training. To bridge this data gap, we propose Sensor2Sensor, a novel generative modeling paradigm that translates in-the-wild monocular dashcam videos into a high-fidelity, multi-modal sensor suite (AV logs) comprising multi-view camera images and LiDAR point clouds. A core challenge is the lack of paired training data. We address this by converting real AV logs into dashcam-style videos via 4D Gaussian Splatting (4DGS) reconstruction and novel-view rendering. Sensor2Sensor then utilizes a diffusion architecture to perform the generative conversion. We perform comprehensive quantitative evaluations on the fidelity and realism of the generated sensor data. We demonstrate Sensor2Sensor's practical utility by converting challenging in-the-wild internet and dashcam footage into realistic, multi-modal data formats, further unlocking vast external data sources for AV development.
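
The paired-data step described in the abstract (rendering a dashcam-style view from a reconstructed AV log) can be illustrated with a toy sketch. This is not the paper's method: it swaps 4DGS rendering for a plain pinhole projection of 3D points, and `PinholeCamera`, `project`, and `make_training_pair` are hypothetical names introduced here only for illustration.

```python
from dataclasses import dataclass


@dataclass
class PinholeCamera:
    """Hypothetical virtual dashcam: intrinsics plus a translation-only pose."""
    fx: float
    fy: float
    cx: float
    cy: float
    width: int
    height: int
    # Camera position in the vehicle frame (x right, y down, z forward).
    # Rotation is omitted to keep the sketch short.
    position: tuple = (0.0, 0.0, 0.0)


def project(points, cam):
    """Project 3D points to (u, v, depth), dropping points that are
    behind the camera or fall outside the image bounds."""
    pixels = []
    for x, y, z in points:
        xc = x - cam.position[0]
        yc = y - cam.position[1]
        zc = z - cam.position[2]
        if zc <= 0:
            continue  # behind the virtual camera
        u = cam.fx * xc / zc + cam.cx
        v = cam.fy * yc / zc + cam.cy
        if 0 <= u < cam.width and 0 <= v < cam.height:
            pixels.append((u, v, zc))
    return pixels


def make_training_pair(scene_points, av_sensor_frame, dashcam):
    """Pair a synthetic dashcam view (model input) with the real AV
    sensor frame it was derived from (model target)."""
    return {"input": project(scene_points, dashcam), "target": av_sensor_frame}


if __name__ == "__main__":
    cam = PinholeCamera(fx=100, fy=100, cx=50, cy=50, width=100, height=100)
    scene = [(0, 0, 10), (1, 0, 10), (0, 0, -5), (100, 0, 1)]
    pair = make_training_pair(scene, av_sensor_frame="real_av_frame_0", dashcam=cam)
    print(pair["input"])
```

The point of the sketch is the direction of the pairing: the synthetic dashcam view is derived from the real AV log, so the real multi-sensor frame can serve as the supervision target for a dashcam-to-sensor model.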

Community

Paper submitter about 10 hours ago

Accepted by CVPR 2026

Paper submitter about 10 hours ago

Organizations: Waymo, Johns Hopkins University, Google DeepMind, University of Washington

Get this paper in your agent:

hf papers read 2605.22809
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
