News / #robotics Tag Robotics 184 articles archived under #robotics · RSS Sign in to follow r/MachineLearning community 24d ago I'm looking to join/form a team working on physical AI robotics challenge [P] Hey all, I'm a robotics engineer by training turned ML/AI engineer because of passion right after school. I want to start combining these skills together and I think a competition is the best way of doing it. Here's an example of a challenge I'm talking about to set expectations… 18 Hugging Face Daily Papers research 24d ago World-Language-Action Model for Unified World Modeling, Language Reasoning, and Action Synthesis Abstract World-language-action models combine textual instruction processing with robot state prediction through an autoregressive transformer backbone, enabling efficient long-horizon task execution and cross-embodiment learning. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We… 7 Hugging Face Daily Papers research 24d ago Dream.exe: Can Video Generation Models Dream Executable Robot Manipulation? Abstract Video generation models were evaluated through robotic manipulation tasks to assess their ability to reflect physical reality, revealing that visual quality does not predict executable motion accuracy. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Video generation models… 20 Hugging Face Daily Papers research 24d ago SEAOTTER: Sensor Embedded Autoencoding with One-Time Transcode for Efficient Reconstruction Abstract A compression framework for cloud robotics combines learned latent representations with standard JPEG compatibility to achieve faster encoding and decoding while maintaining high perceptual quality. Generated by Qwen/Qwen2.5-Coder-32B-Instruct In robotics systems, vast… 31 r/MachineLearning community 24d ago Would you say capture-time semantic annotation for robot trajectories is a solved problem? [R] It seems raw teleoperation data (RGB + joint states) structurally lacks affordance, contact intent, and embodiment-specific kinematic context. (information that can't be reliably recovered post-hoc once the demonstration is recorded) Most current approaches either filter/clean… 11 Hugging Face Daily Papers research 25d ago RobotValues: Evaluating Household Robots When Human Values Conflict Abstract RobotValues benchmark evaluates household robot planners in value-conflict scenarios, revealing that vision-language models exhibit default value preferences and struggle to override them when instructed to prioritize conflicting values. Generated by… 8 arXiv — Machine Learning research 25d ago Flash-WAM: Modality-Aware Distillation for World Action Models arXiv:2606.05254v1 Announce Type: new Abstract: World-action models (WAMs) jointly generate future video and robot actions through iterative diffusion, achieving strong performance on manipulation benchmarks but requiring tens of denoising steps, a cost that precludes real-time… 13 arXiv — Machine Learning research 25d ago What Objects Enable, Not What They Are: Functional Latent Spaces for Affordance Reasoning arXiv:2606.05533v1 Announce Type: new Abstract: Existing robot planning systems rely on appearance-based reasoning, where visual observations are encoded into latent spaces organized around object appearances (e.g., recognizing a "cart" based on how it looks). However, planning… 13 Ars Technica — AI news-outlet 25d ago The skeptic’s guide to humanoid robots going viral on the Internet Robot demonstrations can distort public perceptions of robotic capabilities. 9 Dwarkesh Podcast news-outlet 25d ago Alex Imas and Phil Trammell – What remains scarce after AGI? “One robot now turns into many robots next year, but the number of ballerinas is the same.” 37 TechCrunch — AI news-outlet 25d ago Is Silicon Valley ready to put robots in people’s homes? Hello Robot is. The California startup released the fourth-generation of its home assistance robot, Stretch. 30 Hugging Face Daily Papers research 25d ago PaintBench: Deterministic Evaluation of Precise Visual Editing Abstract PaintBench presents a scalable benchmark for precise visual editing tasks, revealing low performance across models and identifying key challenges in geometric transformations and structural manipulations. Generated by Qwen/Qwen2.5-Coder-32B-Instruct While current… 12 Hugging Face Daily Papers research 25d ago Cosmos 3: Omnimodal World Models for Physical AI Abstract Cosmos 3 is an omnimodal world model that processes and generates multiple data types through a unified mixture-of-transformers architecture, achieving state-of-the-art performance in various understanding and generation tasks. Generated by… 38 Hugging Face Daily Papers research 26d ago OVO-S-Bench: A Hierarchical Benchmark for Streaming Spatial Intelligence in Multimodal LLMs Abstract OVO-S-Bench presents a comprehensive benchmark for evaluating streaming spatial intelligence in multimodal language models through human-annotated questions spanning multiple abstraction levels. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multimodal agents in robotics,… 23 arXiv — NLP / Computation & Language research 26d ago Hybrid Adversarial Defence for Natural Language Understanding Tasks arXiv:2606.04612v1 Announce Type: new Abstract: Large Language Models (LLMs) are vulnerable both to hallucination and adversarial manipulation. Although these problems are closely related, existing defences typically address them separately. We investigate a hybrid defence… 21 arXiv — NLP / Computation & Language research 26d ago Dive into the Scene: Breaking the Perceptual Bottleneck in Vision-Language Decision Making via Focus Plan Generation arXiv:2606.04046v1 Announce Type: cross Abstract: In embodied vision-language decision making tasks such as robotic manipulation and navigation, Vision-Language and Vision-Language-Action Models (VLMs & VLAs) are powerful tools with different benefits: VLMs are better at… 31 Hugging Face Daily Papers research 26d ago GRAIL: Generating Humanoid Loco-Manipulation from 3D Assets and Video Priors Abstract GRAIL generates diverse humanoid manipulation and locomotion data through 3D asset composition and video foundation models, enabling effective sim-to-real transfer for robot control. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Scaling humanoid loco-manipulation… 9 Hugging Face Daily Papers research 26d ago AURA: Action-Gated Memory for Robot Policies at Constant VRAM Abstract AURA-Mem is a recurrent memory system that adapts to embodied AI constraints by writing only when observations affect actions, significantly reducing memory writes compared to traditional KV-cache approaches. Generated by Qwen/Qwen2.5-Coder-32B-Instruct The KV-cache is… 7 Hugging Face Daily Papers research 26d ago Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking Abstract Humanoid-GPT is a GPT-style Transformer with causal attention trained on a billion-scale motion corpus that achieves zero-shot generalization to unseen motions and control tasks through scalable pre-training on diverse motion data. Generated by… 29 Hugging Face Daily Papers research 27d ago AFUN: Towards an Affordance Foundation Model for Functionality Understanding Abstract Affordance understanding model predicts functional masks and 3D motion curves from RGB-D observations and language descriptions, enabling generalizable robot manipulation across diverse environments. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Affordance understanding… 36 Hugging Face Daily Papers research 27d ago τ_0-WM: A Unified Video-Action World Model for Robotic Manipulation Abstract A unified video-action world model integrates policy learning, video prediction, and action evaluation using a shared video diffusion backbone for robotic manipulation tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Robotic manipulation requires models that generate… 22 Hugging Face Daily Papers research 27d ago Silent Failures in Physical AI: A Literature Review of Runtime Action Authorization for Autonomous Systems Abstract Physical AI systems face safety challenges where black-box models can execute harmful actions without detection, necessitating comprehensive runtime guardrail mechanisms for safe operation. AI-generated summary Physical AI systems increasingly map multimodal… 12 Hugging Face Daily Papers research 27d ago Can Predicted Dynamics Exist in the Physical World? Abstract Physical admissibility validation for AI systems uses prediction-control interfaces with kinematic and dynamic conditions to filter invalid proposals while maintaining high performance. AI-generated summary Predictive Physical AI systems output state rollouts, action… 33 arXiv — Machine Learning research 28d ago From Demonstrations to Rewards: Test-Time Prompt Optimization for VLM Reward Models arXiv:2606.00083v1 Announce Type: new Abstract: Reinforcement learning relies on accurate reward functions, which are often hand-crafted or even unavailable in real-world applications, such as robotics. Recent work has explored the zero-shot reasoning capabilities of pre-trained… 11 arXiv — NLP / Computation & Language research 28d ago DiscourseFlip: An Oblique Discourse-Level Opinion Manipulation Attack against Black-box Retrieval-Augmented Generation arXiv:2606.01212v1 Announce Type: new Abstract: Retrieval-Augmented Generation (RAG) systems are widely deployed and increasingly influential, but their reliance on external corpora exposes new security risks from poisoned retrieval content. Existing RAG attacks are largely… 23 Hugging Face Daily Papers research 28d ago RoboSemanticBench: Diagnosing Semantic Grounding in Action Prediction for VLA Models Abstract RoboSemanticBench identifies a disconnect between semantic understanding and action prediction in vision-language-action models, where robots can grasp objects but fail to select semantically correct targets. AI-generated summary Vision-language-action (VLA) models are… 15 Hugging Face Daily Papers research 28d ago RoboStressBench: Benchmarking VLM Robustness to Physical Visual Stress in Embodied Scenes Abstract RoboStressBench presents a principled benchmark for evaluating vision-language model robustness to physical visual stress in embodied AI, decomposing visual stress into material, viewpoint, lighting, and geometry dimensions. AI-generated summary Vision-Language Models… 4 r/LocalLLaMA community 28d ago NVIDIA GB300 Grace Blackwell Ultra pricetags https://www.scan.co.uk/shop/ai-and-robotics/workstations-ai/nvidia-dgx-station   submitted by   /u/X-N2O [link]   [comments] 5 Ars Technica — AI news-outlet 28d ago Allegedly trashing Airbnbs to test robots puts startup in legal trouble Lawsuit seeks $12,000 from startup that allegedly damaged home in robot tests. 28 Hugging Face Daily Papers research 28d ago Memory-Bound but Not Bandwidth-Limited: The Physical AI Inference Gap in Batch-1 LLM Decode Abstract Batch-1 autoregressive decoding in physical AI systems shows that memory bandwidth alone doesn't fully explain latency, with GPU speedup limited by launch overheads and quantization efficiency varying significantly across hardware platforms. AI-generated summary… 16 r/LocalLLaMA community 28d ago How to build a shitty robot   submitted by   /u/badlogicgames [link]   [comments] 35 Hugging Face official-blog 29d ago Welcome NVIDIA Cosmos 3: The First Open Omni-model for Physical AI Reasoning and Action Back to Articles Welcome NVIDIA Cosmos 3: The First Open Omni-model for Physical AI Reasoning and Action Enterprise + Article Published June 1, 2026 Upvote - Asawaree asawareeb nvidia Atharva Joshi atharvajoshi10 nvidia NVIDIA Cosmos 3 is here - and it's available on Hugging… 23 NVIDIA Developer Blog official-blog 29d ago Develop Physical AI Reasoning, World, and Action Models with NVIDIA Cosmos 3 Physical AI systems must understand the real world before they can act within it. Robots, autonomous vehicles, and smart spaces need to understand what's... 21 Hugging Face Daily Papers research 29d ago Frequency-Guided Action Diffusion via Sub-Frequency Manifold Traversal Abstract Frequency Guidance Operator enables smooth action generation in diffusion policies by steering noisy samples through intermediate sub-frequency manifolds, improving robotic manipulation performance. AI-generated summary Learning visuomotor policies via behavior cloning… 11 arXiv — NLP / Computation & Language research 29d ago Multi-Turn Multi-Agent Dialogue for Collaborative Reconstruction Improves VLM Performance on Spatial Reasoning, But Only Barely arXiv:2605.31387v1 Announce Type: new Abstract: Robots operating in diverse environments rely on visual input to interpret objects and spatial layouts. In human-collaborative tasks, they are expected to communicate this understanding through language. Vision-language models… 32 Hugging Face Daily Papers research 29d ago Hide-and-Seek in Trajectories: Discovering Failure Signals for VLA Runtime Monitoring Abstract Hide-and-Seek framework detects robot execution failures in vision-language-action models by localizing failure-indicative actions through contrastive learning from trajectory-level supervision without step-level annotations. AI-generated summary Vision-Language-Action… 18 r/MachineLearning community 1mo ago Before we spend months processing open-source robotics datasets, tell us why this is a bad idea [D] Ps. Not pitching anything; Just trying to understand where reality differs from the narrative. We're a couple of ML students, mostly worked on ML/software before, but over the last few months we've been playing with VLAs, robot datasets, and trying to understand where the field… 27 Hugging Face Daily Papers research 1mo ago DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation Abstract DynaFLIP is a dynamics-aware multimodal pre-training framework that enhances robot manipulation by integrating motion understanding into visual perception through image-language-3D flow triplets and geometric regularization techniques. AI-generated summary Robot… 22 Ars Technica — AI news-outlet 1mo ago Startup offers free home cleaning—if it can record it all for robot training The latest twist in paying humans to wear head cameras for robot training data. 26 Hugging Face Daily Papers research 1mo ago Reducing Political Manipulation with Consistency Training Abstract Large language models demonstrate systematic political bias in handling opposing viewpoints, which can be mitigated through a reinforcement learning approach that maintains helpfulness while reducing bias. AI-generated summary Large language models (LLMs) exhibit… 18 Hugging Face Daily Papers research 1mo ago Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments Abstract A unified vision-language-action model is presented that integrates diverse embodied decision-making tasks through a shared architecture and training approach, demonstrating strong performance across manipulation, navigation, and trajectory prediction with… 31 r/MachineLearning community 1mo ago Wall-OSS-0.5: 4B VLA with open training code and zero-shot real-robot evaluation[D] Wall-OSS-0.5 is a new 4B VLA release from X Square Robot, built on a 3B VLM backbone with action experts in a Mixture-of-Transformers layout. What caught my eye is that the report evaluates the pretrained checkpoint on real robots before task-specific fine tuning, instead of… 25 Hugging Face Daily Papers research 1mo ago Rethinking VLM Representation for VLA Initialization Abstract Effective vision-language-action model initialization requires balancing pretrained vision-language model representations with embodied task-specific adaptations and robot-data pretraining while preserving core action-relevant features. AI-generated summary… 22 Hugging Face Daily Papers research 1mo ago Learning High-Frequency Continuous Action Chunks in Latent Space Abstract High-frequency robotic control is improved by using variational autoencoders to enhance temporal and spatial consistency, combined with a reuse-then-refine strategy for smooth real-time execution. AI-generated summary Modern robotic policies increasingly rely on action… 26 Ars Technica — AI news-outlet 1mo ago 3D-printable humanoid legs let robotics experiments run wild Hugging Face debuts $2,500 bipedal robot project for builders and researchers. 33 TechCrunch — AI news-outlet 1mo ago This startup is betting India’s gig economy can train the world’s robots Human Archive, a startup founded by Berkeley and Stanford researchers, is paying gig workers in India to wear camera-equipped caps and sensor devices to collect the real-world physical training data that AI and robotics labs are racing to acquire. 34 arXiv — NLP / Computation & Language research 1mo ago GeoMathCode: Understanding Interleaved Math-Code Reasoning for Geometry Problem Solving arXiv:2605.25384v1 Announce Type: new Abstract: Mathematical reasoning is a hallmark of human intelligence, requiring logical deduction, symbolic manipulation, and abstract thinking. Recent multimodal large language models (MLLMs) have demonstrated strong performance on geometry… 22 arXiv — Machine Learning research 1mo ago Approximate Machine Unlearning through Manifold Representation Forgetting Guided by Self Mode Connectivity arXiv:2605.22871v1 Announce Type: new Abstract: Machine unlearning is a fundamental mechanism that enforces the right to be forgotten. Existing unlearning studies that rely on label manipulation or task-gradient reversal often deliver limited unlearning effectiveness. Moreover,… 38 arXiv — Machine Learning research 1mo ago Sample-wise Targeted Adversarial Attacks on Test-time Adaptation arXiv:2605.23411v1 Announce Type: new Abstract: Test-time adaptation (TTA) effectively counters distribution shifts but exposes models to adversarial manipulation via the unlabeled test stream. Existing class-wise targeted attacks remain impractical for stealthy exploitation in… 12 arXiv — NLP / Computation & Language research 1mo ago Autonomous Frontier-Based Exploration with VLM Guidance arXiv:2605.23165v1 Announce Type: cross Abstract: Autonomous robotic exploration of unknown and hazardous environments, a long-standing challenge, can be significantly improved by leveraging the advanced reasoning of Vision-Language Models (VLMs). We introduce a novel… 23 Page 3 of 4 · 184 articles ← Newer Older →