Tag

Robotics

184 articles archived under #robotics · RSS

r/MachineLearning community 24d ago

I'm looking to join/form a team working on physical AI robotics challenge [P]

Hey all, I'm a robotics engineer by training turned ML/AI engineer because of passion right after school. I want to start combining these skills together and I think a competition is the best way of doing it. Here's an example of a challenge I'm talking about to set expectations…

18
Hugging Face Daily Papers research 24d ago

World-Language-Action Model for Unified World Modeling, Language Reasoning, and Action Synthesis

Abstract: World-language-action models combine textual instruction processing with robot state prediction through an autoregressive transformer backbone, enabling efficient long-horizon task execution and cross-embodiment learning. [Generated by Qwen/Qwen2.5-Coder-32B-Instruct] We…

7
Hugging Face Daily Papers research 24d ago

Dream.exe: Can Video Generation Models Dream Executable Robot Manipulation?

Abstract: Video generation models were evaluated through robotic manipulation tasks to assess their ability to reflect physical reality, revealing that visual quality does not predict executable motion accuracy. [Generated by Qwen/Qwen2.5-Coder-32B-Instruct] Video generation models…

20
Hugging Face Daily Papers research 24d ago

SEAOTTER: Sensor Embedded Autoencoding with One-Time Transcode for Efficient Reconstruction

Abstract: A compression framework for cloud robotics combines learned latent representations with standard JPEG compatibility to achieve faster encoding and decoding while maintaining high perceptual quality. [Generated by Qwen/Qwen2.5-Coder-32B-Instruct] In robotics systems, vast…

31
r/MachineLearning community 24d ago

Would you say capture-time semantic annotation for robot trajectories is a solved problem? [R]

It seems raw teleoperation data (RGB + joint states) structurally lacks affordance, contact intent, and embodiment-specific kinematic context. (information that can't be reliably recovered post-hoc once the demonstration is recorded) Most current approaches either filter/clean…

11
Hugging Face Daily Papers research 25d ago

RobotValues: Evaluating Household Robots When Human Values Conflict

Abstract: RobotValues benchmark evaluates household robot planners in value-conflict scenarios, revealing that vision-language models exhibit default value preferences and struggle to override them when instructed to prioritize conflicting values. [Generated by…]

8
arXiv — Machine Learning research 25d ago

Flash-WAM: Modality-Aware Distillation for World Action Models

arXiv:2606.05254v1 · Announce Type: new · Abstract: World-action models (WAMs) jointly generate future video and robot actions through iterative diffusion, achieving strong performance on manipulation benchmarks but requiring tens of denoising steps, a cost that precludes real-time…

13
arXiv — Machine Learning research 25d ago

What Objects Enable, Not What They Are: Functional Latent Spaces for Affordance Reasoning

arXiv:2606.05533v1 · Announce Type: new · Abstract: Existing robot planning systems rely on appearance-based reasoning, where visual observations are encoded into latent spaces organized around object appearances (e.g., recognizing a "cart" based on how it looks). However, planning…

13
Ars Technica — AI news-outlet 25d ago

The skeptic’s guide to humanoid robots going viral on the Internet

Robot demonstrations can distort public perceptions of robotic capabilities.

9
Dwarkesh Podcast news-outlet 25d ago

Alex Imas and Phil Trammell – What remains scarce after AGI?

“One robot now turns into many robots next year, but the number of ballerinas is the same.”

37
TechCrunch — AI news-outlet 25d ago

Is Silicon Valley ready to put robots in people’s homes? Hello Robot is.

The California startup released the fourth generation of its home assistance robot, Stretch.

30
Hugging Face Daily Papers research 25d ago

PaintBench: Deterministic Evaluation of Precise Visual Editing

Abstract: PaintBench presents a scalable benchmark for precise visual editing tasks, revealing low performance across models and identifying key challenges in geometric transformations and structural manipulations. [Generated by Qwen/Qwen2.5-Coder-32B-Instruct] While current…

12
Hugging Face Daily Papers research 25d ago

Cosmos 3: Omnimodal World Models for Physical AI

Abstract: Cosmos 3 is an omnimodal world model that processes and generates multiple data types through a unified mixture-of-transformers architecture, achieving state-of-the-art performance in various understanding and generation tasks. [Generated by…]

38
Hugging Face Daily Papers research 26d ago

OVO-S-Bench: A Hierarchical Benchmark for Streaming Spatial Intelligence in Multimodal LLMs

Abstract: OVO-S-Bench presents a comprehensive benchmark for evaluating streaming spatial intelligence in multimodal language models through human-annotated questions spanning multiple abstraction levels. [Generated by Qwen/Qwen2.5-Coder-32B-Instruct] Multimodal agents in robotics,…

23
arXiv — NLP / Computation & Language research 26d ago

Hybrid Adversarial Defence for Natural Language Understanding Tasks

arXiv:2606.04612v1 · Announce Type: new · Abstract: Large Language Models (LLMs) are vulnerable both to hallucination and adversarial manipulation. Although these problems are closely related, existing defences typically address them separately. We investigate a hybrid defence…

21
arXiv — NLP / Computation & Language research 26d ago

Dive into the Scene: Breaking the Perceptual Bottleneck in Vision-Language Decision Making via Focus Plan Generation

arXiv:2606.04046v1 · Announce Type: cross · Abstract: In embodied vision-language decision making tasks such as robotic manipulation and navigation, Vision-Language and Vision-Language-Action Models (VLMs & VLAs) are powerful tools with different benefits: VLMs are better at…

31
Hugging Face Daily Papers research 26d ago

GRAIL: Generating Humanoid Loco-Manipulation from 3D Assets and Video Priors

Abstract: GRAIL generates diverse humanoid manipulation and locomotion data through 3D asset composition and video foundation models, enabling effective sim-to-real transfer for robot control. [Generated by Qwen/Qwen2.5-Coder-32B-Instruct] Scaling humanoid loco-manipulation…

9
Hugging Face Daily Papers research 26d ago

AURA: Action-Gated Memory for Robot Policies at Constant VRAM

Abstract: AURA-Mem is a recurrent memory system that adapts to embodied AI constraints by writing only when observations affect actions, significantly reducing memory writes compared to traditional KV-cache approaches. [Generated by Qwen/Qwen2.5-Coder-32B-Instruct] The KV-cache is…

7
Hugging Face Daily Papers research 26d ago

Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking

Abstract: Humanoid-GPT is a GPT-style Transformer with causal attention trained on a billion-scale motion corpus that achieves zero-shot generalization to unseen motions and control tasks through scalable pre-training on diverse motion data. [Generated by…]

29
Hugging Face Daily Papers research 27d ago

AFUN: Towards an Affordance Foundation Model for Functionality Understanding

Abstract: Affordance understanding model predicts functional masks and 3D motion curves from RGB-D observations and language descriptions, enabling generalizable robot manipulation across diverse environments. [Generated by Qwen/Qwen2.5-Coder-32B-Instruct] Affordance understanding…

36
Hugging Face Daily Papers research 27d ago

τ_0-WM: A Unified Video-Action World Model for Robotic Manipulation

Abstract: A unified video-action world model integrates policy learning, video prediction, and action evaluation using a shared video diffusion backbone for robotic manipulation tasks. [Generated by Qwen/Qwen2.5-Coder-32B-Instruct] Robotic manipulation requires models that generate…

22
Hugging Face Daily Papers research 27d ago

Silent Failures in Physical AI: A Literature Review of Runtime Action Authorization for Autonomous Systems

Abstract: Physical AI systems face safety challenges where black-box models can execute harmful actions without detection, necessitating comprehensive runtime guardrail mechanisms for safe operation. [AI-generated summary] Physical AI systems increasingly map multimodal…

12
Hugging Face Daily Papers research 27d ago

Can Predicted Dynamics Exist in the Physical World?

Abstract: Physical admissibility validation for AI systems uses prediction-control interfaces with kinematic and dynamic conditions to filter invalid proposals while maintaining high performance. [AI-generated summary] Predictive Physical AI systems output state rollouts, action…

33
arXiv — Machine Learning research 28d ago

From Demonstrations to Rewards: Test-Time Prompt Optimization for VLM Reward Models

arXiv:2606.00083v1 · Announce Type: new · Abstract: Reinforcement learning relies on accurate reward functions, which are often hand-crafted or even unavailable in real-world applications, such as robotics. Recent work has explored the zero-shot reasoning capabilities of pre-trained…

11
arXiv — NLP / Computation & Language research 28d ago

DiscourseFlip: An Oblique Discourse-Level Opinion Manipulation Attack against Black-box Retrieval-Augmented Generation

arXiv:2606.01212v1 · Announce Type: new · Abstract: Retrieval-Augmented Generation (RAG) systems are widely deployed and increasingly influential, but their reliance on external corpora exposes new security risks from poisoned retrieval content. Existing RAG attacks are largely…

23
Hugging Face Daily Papers research 28d ago

RoboSemanticBench: Diagnosing Semantic Grounding in Action Prediction for VLA Models

Abstract: RoboSemanticBench identifies a disconnect between semantic understanding and action prediction in vision-language-action models, where robots can grasp objects but fail to select semantically correct targets. [AI-generated summary] Vision-language-action (VLA) models are…

15
Hugging Face Daily Papers research 28d ago

RoboStressBench: Benchmarking VLM Robustness to Physical Visual Stress in Embodied Scenes

Abstract: RoboStressBench presents a principled benchmark for evaluating vision-language model robustness to physical visual stress in embodied AI, decomposing visual stress into material, viewpoint, lighting, and geometry dimensions. [AI-generated summary] Vision-Language Models…

4
r/LocalLLaMA community 28d ago

NVIDIA GB300 Grace Blackwell Ultra pricetags

https://www.scan.co.uk/shop/ai-and-robotics/workstations-ai/nvidia-dgx-station · submitted by /u/X-N2O

5
Ars Technica — AI news-outlet 28d ago

Allegedly trashing Airbnbs to test robots puts startup in legal trouble

Lawsuit seeks $12,000 from startup that allegedly damaged home in robot tests.

28
Hugging Face Daily Papers research 28d ago

Memory-Bound but Not Bandwidth-Limited: The Physical AI Inference Gap in Batch-1 LLM Decode

Abstract: Batch-1 autoregressive decoding in physical AI systems shows that memory bandwidth alone doesn't fully explain latency, with GPU speedup limited by launch overheads and quantization efficiency varying significantly across hardware platforms. [AI-generated summary]…

16
r/LocalLLaMA community 28d ago

How to build a shitty robot

submitted by /u/badlogicgames

35
Hugging Face official-blog 29d ago

Welcome NVIDIA Cosmos 3: The First Open Omni-model for Physical AI Reasoning and Action

Published June 1, 2026 by Asawaree (asawareeb, nvidia) and Atharva Joshi (atharvajoshi10, nvidia). NVIDIA Cosmos 3 is here - and it's available on Hugging…

23
NVIDIA Developer Blog official-blog 29d ago

Develop Physical AI Reasoning, World, and Action Models with NVIDIA Cosmos 3

Physical AI systems must understand the real world before they can act within it. Robots, autonomous vehicles, and smart spaces need to understand what's...

21
Hugging Face Daily Papers research 29d ago

Frequency-Guided Action Diffusion via Sub-Frequency Manifold Traversal

Abstract: Frequency Guidance Operator enables smooth action generation in diffusion policies by steering noisy samples through intermediate sub-frequency manifolds, improving robotic manipulation performance. [AI-generated summary] Learning visuomotor policies via behavior cloning…

11
arXiv — NLP / Computation & Language research 29d ago

Multi-Turn Multi-Agent Dialogue for Collaborative Reconstruction Improves VLM Performance on Spatial Reasoning, But Only Barely

arXiv:2605.31387v1 · Announce Type: new · Abstract: Robots operating in diverse environments rely on visual input to interpret objects and spatial layouts. In human-collaborative tasks, they are expected to communicate this understanding through language. Vision-language models…

32
Hugging Face Daily Papers research 29d ago

Hide-and-Seek in Trajectories: Discovering Failure Signals for VLA Runtime Monitoring

Abstract: Hide-and-Seek framework detects robot execution failures in vision-language-action models by localizing failure-indicative actions through contrastive learning from trajectory-level supervision without step-level annotations. [AI-generated summary] Vision-Language-Action…

18
r/MachineLearning community 1mo ago

Before we spend months processing open-source robotics datasets, tell us why this is a bad idea [D]

Ps. Not pitching anything; Just trying to understand where reality differs from the narrative. We're a couple of ML students, mostly worked on ML/software before, but over the last few months we've been playing with VLAs, robot datasets, and trying to understand where the field…

27
Hugging Face Daily Papers research 1mo ago

DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation

Abstract: DynaFLIP is a dynamics-aware multimodal pre-training framework that enhances robot manipulation by integrating motion understanding into visual perception through image-language-3D flow triplets and geometric regularization techniques. [AI-generated summary] Robot…

22
Ars Technica — AI news-outlet 1mo ago

Startup offers free home cleaning—if it can record it all for robot training

The latest twist in paying humans to wear head cameras for robot training data.

26
Hugging Face Daily Papers research 1mo ago

Reducing Political Manipulation with Consistency Training

Abstract: Large language models demonstrate systematic political bias in handling opposing viewpoints, which can be mitigated through a reinforcement learning approach that maintains helpfulness while reducing bias. [AI-generated summary] Large language models (LLMs) exhibit…

18
Hugging Face Daily Papers research 1mo ago

Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

Abstract: A unified vision-language-action model is presented that integrates diverse embodied decision-making tasks through a shared architecture and training approach, demonstrating strong performance across manipulation, navigation, and trajectory prediction with…

31
r/MachineLearning community 1mo ago

Wall-OSS-0.5: 4B VLA with open training code and zero-shot real-robot evaluation [D]

Wall-OSS-0.5 is a new 4B VLA release from X Square Robot, built on a 3B VLM backbone with action experts in a Mixture-of-Transformers layout. What caught my eye is that the report evaluates the pretrained checkpoint on real robots before task-specific fine tuning, instead of…

25
Hugging Face Daily Papers research 1mo ago

Rethinking VLM Representation for VLA Initialization

Abstract: Effective vision-language-action model initialization requires balancing pretrained vision-language model representations with embodied task-specific adaptations and robot-data pretraining while preserving core action-relevant features. [AI-generated summary]…

22
Hugging Face Daily Papers research 1mo ago

Learning High-Frequency Continuous Action Chunks in Latent Space

Abstract: High-frequency robotic control is improved by using variational autoencoders to enhance temporal and spatial consistency, combined with a reuse-then-refine strategy for smooth real-time execution. [AI-generated summary] Modern robotic policies increasingly rely on action…

26
Ars Technica — AI news-outlet 1mo ago

3D-printable humanoid legs let robotics experiments run wild

Hugging Face debuts $2,500 bipedal robot project for builders and researchers.

33
TechCrunch — AI news-outlet 1mo ago

This startup is betting India’s gig economy can train the world’s robots

Human Archive, a startup founded by Berkeley and Stanford researchers, is paying gig workers in India to wear camera-equipped caps and sensor devices to collect the real-world physical training data that AI and robotics labs are racing to acquire.

34
arXiv — NLP / Computation & Language research 1mo ago

GeoMathCode: Understanding Interleaved Math-Code Reasoning for Geometry Problem Solving

arXiv:2605.25384v1 · Announce Type: new · Abstract: Mathematical reasoning is a hallmark of human intelligence, requiring logical deduction, symbolic manipulation, and abstract thinking. Recent multimodal large language models (MLLMs) have demonstrated strong performance on geometry…

22
arXiv — Machine Learning research 1mo ago

Approximate Machine Unlearning through Manifold Representation Forgetting Guided by Self Mode Connectivity

arXiv:2605.22871v1 · Announce Type: new · Abstract: Machine unlearning is a fundamental mechanism that enforces the right to be forgotten. Existing unlearning studies that rely on label manipulation or task-gradient reversal often deliver limited unlearning effectiveness. Moreover,…

38
arXiv — Machine Learning research 1mo ago

Sample-wise Targeted Adversarial Attacks on Test-time Adaptation

arXiv:2605.23411v1 · Announce Type: new · Abstract: Test-time adaptation (TTA) effectively counters distribution shifts but exposes models to adversarial manipulation via the unlabeled test stream. Existing class-wise targeted attacks remain impractical for stealthy exploitation in…

12
arXiv — NLP / Computation & Language research 1mo ago

Autonomous Frontier-Based Exploration with VLM Guidance

arXiv:2605.23165v1 · Announce Type: cross · Abstract: Autonomous robotic exploration of unknown and hazardous environments, a long-standing challenge, can be significantly improved by leveraging the advanced reasoning of Vision-Language Models (VLMs). We introduce a novel…

23
