Tag

Robotics

184 articles archived under #robotics · RSS

r/MachineLearning community 13d ago

I built a leakage-clean verifier for robot manipulation, is this useful? Am I solving a non-problem? [D]

Spent the last few weeks on a benchmark/harness that tries to answer one question honestly: did a robot arm actually do the demonstrated task, or did the success metric just get fooled? The setup: compile a human demo into an object-centric graph (what changed in the world:…

7
Hugging Face Daily Papers research 13d ago

Human Universal Grasping

Abstract A flow-matching model generates diverse human grasps from RGB-D images, enabling zero-shot robotic grasping with improved performance over existing methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Humans can grasp objects effortlessly, whereas multi-fingered robots…

25
Hugging Face Daily Papers research 13d ago

LaWAM: Latent World Action Models for Efficient Dynamics-Aware Robot Policies

Abstract LaWAM enables efficient robot control by predicting compact latent visual subgoals instead of expensive video generation, achieving high performance with reduced computational latency. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Vision-Language-Action models (VLAs)…

33
r/LocalLLaMA community 13d ago

Qwen Robot Suite

Looks pretty cool... https://qwen.ai/blog?id=qwen-robotsuite (submitted by /u/Snoo_27681)

8
Hugging Face Daily Papers research 13d ago

Qwen-RobotWorld Technical Report: Unifying Embodied World Modeling through Language-Conditioned Video Generation

Abstract Qwen-RobotWorld is a language-conditioned video world model that predicts future visual trajectories across multiple robotic domains using a double-stream diffusion transformer and embodied world knowledge corpus. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We…

5
Hugging Face Daily Papers research 13d ago

Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes

Abstract Hierarchical Advantage-Weighted Behavior Cloning (HABC) addresses sparse reward challenges in robot learning by separately optimizing viability and efficiency objectives through adaptive critic heads and intervention-aware credit assignment, significantly improving…

9
Hugging Face Daily Papers research 14d ago

Geometric Action Model for Robot Policy Learning

Abstract A geometric action model leverages pretrained geometric foundation models to enable language-conditioned manipulation policies with improved accuracy, robustness, and efficiency in 3D physical environments. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Generalist robot…

21
arXiv — Machine Learning research 14d ago

Phase-Localized Curation Does Not Help: A Negative Result on Per-Phase Metric Selection for Demonstration Filtering

arXiv:2606.15064v1 Announce Type: new Abstract: Manipulation demonstrations have temporal phase structure, and a natural hypothesis is that demonstration-curation metrics should be applied within phases rather than globally. The idea is to segment each trajectory into phases,…

10
arXiv — NLP / Computation & Language research 14d ago

Beyond English: Uncovering the Multilingual Gap in Vision-Language-Action Models

arXiv:2606.15714v1 Announce Type: new Abstract: Vision-Language-Action models have recently demonstrated promising capabilities in learning generalist robot policies from large-scale multimodal data. However, most existing VLA systems are trained and evaluated primarily with…

12
NVIDIA Developer Blog official-blog 14d ago

Pretrained to Imagine, Fine-Tuned to Act: The Rise of World-Action Models

Quick glossary for readers new to VLA/WAM terminology. VLA (Vision-Language-Action model): a robot policy that starts from a pretrained VLM backbone and adapts it...

22
arXiv — Machine Learning research 15d ago

SpikF-GO: Spiking Fourier Graph Operators for Multivariate Time Series Forecasting

arXiv:2606.13901v1 Announce Type: new Abstract: Spiking Neural Networks (SNNs) have emerged as an energy-efficient alternative to conventional neural networks, demonstrating strong performance in computer vision and robotics. More recently, SNNs have been applied to time series…

30
arXiv — Machine Learning research 15d ago

More with LESS -- Local Scene Representations for Tactile Imaging

arXiv:2606.14344v1 Announce Type: new Abstract: Tactile imaging seeks to reconstruct the internal structure of soft objects through touch sensing, with applications in medical diagnosis and robotic manipulation. Recent self-supervised learning approaches have shown promising…

33
arXiv — NLP / Computation & Language research 15d ago

Persuasion Index: A Theory-Guided Framework for Persuasion Analysis

arXiv:2606.14580v1 Announce Type: new Abstract: Identifying persuasive rhetorical cues is critical across domains, from detecting information manipulation and improving AI safety to advancing public health communication. We propose Persuasion Index (PI), a taxonomy of 15…

36
Ars Technica — AI news-outlet 17d ago

Here's what Jeff Bezos' new startup Prometheus will do

It isn't the only startup tackling physical AI, but it's one of the best-funded.

5
Ars Technica — AI news-outlet 17d ago

Ukraine's one-time test used fully autonomous drones to kill Russian soldiers

Full autonomy is rare, but Ukraine is installing AI modules on drones and robots.

32
Hugging Face Daily Papers research 17d ago

WEAVER, Better, Faster, Longer: An Effective World Model for Robotic Manipulation

Abstract WEAVER is a multi-view world model architecture that achieves high fidelity, consistency, and efficiency in robotic manipulation tasks through flow-matching loss and demonstrates superior performance in policy evaluation, improvement, and test-time planning. Generated…

27
Hugging Face Daily Papers research 17d ago

Revisiting Articulated Parts Perception in Robot Manipulation

Abstract A new geometric representation called Geometric Primary Structure (GPS) is introduced for articulated parts perception, enabling efficient data collection through VR annotation and achieving high manipulation success rates without fine-tuning. Generated by…

27
Hugging Face Daily Papers research 18d ago

LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories

Abstract LabVLA, a vision-language-action model trained with a two-stage approach combining action token pretraining and flow matching, demonstrates superior performance on laboratory automation tasks through simulated data generation and robot-specific learning. Generated by…

18
arXiv — NLP / Computation & Language research 18d ago

Does AI Reviewer See the Full Picture? Attacking and Defending Multimodal Peer Review

arXiv:2606.12716v1 Announce Type: new Abstract: The integration of Large Language Models (LLMs) and Multimodal LLMs (MLLMs) into scientific peer-review workflows introduces novel and significant risks for adversarial manipulation, especially given the multimodal nature of…

8
arXiv — NLP / Computation & Language research 18d ago

ComAct: Reframing Professional Software Manipulation via COM-as-Action Paradigm

arXiv:2606.13239v1 Announce Type: cross Abstract: Existing computer-use agents remain fundamentally limited in professional software manipulation: GUI-based agents suffer from fragile visual grounding and long-horizon error accumulation, while API-based approaches struggle with…

34
Hugging Face Daily Papers research 18d ago

MuJoCo-Drones-Gym: A GPU-Accelerated Multi-Drone Simulator for Control and Reinforcement Learning

Abstract A Gymnasium-compatible multi-drone simulation environment built on MuJoCo physics engine that supports flexible physics models, action interfaces, and observation spaces for reinforcement learning applications. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Robotic…

35
TechCrunch — AI news-outlet 18d ago

Theker just raised $85M to build the factory robot that doesn’t specialize in anything

Unlike humanoid robots designed around a fixed form — think Boston Dynamics — Theker's machines are built to be reconfigured.

18
TechCrunch — AI news-outlet 18d ago

Jeff Bezos’s Prometheus raises $12B to build an ‘artificial general engineer’ for the physical world

The new round values the physical AI startup that aims to automate heavy engineering and drug design at $41 billion.

31
r/LocalLLaMA community 18d ago

Refiner: Robotics library from the ex-Hugging Face pre-training team

The ex-Hugging Face pre-training team just announced a new library created for robotics data refinement! It supports ingestion of all robotics formats (Parquet, HDF5, MCAP, Zarr, RLDS, and LeRobot), as well as the common processing flows like visual hand-tracking, subtask annotations…

26
arXiv — Machine Learning research 19d ago

Implicit Neural Representations of Individual Behavior

arXiv:2606.12200v1 Announce Type: new Abstract: We study policy representation learning from unlabeled multi-policy behavioral data. Each episode is generated by a fixed policy, but policy labels are unavailable. This setting appears in robotics play, demonstrations, games,…

28
arXiv — Machine Learning research 19d ago

Fourier Features Let Agents Learn High Precision Policies with Imitation Learning

arXiv:2606.12334v1 Announce Type: new Abstract: High-precision robotic manipulation requires fine-grained spatial reasoning that is often difficult to achieve with RGB-only policies due to depth ambiguity and perspective scale issues. Policies that leverage 3D information…

14
arXiv — NLP / Computation & Language research 19d ago

Detecting AI-Generated Content on Social Media with Multi-modal Language Models

arXiv:2606.11200v1 Announce Type: new Abstract: Generative AI has enabled the creation of photorealistic images and videos that are increasingly disseminated on social media, often used for spam, misinformation, manipulation, and fraud. Existing AI-generated content (AIGC)…

36
arXiv — NLP / Computation & Language research 19d ago

When Does Language Matter? Multilingual Instructions Reveal Step-wise Language Sensitivity in Vision-Language-Action Models

arXiv:2606.11906v1 Announce Type: new Abstract: Vision-Language-Action (VLA) models have shown strong performance in language-conditioned robotic manipulation, yet their robustness to linguistic variation remains poorly understood. In this work, we present the first systematic…

17
Hugging Face Daily Papers research 19d ago

World Pilot: Steering Vision-Language-Action Models with World-Action Priors

Abstract World Pilot enhances Vision-Language-Action models by incorporating dynamic scene evolution and trajectory priors from a World-Action Model, achieving superior performance in zero-shot out-of-distribution manipulation tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

10
Hugging Face Daily Papers research 19d ago

BrainSurgery: Reproducible and Reliable Declarative Weight Manipulations for Model Editing and Upcycling

Abstract BrainSurgery is a tool for robust and reproducible tensor manipulation of neural network checkpoints through declarative YAML plans with built-in validation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct As deep learning models scale, managing, inspecting, and modifying…

12
arXiv — Machine Learning research 20d ago

Co-GLANCE: Uncertainty-Aware Active Perception for Heterogeneous Robot Teaming

arXiv:2606.09919v1 Announce Type: new Abstract: Perceptual uncertainty is a central challenge for heterogeneous robot teams operating in unstructured outdoor environments, where no single viewpoint affords reliable scene understanding. Perceptual uncertainty, arising from…

36
arXiv — NLP / Computation & Language research 20d ago

TabClaw: An Interactive and Self-Evolving Agent for Spreadsheet Manipulation and Table Reasoning

arXiv:2606.10316v1 Announce Type: new Abstract: Spreadsheets and tables are widely used representations for structured data analysis, but effective analysis still requires substantial manual effort and domain expertise. Recent large language model (LLM) agents can automate parts…

31
arXiv — NLP / Computation & Language research 20d ago

Beyond APIs: Probing the Limits of MLLMs in Physical Tool Use

arXiv:2606.10803v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs) excel at utilizing digital APIs and increasingly serve as the "brain" of embodied AI, instructing robots to interact with the physical world. In such embodied settings, a central capability…

38
Hugging Face Daily Papers research 20d ago

ABot-Earth 0.5: Generative 3D Earth Model

Abstract ABot-Earth 0.5 generates realistic 3D environments from satellite imagery using 3D Gaussian Splatting representation, enabling fast synthesis and real-time visualization for Embodied AI applications. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We present ABot-Earth…

22
Hugging Face Daily Papers research 20d ago

VoLo: A Physical Orchestrator for Open-Vocabulary Long-Horizon Manipulation

Abstract VoLoAgent enables physical orchestration by integrating vision-language models with robot capabilities for open-vocabulary long-horizon manipulation tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Open-vocabulary long-horizon manipulation requires robots to reason…

24
The Information — AI news-outlet 20d ago

Kalshi Asks Some Customers For Employer Information

Prediction markets platform Kalshi is asking customers in some wagers to provide the name of their employer, industry and job function before making bets, to help the company crack down on potential insider trading. “For markets with heightened insider or manipulation risk, we…

23
TechCrunch — AI news-outlet 20d ago

Hey Siri, here’s what I actually want from AI

I'm desperate for a personal AI assistant, but do I really want to become the kind of person who can't function without the friendly robot voice in my phone?

4
Hugging Face Daily Papers research 20d ago

Robotic Policy Adaptation via Weight-Space Meta-Learning

Abstract WIZARD is a weight-space meta-learning framework that generates task-specific LoRA parameters for frozen VLA policies using language instructions and demonstration videos, enabling efficient task adaptation without fine-tuning. Generated by…

31
Hugging Face Daily Papers research 20d ago

Light-WAM: Efficient World Action Models with State-Fusion Action Decoding

Abstract Light-WAM is a lightweight world action model for robot manipulation that uses a compact video backbone and downsampled latent space for efficient future-video supervision, combined with a StateFusionActionExpert for direct action prediction. Generated by…

25
Google DeepMind official-blog 20d ago

Powering the future of robotics in Europe

Jun 09, 2026 · Google DeepMind Accelerator selects 15 robotics companies from across Europe to join the program. Providing 3 months of intensive mentorship and technical support, enabling the…

22
Hugging Face Daily Papers research 20d ago

WorldCraft: From Camera Navigation to Object Manipulation in Interactive Video World Models

Abstract WorldCraft extends interactive video world models to enable object-level trajectory control while maintaining camera navigation capabilities through specialized control pipelines. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Recent video-based world models have made…

27
r/LocalLLaMA community 20d ago

Jetson Orin NX Build for Hermes Agent + Benchmarking

I had a huge LLM server, and now I have a tiny one! I had a Jetson Orin NX gathering dust from a long-dead robotics project, from back in the Llama-7B days. I figured now with MoE and smaller models doing well, it was time to mess with it again. Goal: As silent as possible…

34
The Information — AI news-outlet 20d ago

U.S. Accuses Alibaba, Baidu, Others of Aiding Chinese Military in Blacklist Move

The U.S. Department of Defense on Monday added more than a dozen Chinese tech companies including Alibaba and Baidu to a blacklist, a move that could further escalate tensions between the world’s two largest economies. Electric vehicle makers BYD and Nio, humanoid maker Unitree,…

25
Hugging Face Daily Papers research 21d ago

AHA-WAM: Asynchronous Horizon-Adaptive World-Action Modeling with Observation-Guided Context Routing

Abstract AHA-WAM is an asynchronous world-action model that uses dual Diffusion Transformers to enable efficient long-horizon planning and real-time action execution in robotic manipulation tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct World-action models have emerged as a…

4
Hugging Face Daily Papers research 21d ago

OASIS: From Simulation Data Collection to Real-World Humanoid Loco-Manipulation

Abstract A simulation-data-driven framework for humanoid loco-manipulation that uses 3D generative models to create realistic assets and hierarchical visuomotor policies trained on simulated data achieves better zero-shot performance than real-robot training. Generated by…

24
Hugging Face Daily Papers research 21d ago

Robots Need More than VLA and World Models

Abstract Robot intelligence advancement requires integrating unstructured behavioral data through specialized interfaces for labeling, embodiment mapping, world modeling, and reward inference rather than relying solely on policy scaling. Generated by…

27
arXiv — NLP / Computation & Language research 22d ago

The Sim-to-Real Gap of Foundation Model Agents: A Unified MDP Perspective

arXiv:2606.07017v1 Announce Type: cross Abstract: Foundation model agents are increasingly deployed for real-world decision-making, but suffer from the sim-to-real gap. While robotics and classical control have mature frameworks to address this gap, the foundation model…

5
Hugging Face Daily Papers research 22d ago

LIMMT: Less is More for Motion Tracking

Abstract Training with high-quality motion data improves tracking policy optimization trajectories, with minimal data subsets outperforming full datasets in physics-based humanoid motion tracking. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We argue that high-quality motion…

24
Hugging Face Daily Papers research 24d ago

The Road Ahead in Autonomous Driving: The KITScenes Multimodal Dataset

Abstract KITScenes Multimodal dataset provides high-fidelity European driving data with comprehensive 3D maps and diverse urban environments for embodied AI research. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Existing autonomous driving datasets have enabled major progress,…

36
Hugging Face Daily Papers research 24d ago

AffordanceVLA: A Vision-Language-Action Model Empowering Action Generation through Affordance-Aware Understanding

Abstract AffordanceVLA introduces a unified framework that uses structured affordance forecasting as an intermediate representation to improve the precision of perception-action mapping in robotic manipulation by leveraging vision-language models. Generated by…

4
