Robotics · 184 articles archived under #robotics r/MachineLearning community 13d ago I built a leakage-clean verifier for robot manipulation, is this useful? Am I solving a non-problem? [D] Spent the last few weeks on a benchmark/harness that tries to answer one question honestly: did a robot arm actually do the demonstrated task, or did the success metric just get fooled? The setup: compile a human demo into an object-centric graph (what changed in the world:… 7 Hugging Face Daily Papers research 13d ago Human Universal Grasping Abstract A flow-matching model generates diverse human grasps from RGB-D images, enabling zero-shot robotic grasping with improved performance over existing methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Humans can grasp objects effortlessly, whereas multi-fingered robots… 25 Hugging Face Daily Papers research 13d ago LaWAM: Latent World Action Models for Efficient Dynamics-Aware Robot Policies Abstract LaWAM enables efficient robot control by predicting compact latent visual subgoals instead of expensive video generation, achieving high performance with reduced computational latency. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Vision-Language-Action models (VLAs)… 33 r/LocalLLaMA community 13d ago Qwen Robot Suite Looks pretty cool... https://qwen.ai/blog?id=qwen-robotsuite submitted by /u/Snoo_27681 8 Hugging Face Daily Papers research 13d ago Qwen-RobotWorld Technical Report: Unifying Embodied World Modeling through Language-Conditioned Video Generation Abstract Qwen-RobotWorld is a language-conditioned video world model that predicts future visual trajectories across multiple robotic domains using a double-stream diffusion transformer and embodied world knowledge corpus.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct We… 5 Hugging Face Daily Papers research 13d ago Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes Abstract Hierarchical Advantage-Weighted Behavior Cloning (HABC) addresses sparse reward challenges in robot learning by separately optimizing viability and efficiency objectives through adaptive critic heads and intervention-aware credit assignment, significantly improving… 9 Hugging Face Daily Papers research 14d ago Geometric Action Model for Robot Policy Learning Abstract A geometric action model leverages pretrained geometric foundation models to enable language-conditioned manipulation policies with improved accuracy, robustness, and efficiency in 3D physical environments. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Generalist robot… 21 arXiv — Machine Learning research 14d ago Phase-Localized Curation Does Not Help: A Negative Result on Per-Phase Metric Selection for Demonstration Filtering arXiv:2606.15064v1 Announce Type: new Abstract: Manipulation demonstrations have temporal phase structure, and a natural hypothesis is that demonstration-curation metrics should be applied within phases rather than globally. The idea is to segment each trajectory into phases,… 10 arXiv — NLP / Computation & Language research 14d ago Beyond English: Uncovering the Multilingual Gap in Vision-Language-Action Models arXiv:2606.15714v1 Announce Type: new Abstract: Vision-Language-Action models have recently demonstrated promising capabilities in learning generalist robot policies from large-scale multimodal data. However, most existing VLA systems are trained and evaluated primarily with… 12 NVIDIA Developer Blog official-blog 14d ago Pretrained to Imagine, Fine-Tuned to Act: The Rise of World-Action Models Quick glossary for readers new to VLA/WAM terminology VLA Vision-Language-Action model: a robot policy that starts from a pretrained VLM backbone and adapts it... 
22 arXiv — Machine Learning research 15d ago SpikF-GO: Spiking Fourier Graph Operators for Multivariate Time Series Forecasting arXiv:2606.13901v1 Announce Type: new Abstract: Spiking Neural Networks (SNNs) have emerged as an energy-efficient alternative to conventional neural networks, demonstrating strong performance in computer vision and robotics. More recently, SNNs have been applied to time series… 30 arXiv — Machine Learning research 15d ago More with LESS -- Local Scene Representations for Tactile Imaging arXiv:2606.14344v1 Announce Type: new Abstract: Tactile imaging seeks to reconstruct the internal structure of soft objects through touch sensing, with applications in medical diagnosis and robotic manipulation. Recent self-supervised learning approaches have shown promising… 33 arXiv — NLP / Computation & Language research 15d ago Persuasion Index: A Theory-Guided Framework for Persuasion Analysis arXiv:2606.14580v1 Announce Type: new Abstract: Identifying persuasive rhetorical cues is critical across domains, from detecting information manipulation and improving AI safety to advancing public health communication. We propose Persuasion Index (PI), a taxonomy of 15… 36 Ars Technica — AI news-outlet 17d ago Here's what Jeff Bezos' new startup Prometheus will do It isn't the only startup tackling physical AI, but it's one of the best-funded. 5 Ars Technica — AI news-outlet 17d ago Ukraine's one-time test used fully autonomous drones to kill Russian soldiers Full autonomy is rare, but Ukraine is installing AI modules on drones and robots. 32 Hugging Face Daily Papers research 17d ago WEAVER, Better, Faster, Longer: An Effective World Model for Robotic Manipulation Abstract WEAVER is a multi-view world model architecture that achieves high fidelity, consistency, and efficiency in robotic manipulation tasks through flow-matching loss and demonstrates superior performance in policy evaluation, improvement, and test-time planning. 
Generated… 27 Hugging Face Daily Papers research 17d ago Revisiting Articulated Parts Perception in Robot Manipulation Abstract A new geometric representation called Geometric Primary Structure (GPS) is introduced for articulated parts perception, enabling efficient data collection through VR annotation and achieving high manipulation success rates without fine-tuning. Generated by… 27 Hugging Face Daily Papers research 18d ago LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories Abstract LabVLA, a vision-language-action model trained with a two-stage approach combining action token pretraining and flow matching, demonstrates superior performance on laboratory automation tasks through simulated data generation and robot-specific learning. Generated by… 18 arXiv — NLP / Computation & Language research 18d ago Does AI Reviewer See the Full Picture? Attacking and Defending Multimodal Peer Review arXiv:2606.12716v1 Announce Type: new Abstract: The integration of Large Language Models (LLMs) and Multimodal LLMs (MLLMs) into scientific peer-review workflows introduces novel and significant risks for adversarial manipulation, especially given the multimodal nature of… 8 arXiv — NLP / Computation & Language research 18d ago ComAct: Reframing Professional Software Manipulation via COM-as-Action Paradigm arXiv:2606.13239v1 Announce Type: cross Abstract: Existing computer-use agents remain fundamentally limited in professional software manipulation: GUI-based agents suffer from fragile visual grounding and long-horizon error accumulation, while API-based approaches struggle with… 34 Hugging Face Daily Papers research 18d ago MuJoCo-Drones-Gym: A GPU-Accelerated Multi-Drone Simulator for Control and Reinforcement Learning Abstract A Gymnasium-compatible multi-drone simulation environment built on MuJoCo physics engine that supports flexible physics models, action interfaces, and observation spaces for reinforcement learning applications.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct Robotic… 35 TechCrunch — AI news-outlet 18d ago Theker just raised $85M to build the factory robot that doesn’t specialize in anything Unlike humanoid robots designed around a fixed form — think Boston Dynamics — Theker's machines are built to be reconfigured. 18 TechCrunch — AI news-outlet 18d ago Jeff Bezos’s Prometheus raises $12B to build an ‘artificial general engineer’ for the physical world The new round values the physical AI startup that aims to automate heavy engineering and drug design at $41 billion. 31 r/LocalLLaMA community 18d ago Refiner: Robotics library from the ex-Hugging Face pre-training team The ex-Hugging Face pre-training team just announced a new library created for robotics data refinement! It supports ingestion of all robotics formats (Parquet, HDF5, MCAP, Zarr, RLDS, and LeRobot), as well as the common processing flows like visual hand-tracking, subtask annotations… 26 arXiv — Machine Learning research 19d ago Implicit Neural Representations of Individual Behavior arXiv:2606.12200v1 Announce Type: new Abstract: We study policy representation learning from unlabeled multi-policy behavioral data. Each episode is generated by a fixed policy, but policy labels are unavailable. This setting appears in robotics play, demonstrations, games,… 28 arXiv — Machine Learning research 19d ago Fourier Features Let Agents Learn High Precision Policies with Imitation Learning arXiv:2606.12334v1 Announce Type: new Abstract: High-precision robotic manipulation requires fine-grained spatial reasoning that is often difficult to achieve with RGB-only policies due to depth ambiguity and perspective scale issues.
Policies that leverage 3D information… 14 arXiv — NLP / Computation & Language research 19d ago Detecting AI-Generated Content on Social Media with Multi-modal Language Models arXiv:2606.11200v1 Announce Type: new Abstract: Generative AI has enabled the creation of photorealistic images and videos that are increasingly disseminated on social media, often used for spam, misinformation, manipulation, and fraud. Existing AI-generated content (AIGC)… 36 arXiv — NLP / Computation & Language research 19d ago When Does Language Matter? Multilingual Instructions Reveal Step-wise Language Sensitivity in Vision-Language-Action Models arXiv:2606.11906v1 Announce Type: new Abstract: Vision-Language-Action (VLA) models have shown strong performance in language-conditioned robotic manipulation, yet their robustness to linguistic variation remains poorly understood. In this work, we present the first systematic… 17 Hugging Face Daily Papers research 19d ago World Pilot: Steering Vision-Language-Action Models with World-Action Priors Abstract World Pilot enhances Vision-Language-Action models by incorporating dynamic scene evolution and trajectory priors from a World-Action Model, achieving superior performance in zero-shot out-of-distribution manipulation tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct… 10 Hugging Face Daily Papers research 19d ago BrainSurgery: Reproducible and Reliable Declarative Weight Manipulations for Model Editing and Upcycling Abstract BrainSurgery is a tool for robust and reproducible tensor manipulation of neural network checkpoints through declarative YAML plans with built-in validation. 
Generated by Qwen/Qwen2.5-Coder-32B-Instruct As deep learning models scale, managing, inspecting, and modifying… 12 arXiv — Machine Learning research 20d ago Co-GLANCE: Uncertainty-Aware Active Perception for Heterogeneous Robot Teaming arXiv:2606.09919v1 Announce Type: new Abstract: Perceptual uncertainty is a central challenge for heterogeneous robot teams operating in unstructured outdoor environments, where no single viewpoint affords reliable scene understanding. Perceptual uncertainty, arising from… 36 arXiv — NLP / Computation & Language research 20d ago TabClaw: An Interactive and Self-Evolving Agent for Spreadsheet Manipulation and Table Reasoning arXiv:2606.10316v1 Announce Type: new Abstract: Spreadsheets and tables are widely used representations for structured data analysis, but effective analysis still requires substantial manual effort and domain expertise. Recent large language model (LLM) agents can automate parts… 31 arXiv — NLP / Computation & Language research 20d ago Beyond APIs: Probing the Limits of MLLMs in Physical Tool Use arXiv:2606.10803v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs) excel at utilizing digital APIs and increasingly serve as the "brain" of embodied AI, instructing robots to interact with the physical world. In such embodied settings, a central capability… 38 Hugging Face Daily Papers research 20d ago ABot-Earth 0.5: Generative 3D Earth Model Abstract ABot-Earth 0.5 generates realistic 3D environments from satellite imagery using 3D Gaussian Splatting representation, enabling fast synthesis and real-time visualization for Embodied AI applications. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We present ABot-Earth… 22 Hugging Face Daily Papers research 20d ago VoLo: A Physical Orchestrator for Open-Vocabulary Long-Horizon Manipulation Abstract VoLoAgent enables physical orchestration by integrating vision-language models with robot capabilities for open-vocabulary long-horizon manipulation tasks. 
Generated by Qwen/Qwen2.5-Coder-32B-Instruct Open-vocabulary long-horizon manipulation requires robots to reason… 24 The Information — AI news-outlet 20d ago Kalshi Asks Some Customers For Employer Information Prediction markets platform Kalshi is asking customers in some wagers to provide the name of their employer, industry and job function before making bets, to help the company crack down on potential insider trading. “For markets with heightened insider or manipulation risk, we… 23 TechCrunch — AI news-outlet 20d ago Hey Siri, here’s what I actually want from AI I'm desperate for a personal AI assistant, but do I really want to become the kind of person who can't function without the friendly robot voice in my phone? 4 Hugging Face Daily Papers research 20d ago Robotic Policy Adaptation via Weight-Space Meta-Learning Abstract WIZARD is a weight-space meta-learning framework that generates task-specific LoRA parameters for frozen VLA policies using language instructions and demonstration videos, enabling efficient task adaptation without fine-tuning. Generated by… 31 Hugging Face Daily Papers research 20d ago Light-WAM: Efficient World Action Models with State-Fusion Action Decoding Abstract Light-WAM is a lightweight world action model for robot manipulation that uses a compact video backbone and downsampled latent space for efficient future-video supervision, combined with a StateFusionActionExpert for direct action prediction. Generated by… 25 Google DeepMind official-blog 20d ago Powering the future of robotics in Europe Jun 09, 2026 · Google DeepMind Accelerator selects 15 robotics companies from across Europe to join the program.
Providing 3 months of intensive mentorship and technical support, enabling the… 22 Hugging Face Daily Papers research 20d ago WorldCraft: From Camera Navigation to Object Manipulation in Interactive Video World Models Abstract WorldCraft extends interactive video world models to enable object-level trajectory control while maintaining camera navigation capabilities through specialized control pipelines. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Recent video-based world models have made… 27 r/LocalLLaMA community 20d ago Jetson Orin NX Build for Hermes Agent + Benchmarking I had a huge LLM server, and now I have a tiny one! I had a Jetson Orin NX gathering dust from a long dead robotics project, from back in the Llama-7B days. I figured now with MoE and smaller models doing well, it was time to mess with it again. Goal: As silent as possible… 34 The Information — AI news-outlet 20d ago U.S. Accuses Alibaba, Baidu, Others of Aiding Chinese Military in Blacklist Move The U.S. Department of Defense on Monday added more than a dozen Chinese tech companies including Alibaba and Baidu to a blacklist, a move that could further escalate tensions between the world’s two largest economies. Electric vehicle makers BYD and Nio, humanoid maker Unitree,… 25 Hugging Face Daily Papers research 21d ago AHA-WAM: Asynchronous Horizon-Adaptive World-Action Modeling with Observation-Guided Context Routing Abstract AHA-WAM is an asynchronous world-action model that uses dual Diffusion Transformers to enable efficient long-horizon planning and real-time action execution in robotic manipulation tasks.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct World-action models have emerged as a… 4 Hugging Face Daily Papers research 21d ago OASIS: From Simulation Data Collection to Real-World Humanoid Loco-Manipulation Abstract A simulation-data-driven framework for humanoid loco-manipulation that uses 3D generative models to create realistic assets and hierarchical visuomotor policies trained on simulated data achieves better zero-shot performance than real-robot training. Generated by… 24 Hugging Face Daily Papers research 21d ago Robots Need More than VLA and World Models Abstract Robot intelligence advancement requires integrating unstructured behavioral data through specialized interfaces for labeling, embodiment mapping, world modeling, and reward inference rather than relying solely on policy scaling. Generated by… 27 arXiv — NLP / Computation & Language research 22d ago The Sim-to-Real Gap of Foundation Model Agents: A Unified MDP Perspective arXiv:2606.07017v1 Announce Type: cross Abstract: Foundation model agents are increasingly deployed for real-world decision-making, but suffer from the sim-to-real gap. While robotics and classical control have mature frameworks to address this gap, the foundation model… 5 Hugging Face Daily Papers research 22d ago LIMMT: Less is More for Motion Tracking Abstract Training with high-quality motion data improves tracking policy optimization trajectories, with minimal data subsets outperforming full datasets in physics-based humanoid motion tracking. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We argue that high-quality motion… 24 Hugging Face Daily Papers research 24d ago The Road Ahead in Autonomous Driving: The KITScenes Multimodal Dataset Abstract KITScenes Multimodal dataset provides high-fidelity European driving data with comprehensive 3D maps and diverse urban environments for embodied AI research. 
Generated by Qwen/Qwen2.5-Coder-32B-Instruct Existing autonomous driving datasets have enabled major progress,… 36 Hugging Face Daily Papers research 24d ago AffordanceVLA: A Vision-Language-Action Model Empowering Action Generation through Affordance-Aware Understanding Abstract AffordanceVLA introduces a unified framework that uses structured affordance forecasting as an intermediate representation to improve the precision of perception-action mapping in robotic manipulation by leveraging vision-language models. Generated by… 4