News / #agents Tag Agents + tool use 500 articles archived under #agents Hugging Face Daily Papers research 4d ago PrivacyAlign: Contextual Privacy Alignment for LLM Agents Abstract Researchers develop a human-centered approach to align AI agents with privacy norms by creating a comprehensive dataset of privacy judgments and using annotation-conditioned reward modeling to improve agent behavior. Generated by Qwen/Qwen2.5-Coder-32B-Instruct AI… 7 r/MachineLearning community 4d ago Tool selection at scale is a retrieval problem, and document-style defaults are the wrong starting point [D] A pattern I keep running into building agents. Posting as a discussion because I think the standard intuition is backwards for this specific case. Setup is an agent with a big set of callable tools (mine are MCP-exposed, but the shape generalises to any function-calling loop).… 21 Vercel — AI dev-tools 4d ago AI SDK 7 AI SDK, with over 16 million weekly downloads, is the TypeScript SDK for building AI applications, features, frameworks, and agents across any model provider. It's the same layer eve , Vercel's open-source agent framework, is built on. AI SDK 7 adds production depth for agent… 15 Hugging Face Daily Papers research 4d ago Autodata: An agentic data scientist to create high quality synthetic data Abstract Autodata enables AI agents to function as data scientists who create high-quality training data through meta-optimization, demonstrating improved performance across multiple task domains. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We introduce Autodata, a general… 30 Vercel — AI dev-tools 4d ago AI SDK 7 is now available AI SDK 7 is a major release for building production agents in TypeScript. The SDK has grown from model calls and chat primitives into a broader agent platform for developing, running, integrating, and observing agents across text, audio, realtime, image, and video.
Every major… 8 Vercel — AI dev-tools 4d ago Teaching agents product design at Vercel Coding agents can produce working UI fast, but what's harder is a different shape. They can copy your product's style, match its patterns, and try to follow its conventions. What they cannot do is understand why those patterns exist. Code shows agents what shipped, not why one… 17 Smol AI News news-outlet 5d ago not much happened today **Z.ai's GLM-5.2** leads in coding and agent benchmarks with top scores like **1595** on Code Arena: Frontend and **34.29%** reasoning accuracy with zero failures. Databricks improved GLM-5.2 speed to **392 tok/s** using hardware and optimizations. **Ornith-1.0**, a new… 13 Hugging Face Daily Papers research 5d ago The Hitchhiker's Guide to Agentic AI: From Foundations to Systems Abstract The book provides a comprehensive guide to building autonomous AI systems, covering foundational elements like transformer architecture and training methods, along with advanced topics such as reinforcement learning, agent architectures, and production deployment.… 5 Hugging Face Daily Papers research 5d ago RL-Index: Reinforcement Learning for Retrieval Index Reasoning Abstract RL-Index introduces an agentic indexing framework that shifts reasoning from query time to indexing stage by using LLM-generated rationales and reinforcement learning to improve retrieval effectiveness and reduce latency. Generated by Qwen/Qwen2.5-Coder-32B-Instruct… 25 Hugging Face Daily Papers research 5d ago When Lower Privileges Suffice: Investigating Over-Privileged Tool Selection in LLM Agents Abstract LLM agents frequently select higher-privilege tools unnecessarily, and while safety alignment doesn't ensure least-privilege choices, a post-training defense can reduce excessive privilege use without sacrificing performance. 
Generated by Qwen/Qwen2.5-Coder-32B-Instruct… 26 arXiv — Machine Learning research 5d ago GCT-MARL: Graph-Based Contrastive Transfer for Sample-Efficient Cooperative Multi-Agent Reinforcement Learning arXiv:2606.25073v1 Announce Type: new Abstract: In cooperative multi-agent reinforcement learning (MARL), from a deployment perspective, it is challenging and expensive to train agents from scratch for each new environment or task. In this work, we propose GCT-MARL, a transfer… 30 arXiv — Machine Learning research 5d ago Forget to Improve: On-Device LLM-Agent Continual Learning via Budget-Curated Memory arXiv:2606.25115v1 Announce Type: new Abstract: On-device language-model agents improve by accumulating experience in retrieved memory rather than by updating weights. This memory is hard-bounded and exposed: it consumes RAM and energy, reaches peers through a thin uplink, and… 24 arXiv — Machine Learning research 5d ago Reward-Conditioned Attention: How Reward Design Shapes What Autonomous Driving Agents See arXiv:2606.25127v1 Announce Type: new Abstract: We investigate how reward design shapes the internal attention patterns of reinforcement learning agents trained for autonomous driving. Using three Perceiver-based agents that share identical architectures and training data but… 33 arXiv — NLP / Computation & Language research 5d ago The Interplay of Harness Design and Post-Training in LLM Agents arXiv:2606.25447v1 Announce Type: cross Abstract: Tool-integrated LLM agents are often wrapped within a harness: the scaffolding that determines which tools are exposed, how they are described, and what auxiliary information accompanies each per-step observation. 
While agents… 15 arXiv — Machine Learning research 5d ago Low Variance Trust Region Optimization with Independent Actors and Sequential Updates in Cooperative Multi-agent Reinforcement Learning arXiv:2606.25526v1 Announce Type: new Abstract: Cooperative multi-agent reinforcement learning assumes each agent shares the same reward function and can be trained effectively using the Trust Region framework of single-agent. Instead of relying on other agents' actions, the… 28 arXiv — Machine Learning research 5d ago Beyond One-Size-Fits-All: Diagnosis-Driven Online Reinforcement Learning with Offline Priors arXiv:2606.25527v1 Announce Type: new Abstract: Online reinforcement learning (RL) agents increasingly depend on knowledge acquired offline to achieve practical efficiency. Originally studied in offline-to-online RL, this paradigm now spans foundation model post-training and… 27 arXiv — NLP / Computation & Language research 5d ago AgentOdyssey: Open-Ended Long-Horizon Text Game Generation for Test-Time Continual Learning Agents arXiv:2606.24893v1 Announce Type: new Abstract: For agents to learn continuously from interaction with the world at test time, they must be able to explore effectively, acquire new world knowledge and skills, retain relevant episodic experiences, and plan over long horizons. To… 17 arXiv — NLP / Computation & Language research 5d ago Memory Makes the Difference: Evaluating How Different Memory Roles Shape Conversational Agents arXiv:2606.25361v1 Announce Type: new Abstract: Prior research on memory mechanism in RAG-based conversational system has emphasized how memory is stored and retrieved. 
However, far less is known about how memories with different functional roles influence response quality.… 27 arXiv — NLP / Computation & Language research 5d ago Beyond Next-Observation Prediction: Agent-Authored World Modeling for Sequential Decision Making arXiv:2606.25421v1 Announce Type: new Abstract: Recent studies on world modeling for Large Language Model (LLM) agents typically formulate the learning objective as next-observation prediction. However, this objective ties supervision to what a transition happens to reveal,… 32 arXiv — NLP / Computation & Language research 5d ago BiPACE: Bisimulation-Guided Policy Optimization with Action Counterfactual Estimation for LLM Agents arXiv:2606.25556v1 Announce Type: new Abstract: Stepwise group-based RL is an attractive way to train long-horizon LLM agents without a learned critic: it reuses multiple sampled rollouts to estimate local advantages. Its weakness is less visible but more fundamental: every… 11 arXiv — NLP / Computation & Language research 5d ago Constraint Tax in Open-Weight LLMs: An Empirical Study of Tool Calling Suppression Under Structured Output Constraints arXiv:2606.25605v1 Announce Type: new Abstract: Tool Calling and Structured Output are two core capabilities of modern Agent systems, yet their interaction under joint deployment conditions remains insufficiently understood. This paper reports a reproducible phenomenon observed… 10 arXiv — NLP / Computation & Language research 5d ago Staying In Character: Perspective-Bounded Memory For Book-Based Role-Playing Agents arXiv:2606.25632v1 Announce Type: new Abstract: Recent LLM role-playing systems build character agents from novels by extracting characters, scenes, and relations. Yet long-narrative role-playing suffers from two failures: Factual Overreach, where shared retrieval or parametric… 30 arXiv — NLP / Computation & Language research 5d ago Is GraphRAG Needed? 
From Basic RAG to Graph-/Agentic Solutions with Context Optimization arXiv:2606.25656v1 Announce Type: new Abstract: As advanced RAG variants like GraphRAG and Agentic RAG emerge, one leading question is when and how to use them. Here, we introduce a framework for different RAG scenarios evaluation and comparison on semi-structured knowledge… 21 arXiv — NLP / Computation & Language research 5d ago Beyond Function Calling: Benchmarking Tool-Using Agents under Tool-Environment Unreliability arXiv:2606.25819v1 Announce Type: new Abstract: Large language models are increasingly deployed as agents that solve tasks by interacting with external tool environments. Although recent tool-use benchmarks increasingly cover complex task settings, they still largely assume… 26 arXiv — NLP / Computation & Language research 5d ago Why Multi-Step Tool-Use Reinforcement Learning Collapses and How Supervisory Signals Fix It arXiv:2606.26027v1 Announce Type: new Abstract: Tool use enables large language models (LLMs) to perform complex tasks, and recent agentic reinforcement learning (RL) methods show promise for enhancing model capabilities. However, RL alone often leads to instability or limited… 17 arXiv — NLP / Computation & Language research 5d ago The Hitchhiker's Guide to Agentic AI: From Foundations to Systems arXiv:2606.24937v1 Announce Type: cross Abstract: The Hitchhiker's Guide to Agentic AI is a comprehensive practitioner's reference for building autonomous AI systems. The book covers the full stack from first principles to production deployment, organized around a central… 25 arXiv — NLP / Computation & Language research 5d ago Diagnosing and Mitigating Compounding Failures in Agentic Persuasion via Taxonomic Strategy Retrieval arXiv:2606.24976v1 Announce Type: cross Abstract: Foundation-model agents in multi-step, open-ended environments frequently suffer from compounding errors, where early mistakes contaminate long-horizon trajectories. 
While Multi-Agent Debate (MAD) succeeds in deterministic… 10 arXiv — NLP / Computation & Language research 5d ago Uncertainty Quantification for Computer-Use Agents: A Benchmark across Vision-Language Models and GUI Grounding Datasets arXiv:2606.25760v1 Announce Type: cross Abstract: Computer-use agents turn vision-language model (VLM) predictions into executable GUI clicks, so reliable uncertainty estimates are essential for rejection, calibration, miss-severity ranking, and spatial safety regions. Yet… 14 arXiv — NLP / Computation & Language research 5d ago Autodata: An agentic data scientist to create high quality synthetic data arXiv:2606.25996v1 Announce Type: cross Abstract: We introduce Autodata, a general method that enables AI agents to act as data scientists who build high quality training and evaluation data. We show how to train (meta-optimize) such a data scientist agent, so that it learns to… 30 arXiv — NLP / Computation & Language research 5d ago Membox: Weaving Topic Continuity into Long-Range Memory for LLM Agents arXiv:2601.03785v3 Announce Type: replace Abstract: Long-term human-agent dialogues are organized by topic continuity: adjacent turns often develop the same goal, plan, problem, or event, while related activities may recur across distant sessions. Yet many LLM agent memory… 25 MIT News — AI research 5d ago Improving the speed and energy-efficiency of AI agents A new system, known as Murakkab, optimizes the design and deployment of multistep workflows that power AI applications. 26 Hugging Face Daily Papers research 5d ago Are We Ready For An Agent-Native Memory System? Abstract Large language model agents' memory systems have evolved into complex data management frameworks requiring systematic evaluation across multiple modules and workloads to understand their performance characteristics and trade-offs. 
Generated by… 7 OpenAI official-blog 5d ago How agents are transforming work A new OpenAI research paper shows how AI agents are transforming work, enabling longer, more complex tasks and expanding productivity across roles. 28 Hugging Face Daily Papers research 5d ago MEMPROBE: Probing Long-Term Agent Memory via Hidden User-State Recovery Abstract Long-term memory in LLM agents should be evaluated as an auditable post-interaction artifact by reconstructing structured user state from the agent's memory, as demonstrated by MEMPROBE, a benchmark testing memory recovery against synthetic ground truth across 50… 21 Vercel — AI dev-tools 5d ago Deep Agents and OpenCode are now available in the AI SDK Harness The AI SDK Harness lets you run established coding-agent runtimes through one unified interface, so you can switch runtimes without changing your application code. Today we're adding two new adapters, Deep Agents and OpenCode, both running inside a Vercel Sandbox. Deep Agents… 27 Simon Willison community 5d ago simonw/browser-compat-db Inspired by Mozilla's new MDN MCP service - source code here - I decided to try converting their comprehensive mdn/browser-compat-data repository full of browser compatibility data into a SQLite database. This new GitHub Repo includes a Claude Code for… 11 Hugging Face Daily Papers research 5d ago Critique of Agent Model Abstract True artificial agency requires internalized structures for goals, identity, decision-making, self-regulation, and learning, distinguishing autonomous systems from task-specific ones. Generated by Qwen/Qwen2.5-Coder-32B-Instruct What is an agent?
What constitutes… 24 Latent.Space news-outlet 5d ago Why the Frontier Ecosystem must be Open — Matei Zaharia and Reynold Xin, Databricks In a rare double-interview, the Databricks technical leaders riff on what it will take for every company to build Agent Clouds 19 Google DeepMind official-blog 5d ago Introducing computer use in Gemini 3.5 Flash Jun 24, 2026 · Computer use is now a built-in tool in Gemini 3.5 Flash to build agents that can interact across platforms. Mateo Quiros Product Manager, Google DeepMind… 9 r/MachineLearning community 5d ago I made a superhuman Generals.io agent with self-play RL [P] Hi everyone, I trained a self-play RL agent for Generals.io that reached superhuman-level and ranked #1 on the human 1v1 leaderboard. It began as my master's thesis where the goal was to beat a prior algorithm based agent. We succeeded using behavior cloning, RL fine-tuning and… 6 r/LocalLLaMA community 5d ago Nex-N2-Mini-Ultra-Uncensored-Heretic Is Out Now, an Agentic Model With Agentic Thinking Now Uncensored With 5/100 Refusals and 0.0020 KLD, Available in Safetensors and GGUF Formats! Safetensors: https://huggingface.co/llmfan46/Nex-N2-mini-ultra-uncensored-heretic GGUFs: https://huggingface.co/llmfan46/Nex-N2-mini-ultra-uncensored-heretic-GGUF Find all my models here: HuggingFace-LLMFan46 If you like my work and find my models useful, then I would really… 36 r/LocalLLaMA community 5d ago Qwen-AgentWorld-35B-A3B for Coding? Benchmark from its model card. Removed online models & Qwen-AgentWorld-397B-A17B from the table. Just Open models. Model MCP Search Term.
SWE Android Web OS Overall DeepSeek-V4-Pro 63.27 27.61 51.26 59.44 55.17 50.32 63.70 52.97 GLM-5.1 67.60 22.46 47.32 52.07 59.10 51.50 59.13… 11 Hugging Face Daily Papers research 5d ago AGORA: An Archive-Grounded Benchmark for Agentic Workplace Document Reasoning Abstract Large language models face challenges in archive-grounded reasoning tasks involving evidence retrieval and synthesis across diverse document collections, with performance varying significantly across domains. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large language… 26 Latent.Space news-outlet 5d ago [AINews] Claude Tag: Multiplayer, Proactive, Persistent Agents in Slack Claude finally gets a Slackbot upgrade 8 Hugging Face Daily Papers research 5d ago OpenThoughts-Agent: Data Recipes for Agentic Models Abstract An open-source data curation pipeline for training agentic language models is presented, demonstrating superior performance through systematic experimentation and scalable training data. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Agentic language models dramatically… 34 Hugging Face Daily Papers research 5d ago LingxiDiagBench: A Multi-Agent Framework for Benchmarking LLMs in Chinese Psychiatric Consultation and Diagnosis Abstract A large-scale multi-agent benchmark for evaluating LLMs in Chinese psychiatric diagnosis is introduced, highlighting challenges in dynamic consultation and the gap between consultation quality and diagnostic accuracy. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Mental… 36 r/LocalLLaMA community 6d ago Qwen-AgentWorld-35B-A3B: a 3B-active MoE trained to simulate MCP, terminal, SWE, Android, web and OS environments Qwen just released Qwen-AgentWorld-35B-A3B — a 35B-parameter MoE with only ~3B active parameters per token. The interesting part: this is not positioned as a standard chat/instruction model or a full autonomous agent. 
It is a language world model trained to predict what an… 6 r/LocalLLaMA community 6d ago GitHub - QwenLM/Qwen-AgentWorld: Qwen-AgentWorld: Language World Models for General Agents submitted by /u/dan945 5 arXiv — Machine Learning research 6d ago Forget Without Compromise: Nexus Sampling for Streaming KV-Cache Eviction Under Fixed Budgets arXiv:2606.23961v1 Announce Type: new Abstract: Long-context and agentic LLM workloads push the KV cache past any fixed memory budget, forcing the inference stack to permanently evict tokens at every step of a continuous-inference stream. Existing methods all share the same… 20 arXiv — NLP / Computation & Language research 6d ago When Retrieval Metrics Mislead: Measuring Policy Signal in Long-Horizon Tool-Use Agents arXiv:2606.23937v1 Announce Type: new Abstract: Exact-match retrieval recall is often used as a proxy for whether a retriever supplies useful policy context to a downstream decision model. We test this proxy for pre-action policy classification in tau-bench using Qwen2.5-3B/7B… 11