Tag: Agents + tool use · 500 articles archived under #agents

arXiv — Machine Learning research 14d ago Remember, Don't Re-read: Stateful ReAct Agents for Token-Efficient Autonomous Experimentation arXiv:2606.14945v1 Announce Type: new Abstract: The autoresearch pattern enables autonomous experimentation by having a large language model (LLM) iteratively modify code to optimize a target metric. Its stateless design, however, reconstructs experimental context from scratch… 38

arXiv — Machine Learning research 14d ago A Comparative Study of Graph Neural Network Layer Selection for Interaction Modelling in Driving Trajectory Prediction arXiv:2606.14956v1 Announce Type: new Abstract: Autonomous driving systems rely on precise trajectory prediction to plan safe and efficient movement. Graph Neural Networks (GNNs) have become a promising approach for modelling spatiotemporal interactions among road agents.… 24

arXiv — Machine Learning research 14d ago Edu-Theater: A Data-Efficient Agent Framework for Scalable Learner Behavior Simulation through Staging Roll-Call arXiv:2606.15225v1 Announce Type: new Abstract: Large-scale learner-task interaction data are crucial for intelligent educational systems but are costly to collect and constrained by privacy and learner engagement. Learner simulators play a critical role in simulating scalable… 24

arXiv — Machine Learning research 14d ago M-CTX: Exact and Scalable Spatial Context Retrieval for Trajectory Analytics arXiv:2606.15244v1 Announce Type: new Abstract: Modern trajectory predictors increasingly condition on external spatial context, such as map geometry, signed distance fields (SDFs), and nearby moving agents.
While this context improves prediction quality, constructing it for… 29

arXiv — Machine Learning research 14d ago LatentGym: A Testbed For Cross-Task Experiential Learning With Controllable Latent Structure arXiv:2606.15306v1 Announce Type: new Abstract: We envision continually learning agentic systems that become more useful over time: as they encounter sequences of related tasks, they should infer the hidden structure shared across those tasks and use it to improve future… 19

arXiv — Machine Learning research 14d ago Repeated Bilateral Trade: The Quest for Fairness arXiv:2606.15369v1 Announce Type: new Abstract: We study repeated bilateral trade from a fairness perspective. At each round, a fresh seller-buyer pair arrives, and the platform posts a price before observing the traders' valuations. Trade occurs only if both agents accept the… 34

arXiv — Machine Learning research 14d ago Multi-Agent Framework for Audit Risk Assessment with Explicit Uncertainty and Evidence Conflict Modeling arXiv:2606.15640v1 Announce Type: new Abstract: Audit risk assessment increasingly benefits from combining heterogeneous evidence sources, yet existing approaches typically produce point predictions without quantifying how well different evidence streams agree. We propose UMAR… 29

arXiv — Machine Learning research 14d ago On-Policy Distillation with Curriculum Turn-level Guidance for Multi-turn Agents arXiv:2606.15912v1 Announce Type: new Abstract: Multi-turn agents that plan, invoke tools, and interact with environments offer a promising paradigm for solving complex tasks, yet their capabilities typically rely on very large models whose inference cost is prohibitive in… 30

arXiv — NLP / Computation & Language research 14d ago PhoneHarness: Harnessing Phone-Use Agents through Mixed GUI, CLI, and Tool Actions arXiv:2606.14832v1 Announce Type: new Abstract: Phone agents are increasingly expected to complete real mobile workflows rather than merely predict the next screen action.
However, much of the current mobile-agent literature still evaluates agents primarily as GUI controllers… 36

arXiv — NLP / Computation & Language research 14d ago Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning arXiv:2606.15007v1 Announce Type: new Abstract: We introduce Nemotron 3 Ultra, a 550 billion total and 55 billion active parameter Mixture-of-Experts Hybrid Mamba-Attention language model. We pre-trained Nemotron 3 Ultra on 20 trillion text tokens, then extended the context… 19

arXiv — NLP / Computation & Language research 14d ago Are Online Skill and Memory Modules Always Worth Their Tokens? A Budget-Constrained Study of Web Agents arXiv:2606.15017v1 Announce Type: new Abstract: Online web agents often augment a base actor with memory, workflow, or skill modules. These modules can improve performance, but they also consume test-time tokens, a cost rarely reported alongside the actor's inference cost. We… 11

arXiv — NLP / Computation & Language research 14d ago Ling and Ring 2.6 Technical Report: Efficient and Instant Agentic Intelligence at Trillion-Parameter Scale arXiv:2606.15079v1 Announce Type: new Abstract: Efficient and scalable agentic intelligence requires models that can deliver both low-latency responses and strong reasoning capabilities while remaining practical to train, serve, and deploy. In this report, we present Ling-2.6… 16

arXiv — NLP / Computation & Language research 14d ago Can Agents Read the Room? Benchmarking Visual Social Intelligence in Multimodal Simulation arXiv:2606.15152v1 Announce Type: new Abstract: Social interaction depends on both language and visible social signals, such as facial expressions, posture, gaze, and emotional shifts.
Yet existing social-agent benchmarks are largely text-based and rarely test whether multimodal… 10

arXiv — NLP / Computation & Language research 14d ago Privacy-Preserving Text Sanitization for Distributed Agents Collaboration via Disentangled Representations arXiv:2606.15335v1 Announce Type: new Abstract: When distributed agents exchange text across organizational boundaries, privacy leakage arises not only from explicit identifiers but also from distributional signatures such as formatting conventions, vocabulary choices, and… 17

arXiv — NLP / Computation & Language research 14d ago Beyond Monolingual Deep Research: Evaluating Agents and Retrievers with Cross-Lingual BrowseComp-Plus arXiv:2606.15345v1 Announce Type: new Abstract: Deep research agents are increasingly evaluated on their ability to search for evidence, reason over retrieved sources, and produce grounded answers. Existing browsing benchmarks, however, largely assume that the user's query and… 21

arXiv — NLP / Computation & Language research 14d ago Not All Skills Help: Measuring and Repairing Agent Knowledge arXiv:2606.15390v1 Announce Type: new Abstract: LLM agents can improve without weight updates by accumulating natural-language skills from experience, but current systems entrust every decision about which skills to keep and how to apply them to LLM judgment alone. We argue that… 17

arXiv — NLP / Computation & Language research 14d ago T-Mem: Memory That Anticipates, Not Archives arXiv:2606.15405v1 Announce Type: new Abstract: Long-term memory is essential for conversational agents to remain coherent across extended dialogues, follow through on commitments made many sessions earlier, and adapt their behaviour to each user.
Current LLM-backed long-term… 27

arXiv — NLP / Computation & Language research 14d ago Pepti-Agent: An AI Agent for Peptide Design and Optimization arXiv:2606.15422v1 Announce Type: new Abstract: Therapeutic peptides occupy a valuable design space between small molecules and biologics, but their development requires satisfying several competing constraints at once: solubility, hemolytic activity, and nonspecific surface… 6

arXiv — NLP / Computation & Language research 14d ago Control-Plane Placement Shapes Forgetting: An Architectural Study of Agent Memory Across Thirteen System Configurations arXiv:2606.15903v1 Announce Type: new Abstract: Where an LLM sits in an agent memory pipeline -- between the recall plane that retrieves stored facts (extensively benchmarked) and the control plane that mutates them via supersede, release, purge (largely untested) -- shapes… 21

arXiv — NLP / Computation & Language research 14d ago Interactor: Agentic RL oriented Iterative Creation for Ad Description Generation in Sponsored Search arXiv:2606.15911v1 Announce Type: new Abstract: This paper focuses on automatically generating informative ad descriptions in sponsored search. Unlike ad titles which are usually optimized to attract user click feedbacks, ad descriptions have a longer text span and possess the… 8

arXiv — NLP / Computation & Language research 14d ago GRACE-DS: a Guarded Reward-guided Agent Correction Environment in Data Science arXiv:2606.16000v1 Announce Type: new Abstract: We introduce GRACE-DS, a Guarded Reward-guided Agent Correction Environment in Data Science for pre-deployment evaluation of LLM-powered AutoML agents.
GRACE-DS is a set of evaluation metrics in an isolated environment that can be… 22

arXiv — NLP / Computation & Language research 14d ago Towards Pareto-Optimal Tool-Integrated Agents with Pareto Ranking Policy Optimization arXiv:2606.16111v1 Announce Type: new Abstract: Recent advances in tool-integrated language agents have significantly improved their ability to solve complex reasoning tasks. However, existing alignment methods predominantly focus on maximizing task accuracy, while overlooking… 10

arXiv — NLP / Computation & Language research 14d ago PACT: Privileged Trace Co-Training for Multi-Turn Tool-Use Agents arXiv:2606.16215v1 Announce Type: new Abstract: Multi-turn tool-use agents must reason, call tools, and adapt to observations across several interaction turns. Post-training such agents is challenging, as reinforcement learning often suffers from sparse rewards and weak credit… 18

arXiv — NLP / Computation & Language research 14d ago HiMPO: Hindsight-Informed Memory Policy Optimization for Less-Entangled Credit in Long-Horizon Agents arXiv:2606.16285v1 Announce Type: new Abstract: Long-horizon agents rely on memory mechanisms to compress interaction history, but optimizing memory writing faces a distinct credit assignment challenge: a memory update may be rewarded or penalized due to downstream tool… 29

arXiv — NLP / Computation & Language research 14d ago PathRouter: Aligning Rewards with Retrieval Quality in Agentic Graph Retrieval-Augmented Generation arXiv:2606.16409v1 Announce Type: new Abstract: Agentic GraphRAG trains language-model agents to iteratively retrieve and reason over graph-structured evidence, enabling more accurate and context-aware decision-making by efficiently navigating complex information networks.… 30

arXiv — NLP / Computation & Language research 14d ago LectūraAgents: A Multi-Agent Framework for Adaptive Personalized AI-Assisted Learning and Embodied Teaching arXiv:2606.16428v1 Announce Type: new Abstract: Effective
personalized AI-assisted learning demands systems that can not only generate accurate learner-specific educational materials, but also dynamically adapt their instruction to diverse learners. However, existing educational… 32

arXiv — NLP / Computation & Language research 14d ago ACCORD: Action-Conditioned Contextual Grounding for Language Agents arXiv:2606.16432v1 Announce Type: new Abstract: User instructions are often underspecified because humans rely on implicit assumptions about the surrounding environment. For large language model (LLM) agents operating in information-rich digital and physical environments, these… 16

Hugging Face Daily Papers research 14d ago Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning Abstract Nemotron 3 Ultra is a large-scale language model featuring hybrid Mamba-Attention architecture with 550 billion parameters, achieving high inference throughput and extended context length through specialized training techniques. Generated by… 5

Hugging Face Daily Papers research 14d ago CODA-BENCH: Can Code Agents Handle Data-Intensive Tasks? Abstract Advanced agents struggle to effectively integrate data discovery with code execution in data-intensive environments, revealing a significant gap in current agentic capabilities. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Advanced agents are increasingly demonstrating… 6

Hugging Face Daily Papers research 14d ago FastContext: Training Efficient Repository Explorer for Coding Agents Abstract FastContext separates repository exploration from code solving in LLM agents using specialized exploration models that reduce token consumption and improve resolution rates.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large Language Model (LLM) coding agents have… 19

Hugging Face Daily Papers research 14d ago TokenPilot: Cache-Efficient Context Management for LLM Agents Abstract TokenPilot is a dual-granularity context management framework that reduces inference costs in long-horizon LLM sessions by stabilizing prompt prefixes and conservatively managing context segments. Generated by Qwen/Qwen2.5-Coder-32B-Instruct As LLM agents are deployed… 28

Hugging Face Daily Papers research 14d ago VisualClaw: A Real-Time, Personalized Agent for the Physical World Abstract VisualClaw is a self-evolving multimodal agent that reduces deployment costs through hybrid encoding and skill evolution while improving video-QA accuracy across multiple benchmarks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Vision language models are serving as… 32

Vercel — AI dev-tools 14d ago Vercel Sandbox can now run for up to 24 hours Vercel Sandboxes can run uninterrupted sessions for up to 24 hours (up from 5 hours). This new max duration unlocks workloads that require longer runtimes, such as large-scale data processing, end-to-end testing pipelines, and long-lived agentic workflows. Pair with persistent… 23

r/LocalLLaMA community 14d ago vLLM has a new streaming parser for Qwen3+ available in nightly The new parser reportedly fixes the issues many were seeing with Qwen3.6-27b stopping mid turn, as well as failing streaming tool calls due to chunk boundaries. The mid turn stopping is especially annoying when trying to use the model for agentic workflows.
I've not seen it… 22

r/LocalLLaMA community 14d ago Cheapest hardware for Qwen 3.6: both 27B and 35B-A3B - "Qwen 3.6/3.5 27b > Qwen 3.6/3.5 35b > Gemma4 31b > Qwen 3.5 9b > Gemma4 12b > Gemma4 26b", people say - "Qwen 3.6 for coding & Agentic, Gemma4 for human sounding text", people say So I have been eyeing the RTX 3090 24 GB (or sometimes its cheaper Chinese companion… 30

r/LocalLLaMA community 14d ago Reason to run local agents instead #645 submitted by /u/ToastFetish 18

r/LocalLLaMA community 14d ago I think we need a /LocalHarnessLLM or something ... LM Studio Hermes Qwen Code Odysseus Open Claw Open Code Claude Code (and then IDEs w/ agentic capabilities) Continue Rider VS Code And a dozen others I'm sure ... Would love a place to discuss these? If not a new subreddit, a new discord section in localllama discord? I've made… 24

Simon Willison community 14d ago datasette-apps 0.1a2 Release: datasette-apps 0.1a2 Custom network/CSP origins for apps are now guarded by a new apps-set-csp permission, with an optional allowed_csp_origins plugin allow-list for non-privileged users. The Datasette Agent app creation tool enforces the same rules. #24 Stored query… 15

Simon Willison community 14d ago datasette-agent 0.3a0 Release: datasette-agent 0.3a0 New tool, execute_write_sql, which requests user approval and then writes to a database - taking user permissions into account. #27 I added a mechanism for asking user approval in datasette agent 0.2a0. The new execute_write_sql tool can now… 11

r/LocalLLaMA community 14d ago Local coding agents are good now, but only if you babysit them Local coding agents are finally useful for me, but I still can’t just leave them alone. They are great for small fixes, reading a repo, changing files, and doing boring code work.
But if I give them too much freedom, they start touching random stuff, making nice looking broken… 26

TechCrunch — AI news-outlet 14d ago Salesforce acquires AI customer service platform Fin for $3.6 billion Salesforce says it wants to use Fin's team and technology to improve Agentforce, its existing enterprise platform that businesses can use to build custom AI agents that automate tasks. 27

r/LocalLLaMA community 14d ago archex: local-first, deterministic code-context for AI agents — no API key, no telemetry (Apache 2.0) archex turns a repo into a ranked, token-budgeted context bundle for coding agents: the symbols, imports, dependency-graph neighbors, and provenance the model needs, assembled before it reasons. It returns context, not an answer — your local model still does the thinking. The… 24

The Information — AI news-outlet 14d ago Salesforce to Acquire Customer AI Agent Fin for $3.6 Billion Salesforce has agreed to buy Fin, a startup that develops customer agents formerly known as Intercom, for $3.6 billion, as the software giant hopes to win new businesses from enterprises to adopt its own AI offering. The sale price is a big premium to Fin’s last valuation of $2… 18

TechCrunch — AI news-outlet 14d ago As AI agents become employees, NewCore emerges with $66M to give them identities NewCore argues the next challenge in enterprise security will be managing AI agents, not people. 26

Import AI news-outlet 14d ago Import AI 461: "Alignment is not on track"; FrontierCode; and synthetic research interns Where are your agents right now? 15

r/LocalLLaMA community 15d ago An agent that plans with a frontier model but runs most of tokens locally (built it for my own dual-3090 rig) For the past couple of months, I've been building a tool for my personal use. I have a dual RTX 3090 system which I wanted to use but the qwen 3.5/3.6 27B and Gemma 4 31B while being really good, just didn't have the taste or the ability that a frontier model has.
OTOH, frontier… 38

arXiv — Machine Learning research 15d ago Utility-Constrained Policy Optimization arXiv:2606.14029v1 Announce Type: new Abstract: Constrained MDPs (CMDPs) are a widely adopted framework for incorporating safety into RL agents; however, the framework does not support risk-sensitive constraints. This can be problematic: For example, CMDPs allow for optimal… 38

arXiv — Machine Learning research 15d ago Contract-Based Compositional Shielding for Safe Multi-Agent Reinforcement Learning arXiv:2606.14130v1 Announce Type: new Abstract: Safe coordination problems surface in multi-agent reinforcement learning when global safety cannot be enforced by any agent unilaterally: the admissibility of one agent's action may depend on the dynamics of other agents.… 17

arXiv — NLP / Computation & Language research 15d ago Graph-based Target Back-Propagation for Context Adaptation in Multi-LLM Agentic Systems arXiv:2606.14155v1 Announce Type: cross Abstract: Context adaptation automates prompt engineering in LLM-based systems by iteratively revising tunable prompts from task feedback, without modifying model weights. Extending this paradigm to multi-LLM agentic systems is crucial:… 31

arXiv — Machine Learning research 15d ago Running the Gauntlet: Re-evaluating the Capabilities of Agents Beyond Familiar Environments arXiv:2606.14397v1 Announce Type: new Abstract: As agentic systems continue to evolve and are widely deployed in real-world scenarios, there is a growing demand to faithfully evaluate their capabilities. However, current benchmarks are typically built on popular applications… 5

Page 8 of 10