Tag

Agents + tool use

500 articles archived under #agents

arXiv — Machine Learning research 14d ago

Remember, Don't Re-read: Stateful ReAct Agents for Token-Efficient Autonomous Experimentation

arXiv:2606.14945v1 Announce Type: new Abstract: The autoresearch pattern enables autonomous experimentation by having a large language model (LLM) iteratively modify code to optimize a target metric. Its stateless design, however, reconstructs experimental context from scratch…

38
arXiv — Machine Learning research 14d ago

A Comparative Study of Graph Neural Network Layer Selection for Interaction Modelling in Driving Trajectory Prediction

arXiv:2606.14956v1 Announce Type: new Abstract: Autonomous driving systems rely on precise trajectory prediction to plan safe and efficient movement. Graph Neural Networks (GNNs) have become a promising approach for modelling spatiotemporal interactions among road agents.…

24
arXiv — Machine Learning research 14d ago

Edu-Theater: A Data-Efficient Agent Framework for Scalable Learner Behavior Simulation through Staging Roll-Call

arXiv:2606.15225v1 Announce Type: new Abstract: Large-scale learner-task interaction data are crucial for intelligent educational systems but are costly to collect and constrained by privacy and learner engagement. Learner simulators play a critical role in simulating scalable…

24
arXiv — Machine Learning research 14d ago

M-CTX: Exact and Scalable Spatial Context Retrieval for Trajectory Analytics

arXiv:2606.15244v1 Announce Type: new Abstract: Modern trajectory predictors increasingly condition on external spatial context, such as map geometry, signed distance fields (SDFs), and nearby moving agents. While this context improves prediction quality, constructing it for…

29
arXiv — Machine Learning research 14d ago

LatentGym: A Testbed For Cross-Task Experiential Learning With Controllable Latent Structure

arXiv:2606.15306v1 Announce Type: new Abstract: We envision continually learning agentic systems that become more useful over time: as they encounter sequences of related tasks, they should infer the hidden structure shared across those tasks and use it to improve future…

19
arXiv — Machine Learning research 14d ago

Repeated Bilateral Trade: The Quest for Fairness

arXiv:2606.15369v1 Announce Type: new Abstract: We study repeated bilateral trade from a fairness perspective. At each round, a fresh seller-buyer pair arrives, and the platform posts a price before observing the traders' valuations. Trade occurs only if both agents accept the…

34
arXiv — Machine Learning research 14d ago

Multi-Agent Framework for Audit Risk Assessment with Explicit Uncertainty and Evidence Conflict Modeling

arXiv:2606.15640v1 Announce Type: new Abstract: Audit risk assessment increasingly benefits from combining heterogeneous evidence sources, yet existing approaches typically produce point predictions without quantifying how well different evidence streams agree. We propose UMAR…

29
arXiv — Machine Learning research 14d ago

On-Policy Distillation with Curriculum Turn-level Guidance for Multi-turn Agents

arXiv:2606.15912v1 Announce Type: new Abstract: Multi-turn agents that plan, invoke tools, and interact with environments offer a promising paradigm for solving complex tasks, yet their capabilities typically rely on very large models whose inference cost is prohibitive in…

30
arXiv — NLP / Computation & Language research 14d ago

PhoneHarness: Harnessing Phone-Use Agents through Mixed GUI, CLI, and Tool Actions

arXiv:2606.14832v1 Announce Type: new Abstract: Phone agents are increasingly expected to complete real mobile workflows rather than merely predict the next screen action. However, much of the current mobile-agent literature still evaluates agents primarily as GUI controllers…

36
arXiv — NLP / Computation & Language research 14d ago

Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

arXiv:2606.15007v1 Announce Type: new Abstract: We introduce Nemotron 3 Ultra, a 550 billion total and 55 billion active parameter Mixture-of-Experts Hybrid Mamba-Attention language model. We pre-trained Nemotron 3 Ultra on 20 trillion text tokens, then extended the context…

19
arXiv — NLP / Computation & Language research 14d ago

Are Online Skill and Memory Modules Always Worth Their Tokens? A Budget-Constrained Study of Web Agents

arXiv:2606.15017v1 Announce Type: new Abstract: Online web agents often augment a base actor with memory, workflow, or skill modules. These modules can improve performance, but they also consume test-time tokens, a cost rarely reported alongside the actor's inference cost. We…

11
arXiv — NLP / Computation & Language research 14d ago

Ling and Ring 2.6 Technical Report: Efficient and Instant Agentic Intelligence at Trillion-Parameter Scale

arXiv:2606.15079v1 Announce Type: new Abstract: Efficient and scalable agentic intelligence requires models that can deliver both low-latency responses and strong reasoning capabilities while remaining practical to train, serve, and deploy. In this report, we present Ling-2.6…

16
arXiv — NLP / Computation & Language research 14d ago

Can Agents Read the Room? Benchmarking Visual Social Intelligence in Multimodal Simulation

arXiv:2606.15152v1 Announce Type: new Abstract: Social interaction depends on both language and visible social signals, such as facial expressions, posture, gaze, and emotional shifts. Yet existing social-agent benchmarks are largely text-based and rarely test whether multimodal…

10
arXiv — NLP / Computation & Language research 14d ago

Privacy-Preserving Text Sanitization for Distributed Agents Collaboration via Disentangled Representations

arXiv:2606.15335v1 Announce Type: new Abstract: When distributed agents exchange text across organizational boundaries, privacy leakage arises not only from explicit identifiers but also from distributional signatures such as formatting conventions, vocabulary choices, and…

17
arXiv — NLP / Computation & Language research 14d ago

Beyond Monolingual Deep Research: Evaluating Agents and Retrievers with Cross-Lingual BrowseComp-Plus

arXiv:2606.15345v1 Announce Type: new Abstract: Deep research agents are increasingly evaluated on their ability to search for evidence, reason over retrieved sources, and produce grounded answers. Existing browsing benchmarks, however, largely assume that the user's query and…

21
arXiv — NLP / Computation & Language research 14d ago

Not All Skills Help: Measuring and Repairing Agent Knowledge

arXiv:2606.15390v1 Announce Type: new Abstract: LLM agents can improve without weight updates by accumulating natural-language skills from experience, but current systems entrust every decision about which skills to keep and how to apply them to LLM judgment alone. We argue that…

17
arXiv — NLP / Computation & Language research 14d ago

T-Mem: Memory That Anticipates, Not Archives

arXiv:2606.15405v1 Announce Type: new Abstract: Long-term memory is essential for conversational agents to remain coherent across extended dialogues, follow through on commitments made many sessions earlier, and adapt their behaviour to each user. Current LLM-backed long-term…

27
arXiv — NLP / Computation & Language research 14d ago

Pepti-Agent: An AI Agent for Peptide Design and Optimization

arXiv:2606.15422v1 Announce Type: new Abstract: Therapeutic peptides occupy a valuable design space between small molecules and biologics, but their development requires satisfying several competing constraints at once: solubility, hemolytic activity, and nonspecific surface…

6
arXiv — NLP / Computation & Language research 14d ago

Control-Plane Placement Shapes Forgetting: An Architectural Study of Agent Memory Across Thirteen System Configurations

arXiv:2606.15903v1 Announce Type: new Abstract: Where an LLM sits in an agent memory pipeline -- between the recall plane that retrieves stored facts (extensively benchmarked) and the control plane that mutates them via supersede, release, purge (largely untested) -- shapes…

21
arXiv — NLP / Computation & Language research 14d ago

Interactor: Agentic RL oriented Iterative Creation for Ad Description Generation in Sponsored Search

arXiv:2606.15911v1 Announce Type: new Abstract: This paper focuses on automatically generating informative ad descriptions in sponsored search. Unlike ad titles which are usually optimized to attract user click feedbacks, ad descriptions have a longer text span and possess the…

8
arXiv — NLP / Computation & Language research 14d ago

GRACE-DS: a Guarded Reward-guided Agent Correction Environment in Data Science

arXiv:2606.16000v1 Announce Type: new Abstract: We introduce GRACE-DS, a Guarded Reward-guided Agent Correction Environment in Data Science for pre-deployment evaluation of LLM-powered AutoML agents. GRACE-DS is a set of evaluation metrics in an isolated environment that can be…

22
arXiv — NLP / Computation & Language research 14d ago

Towards Pareto-Optimal Tool-Integrated Agents with Pareto Ranking Policy Optimization

arXiv:2606.16111v1 Announce Type: new Abstract: Recent advances in tool-integrated language agents have significantly improved their ability to solve complex reasoning tasks. However, existing alignment methods predominantly focus on maximizing task accuracy, while overlooking…

10
arXiv — NLP / Computation & Language research 14d ago

PACT: Privileged Trace Co-Training for Multi-Turn Tool-Use Agents

arXiv:2606.16215v1 Announce Type: new Abstract: Multi-turn tool-use agents must reason, call tools, and adapt to observations across several interaction turns. Post-training such agents is challenging, as reinforcement learning often suffers from sparse rewards and weak credit…

18
arXiv — NLP / Computation & Language research 14d ago

HiMPO: Hindsight-Informed Memory Policy Optimization for Less-Entangled Credit in Long-Horizon Agents

arXiv:2606.16285v1 Announce Type: new Abstract: Long-horizon agents rely on memory mechanisms to compress interaction history, but optimizing memory writing faces a distinct credit assignment challenge: a memory update may be rewarded or penalized due to downstream tool…

29
arXiv — NLP / Computation & Language research 14d ago

PathRouter: Aligning Rewards with Retrieval Quality in Agentic Graph Retrieval-Augmented Generation

arXiv:2606.16409v1 Announce Type: new Abstract: Agentic GraphRAG trains language-model agents to iteratively retrieve and reason over graph-structured evidence, enabling more accurate and context-aware decision-making by efficiently navigating complex information networks.…

30
arXiv — NLP / Computation & Language research 14d ago

LectūraAgents: A Multi-Agent Framework for Adaptive Personalized AI-Assisted Learning and Embodied Teaching

arXiv:2606.16428v1 Announce Type: new Abstract: Effective personalized AI-assisted learning demands systems that can not only generate accurate learner-specific educational materials, but also dynamically adapt their instruction to diverse learners. However, existing educational…

32
arXiv — NLP / Computation & Language research 14d ago

ACCORD: Action-Conditioned Contextual Grounding for Language Agents

arXiv:2606.16432v1 Announce Type: new Abstract: User instructions are often underspecified because humans rely on implicit assumptions about the surrounding environment. For large language model (LLM) agents operating in information-rich digital and physical environments, these…

16
Hugging Face Daily Papers research 14d ago

Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Abstract Nemotron 3 Ultra is a large-scale language model featuring hybrid Mamba-Attention architecture with 550 billion parameters, achieving high inference throughput and extended context length through specialized training techniques. Generated by…

5
Hugging Face Daily Papers research 14d ago

CODA-BENCH: Can Code Agents Handle Data-Intensive Tasks?

Abstract Advanced agents struggle to effectively integrate data discovery with code execution in data-intensive environments, revealing a significant gap in current agentic capabilities. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Advanced agents are increasingly demonstrating…

6
Hugging Face Daily Papers research 14d ago

FastContext: Training Efficient Repository Explorer for Coding Agents

Abstract FastContext separates repository exploration from code solving in LLM agents using specialized exploration models that reduce token consumption and improve resolution rates. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large Language Model (LLM) coding agents have…

19
Hugging Face Daily Papers research 14d ago

TokenPilot: Cache-Efficient Context Management for LLM Agents

Abstract TokenPilot is a dual-granularity context management framework that reduces inference costs in long-horizon LLM sessions by stabilizing prompt prefixes and conservatively managing context segments. Generated by Qwen/Qwen2.5-Coder-32B-Instruct As LLM agents are deployed…

28
Hugging Face Daily Papers research 14d ago

VisualClaw: A Real-Time, Personalized Agent for the Physical World

Abstract VisualClaw is a self-evolving multimodal agent that reduces deployment costs through hybrid encoding and skill evolution while improving video-QA accuracy across multiple benchmarks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Vision language models are serving as…

32
Vercel — AI dev-tools 14d ago

Vercel Sandbox can now run for up to 24 hours

Vercel Sandboxes can run uninterrupted sessions for up to 24 hours (up from 5 hours). This new max duration unlocks workloads that require longer runtimes, such as large-scale data processing, end-to-end testing pipelines, and long-lived agentic workflows. Pair with persistent…

23
r/LocalLLaMA community 14d ago

vLLM has a new streaming parser for Qwen3+ available in nightly

The new parser reportedly fixes the issues many were seeing with Qwen3.6-27b stopping mid turn, as well as failing streaming tool calls due to chunk boundaries. The mid turn stopping is especially annoying when trying to use the model for agentic workflows. I've not seen it…

22
r/LocalLLaMA community 14d ago

Cheapest hardware for Qwen 3.6: both 27B and 35B-A3B

- "Qwen 3.6/3.5 27b > Qwen 3.6/3.5 35b > Gemma4 31b > Qwen 3.5 9b > Gemma4 12b > Gemma4 26b", people say
- "Qwen 3.6 for coding & Agentic, Gemma4 for human sounding text", people say

So I have been eyeing the RTX 3090 24 GB (or sometimes its cheaper Chinese companion…

30
r/LocalLLaMA community 14d ago

Reason to run local agents instead #645

submitted by /u/ToastFetish

18
r/LocalLLaMA community 14d ago

I think we need a /LocalHarnessLLM or something ...

LM Studio, Hermes, Qwen Code, Odysseus, Open Claw, Open Code, Claude Code (and then IDEs w/ agentic capabilities): Continue, Rider, VS Code. And a dozen others I'm sure ... Would love a place to discuss these? If not a new subreddit, a new discord section in localllama discord? I've made…

24
Simon Willison community 14d ago

datasette-apps 0.1a2

Release: datasette-apps 0.1a2 Custom network/CSP origins for apps are now guarded by a new apps-set-csp permission, with an optional allowed_csp_origins plugin allow-list for non-privileged users. The Datasette Agent app creation tool enforces the same rules. #24 Stored query…

15
Simon Willison community 14d ago

datasette-agent 0.3a0

Release: datasette-agent 0.3a0 New tool, execute_write_sql, which requests user approval and then writes to a database - taking user permissions into account. #27 I added a mechanism for asking user approval in datasette agent 0.2a0. The new execute_write_sql tool can now…

11
r/LocalLLaMA community 14d ago

Local coding agents are good now, but only if you babysit them

Local coding agents are finally useful for me, but I still can’t just leave them alone. They are great for small fixes, reading a repo, changing files, and doing boring code work. But if I give them too much freedom, they start touching random stuff, making nice looking broken…

26
TechCrunch — AI news-outlet 14d ago

Salesforce acquires AI customer service platform Fin for $3.6 billion

Salesforce says it wants to use Fin's team and technology to improve Agentforce, its existing enterprise platform that businesses can use to build custom AI agents that automate tasks.

27
r/LocalLLaMA community 14d ago

archex: local-first, deterministic code-context for AI agents — no API key, no telemetry (Apache 2.0)

archex turns a repo into a ranked, token-budgeted context bundle for coding agents: the symbols, imports, dependency-graph neighbors, and provenance the model needs, assembled before it reasons. It returns context, not an answer — your local model still does the thinking. The…

24
The Information — AI news-outlet 14d ago

Salesforce to Acquire Customer AI Agent Fin for $3.6 Billion

Salesforce has agreed to buy Fin, a startup that develops customer agents formerly known as Intercom, for $3.6 billion, as the software giant hopes to win new businesses from enterprises to adopt its own AI offering. The sale price is a big premium to Fin’s last valuation of $2…

18
TechCrunch — AI news-outlet 14d ago

As AI agents become employees, NewCore emerges with $66M to give them identities

NewCore argues the next challenge in enterprise security will be managing AI agents, not people.

26
Import AI news-outlet 14d ago

Import AI 461: "Alignment is not on track"; FrontierCode; and synthetic research interns

Where are your agents right now?

15
r/LocalLLaMA community 15d ago

An agent that plans with a frontier model but runs most of tokens locally (built it for my own dual-3090 rig)

For the past couple of months, I've been building a tool for my personal use. I have a dual RTX 3090 system which I wanted to use but the qwen 3.5/3.6 27B and Gemma 4 31B while being really good, just didn't have the taste or the ability that a frontier model has. OTOH, frontier…

38
arXiv — Machine Learning research 15d ago

Utility-Constrained Policy Optimization

arXiv:2606.14029v1 Announce Type: new Abstract: Constrained MDPs (CMDPs) are a widely adopted framework for incorporating safety into RL agents; however, the framework does not support risk-sensitive constraints. This can be problematic: For example, CMDPs allow for optimal…

38
arXiv — Machine Learning research 15d ago

Contract-Based Compositional Shielding for Safe Multi-Agent Reinforcement Learning

arXiv:2606.14130v1 Announce Type: new Abstract: Safe coordination problems surface in multi-agent reinforcement learning when global safety cannot be enforced by any agent unilaterally: the admissibility of one agent's action may depend on the dynamics of other agents.…

17
arXiv — NLP / Computation & Language research 15d ago

Graph-based Target Back-Propagation for Context Adaptation in Multi-LLM Agentic Systems

arXiv:2606.14155v1 Announce Type: cross Abstract: Context adaptation automates prompt engineering in LLM-based systems by iteratively revising tunable prompts from task feedback, without modifying model weights. Extending this paradigm to multi-LLM agentic systems is crucial:…

31
arXiv — Machine Learning research 15d ago

Running the Gauntlet: Re-evaluating the Capabilities of Agents Beyond Familiar Environments

arXiv:2606.14397v1 Announce Type: new Abstract: As agentic systems continue to evolve and are widely deployed in real-world scenarios, there is a growing demand to faithfully evaluate their capabilities. However, current benchmarks are typically built on popular applications…

5
