Tag: Agents + tool use
500 articles archived under #agents

Hugging Face Daily Papers · research · 7d ago
PlanBench-XL: Evaluating Long-Horizon Planning of LLM Tool-Use Agents in Large-Scale Tool Ecosystems
Abstract: PlanBench-XL evaluates large language model agents' ability to plan and adapt in complex tool-rich environments with limited visibility and dynamic disruptions. Generated by Qwen/Qwen2.5-Coder-32B-Instruct. LLM agents increasingly operate in large tool ecosystems, where…
10

Hugging Face Daily Papers · research · 7d ago
EvoEmbedding: Evolvable Representations for Long-Context Retrieval and Agentic Memory
Abstract: EvoEmbedding is a dynamic embedding model that generates adaptive representations by maintaining a continuously updated latent memory, enabling improved retrieval performance in long-context scenarios. Generated by Qwen/Qwen2.5-Coder-32B-Instruct. Existing embedding…
32

Hugging Face Daily Papers · research · 7d ago
DailyReport: An Open-ended Benchmark for Evaluating Search Agents on Daily Search Tasks
Abstract: Search agents face challenges in real-world evaluation due to limited benchmarks and coarse metrics, necessitating more nuanced assessment approaches. Generated by Qwen/Qwen2.5-Coder-32B-Instruct. Search Agents (SAs) typically leverage large language models (LLMs) to…
14

Hugging Face Daily Papers · research · 7d ago
CalVerT: Augmenting Agents with Calibrated Verifier Telemetry Improves Action and Learning in Knowledge-Intensive Tasks
Abstract: Calibrated verifier telemetry enhances LLM agents in knowledge-intensive question answering by providing confidence scores and grounding verification, reducing both over-retrieval and unsupported answers.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct. LLM agents in…
7

Hugging Face Daily Papers · research · 7d ago
Deep Research in Physical Sciences: A Multi-Agent Framework and Comprehensive Benchmark
Abstract: PhySciBench benchmark reveals limited performance of current LLM agents in physical science research, leading to development of DelveAgent framework that improves accuracy through modular design and physics-grounded mechanisms. Generated by…
5

r/LocalLLaMA · community · 7d ago
Why is NO one talking about Microsoft's open source Fast Context!!!
https://huggingface.co/microsoft/FastContext-1.0-4B-SFT
https://github.com/microsoft/fastcontext
FastContext-1.0 is a lightweight repository-exploration subagent for LLM coding agents. Instead of letting a single model both explore the repository and solve the task, FastContext…
38

TechCrunch — AI · news-outlet · 7d ago
The AI world is getting ‘loopy’
The loop takes agentic AI a step further, by authorizing a swarm of agents to work continuously in the background, endlessly.
28

r/LocalLLaMA · community · 7d ago
TMax: A Simple Recipe for Terminal Agents
TMax is the strongest open RL recipe for terminal agents to date, bringing open data recipes closer to the frontier. We release two things. The first is TMax-15k, a dataset of 14,600 RL environments built from a compositional pipeline with explicit control over difficulty and…
22

Interconnects (Nathan Lambert) · research · 7d ago
GLM-5.2 is the step change for open agents
A capability threshold I've been carefully monitoring.
12

r/LocalLLaMA · community · 7d ago
Same model, same prompt, 4 different agents
Setup: one self-hosted Qwen3.6-27B (Q4) on llama.cpp, identical prompt, identical hardware. The only variable is the agent scaffolding. Agents tested: pi, opencode, hermes, qwen code.
Task: a single-file 2D canvas solar system with scripted orbits and gravity that acts only on…
14

Vercel — AI · dev-tools · 7d ago
Chat SDK adds Novu support
Chat SDK now supports Novu with the new vendor-official adapter. One handler set puts your agent on Slack, Microsoft Teams, WhatsApp, Telegram, and email. Novu handles credentials, identity, and delivery, keeping OAuth and tokens outside your app and mapping each channel to one…
32

r/LocalLLaMA · community · 7d ago
Ling and Ring 2.6 Technical Report: Efficient and Instant Agentic Intelligence at Trillion-Parameter Scale
arXiv: https://arxiv.org/abs/2606.15079
Full Paper: https://arxiv.org/pdf/2606.15079
HuggingFace: https://huggingface.co/inclusionAI/models?sort=created (This month they released base models for both Ling-2.6-1T & Ling-2.6-flash)
Wish they released…
11

r/LocalLLaMA · community · 8d ago
I want to love hermes agent, but it looks so ugly, and ux is not nice
I am rechecking on hermes agent currently, also because many report great experiences, but oh my, does it look ugly. The web UI uses such ugly fonts and background graphics, and for some reason the UX feels slow and tedious (even in the TUI). Pi mono agent feels quick and fast…
20

Hugging Face Daily Papers · research · 8d ago
GateMem: Benchmarking Memory Governance in Multi-Principal Shared-Memory Agents
Abstract: Current memory agents lack reliable shared institutional deployment due to challenges in balancing utility, access control, and forgetting across multiple principals with diverse authorization contexts. Generated by Qwen/Qwen2.5-Coder-32B-Instruct. Memory benchmarks for…
5

Hugging Face Daily Papers · research · 8d ago
WorldLines: Benchmarking and Modeling Long-Horizon Stateful Embodied Agents
Abstract: WorldLines benchmark evaluates long-term memory in embodied agents through household scenarios, while ObsMem framework addresses challenges in partial observability and memory translation for decision-making.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct. To assist humans…
19

Vercel — AI · dev-tools · 8d ago
Sakana Fugu Ultra now available on AI Gateway
Sakana Fugu Ultra from Sakana AI is now available on AI Gateway. Fugu Ultra is built on a pool of publicly accessible frontier models, rather than running as a single model. It coordinates several models, routing work to 1-3 agents depending on the problem and combining their…
31

Simon Willison · community · 8d ago
Temporary Cloudflare Accounts for AI agents
The announcement says this is "for AI agents" but (as is pretty common these days) the AI hook isn't really necessary, this is an interesting feature for everyone else as well. Short version: you can now create a Cloudflare Workers…
16

r/LocalLLaMA · community · 8d ago
I pretrained and post trained a 500M parameter LLM and 330M parameter Image generator from scratch
Hey folks, hope you are doing well. I started HobbyLM as a side project last month. Initially I wrote an agent harness using Claude SDK which takes notes on various LLM architectures and does ablation studies to find an optimised or well-fit architecture for this model training, then I…
16

r/LocalLLaMA · community · 8d ago
Sandboxing code execution for AI agents
For those giving their agents the ability to execute code, how are you sandboxing it? The spectrum seems to be: Docker containers: familiar, decent isolation, but heavyweight for per-request sandboxing; microVMs: great isolation, fast boot, but operational complexity; WASM:…
5

r/LocalLLaMA · community · 8d ago
8-16 MI50s Minimax M3 @19 tps TG (peak)
TL;DR: Speeds are not too ugly for this old 2018 hardware but imo, not very usable for agentic coding (if you compare with qwen3.6 27B on 8 MI50 @ 50 tps TG 800 tps PP).
More concerning is that the reasoning output is very very long and still didn’t check about the quality of…
27

r/LocalLLaMA · community · 8d ago
I mapped every agent config file (AGENTS.md, CLAUDE.md, llms.txt, .cursorrules, SKILL.md...) and tagged how widely each is actually used
Every tool ships its own magic file now and after a while the names all blur together. I put together a guide to the ones agents actually read and write, with a tag on each for real adoption instead of hype. https://github.com/ItamarZand88/awesome-agent-conventions 21…
22

r/LocalLLaMA · community · 9d ago
Board where every tile is an agent
I've been hacking on a project which I find extremely useful and wanted to share. Imagine a board where every tile is an agent whose job is to maintain the tile. I tried to illustrate the idea with a video here. The project is open source on GitHub and you can also try it out here…
36

Hacker News — AI on Front Page · community · 9d ago
Temporary Cloudflare accounts for AI agents
Article URL: https://blog.cloudflare.com/temporary-accounts/
Comments URL: https://news.ycombinator.com/item?id=48608394
Points: 203 · # Comments: 106
15

r/LocalLLaMA · community · 10d ago
Local AI for local office files
Which AI agent do you think is the best for working with local files (Excel, PDF, Word, txt, json, etc.)? What have you used for this? What workflows have you implemented? Submitted by /u/Holiday-Display509.
29

r/LocalLLaMA · community · 10d ago
Giving a local agent web access without paid search/scrape APIs: SearXNG + Scrapling
I wanted web access for a local-first agent without reaching for Tavily, Serper, Firecrawl, etc. For this agent path, I wanted no paid API keys, a search service I control, and page extraction I can run myself.
What I ended up with is two tools: web_search and web_extract.…
6

r/LocalLLaMA · community · 10d ago
Local agent on 4090 - looking for LM Studio settings
I have moved on from Ollama to just dink around and instead want to start running a local agent from time to time. With the 24GB of a 4090 (Gigabyte OC edition) that should be quite possible. But no matter what settings I use for context and batching, token generation is slow as…
36

Simon Willison · community · 10d ago
Quoting Sean Lynch
The real valuable capability MCP offers over skills/CLI is isolating the auth flow outside of the agent’s context window, and potentially out of the harness completely. [...] Maybe the idealized form of MCP is just an auth gateway for the API and nothing else. That’d still be a…
8

r/LocalLLaMA · community · 10d ago
Best Local Agents - Jun 2026
A megathread that is overdue! Let's discuss and debate on what the best local agents available today are. Prologue: First a note on terminology: While most regular users are going to have a general sense of what these are, I think it's worth a brief pause to preempt turbulence in…
6

Hugging Face Daily Papers · research · 10d ago
LedgerAgent: Structured State for Policy-Adherent Tool-Calling Agents
Abstract: LEDGERAGENT is a method for customer service agents that maintains task states in a separate ledger to improve policy adherence and state management during tool calling. Generated by Qwen/Qwen2.5-Coder-32B-Instruct. Policy-adherent tool-calling agents in customer-service…
36

GitHub Blog — AI & ML · official-blog · 10d ago
How we built an internal data analytics agent
Qubot, our internal Copilot-powered analytics agent, allows any GitHub employee to ask questions about our data in plain language. Here's what we learned as we built it.
18

Hugging Face Daily Papers · research · 10d ago
Context-Aware RL for Agentic and Multimodal LLMs
Abstract: ContextRL enhances long-horizon reasoning and multimodal performance through reinforcement learning that rewards context selection for supporting query-answer pairs, achieving improvements over standard methods on diverse benchmarks. Generated by…
21

r/LocalLLaMA · community · 10d ago
Improving local models with an API based "consultant"?
I'm sure that someone else has come up with this before, but I just wanted to ask: Has it occurred to anyone to improve their local AI workflow by adding a more powerful API based "consultant" agent (GLM 5.2 now springs to mind) to call upon for refining plans, learnings and…
35

Hugging Face Daily Papers · research · 10d ago
Configurable Clinical Information Extraction with Agentic RAG: What Works, What Breaks, and Why
Abstract: ACIE, an agentic RAG system deployed in a clinical setting, demonstrates high accuracy in extracting medical information from complex patient contexts, achieving 96.5% acceptance rate by nuclear-medicine physicians across 7,326 judgments. Generated by…
5

r/LocalLLaMA · community · 10d ago
Watching a local AI voice assistant get dumber (A 9B to 0.8B agent experiment on my RTX 5060 Ti)
I wanted to find the exact floor for running an intelligent, local voice assistant agent on consumer hardware. I kept the environment, tools, and prompts identical; I stepped the model sizes down through Qwen 3.5 9B, 4B, 2B, and 0.8B to see how agentic reasoning degrades. The…
12

r/LocalLLaMA · community · 10d ago
New Agentic Benchmark Out: Claude Fable and GLM 5.2 Top Their Cohorts
You can read about it here: https://artificialanalysis.ai/articles/aa-briefcase
This is a solid benchmark from Artificial Analysis. It basically tests an LLM's ability to plan and execute tasks.
And more importantly, it is a new benchmark that is not saturated, so no one can…
32

r/LocalLLaMA · community · 10d ago
Researchers trained a Deep Research agent with 32 H100s and open-sourced everything
Ohio State University's NLP team released QUEST-35B, an open-source Deep Research agent trained using ~32 H100s and ~8K synthetic samples. The team open-sourced the training recipe, code, weights and datasets. Benchmark results show competitive performance against several…
13

Hugging Face Daily Papers · research · 10d ago
ENPIRE: Agentic Robot Policy Self-Improvement in the Real World
Abstract: ENPIRE framework enables autonomous robotics research through a closed-loop system that automates policy improvement via environment feedback, policy refinement, and evolutionary code optimization. Generated by Qwen/Qwen2.5-Coder-32B-Instruct. Achieving dexterous robotic…
27

Hugging Face Daily Papers · research · 11d ago
Playful Agentic Robot Learning
Abstract: Embodied robots learn reusable skills through self-directed play and exploration, then apply these skills to improve performance on downstream tasks without additional training. Generated by Qwen/Qwen2.5-Coder-32B-Instruct. Current agentic robot systems can write…
4

arXiv — Machine Learning · research · 11d ago
MortarBench: Evaluating Mortgage Loan Origination Agents
arXiv:2606.19416v1 · Announce Type: new
Abstract: Loan origination is the process by which a lender creates a new loan, from application and underwriting through approval and funding. This process serves a critical role in evaluating the eligibility and level of risk posed by an…
15

arXiv — Machine Learning · research · 11d ago
IHBench: Evaluating Post-Interruption Recovery in Voice Agents with Structured Workflows
arXiv:2606.19595v1 · Announce Type: new
Abstract: Voice agents deployed in structured workflows (customer service, healthcare scheduling, account management) must handle frequent user interruptions while maintaining progress through multi-step procedures.
Existing benchmarks for…
35

arXiv — Machine Learning · research · 11d ago
OnDeFog: Online Decision Transformer under Frame Dropping
arXiv:2606.19721v1 · Announce Type: new
Abstract: In challenging real-world reinforcement learning applications, communication delays or sensor failures often cause frame dropping, in which the agent cannot receive the dropped states and associated rewards. To address the…
20

arXiv — NLP / Computation & Language · research · 11d ago
Connect the Dots: Training LLMs for Long-Lifecycle Agents with Cross-Domain Generalization Via Reinforcement Learning
arXiv:2606.20002v1 · Announce Type: cross
Abstract: This work presents a general framework for training large language models (LLMs) to "Connect the Dots" (CoD), a meta-capability required by long-lifecycle agents: as an LLM-based AI agent gets deployed in an environment, it…
13

arXiv — NLP / Computation & Language · research · 11d ago
SAGE-OPD: Selective Agent-Guided Intervention for Multi-Turn On-Policy Distillation
arXiv:2606.19659v1 · Announce Type: new
Abstract: On-policy distillation (OPD) improves student models by training them on trajectories induced by their own policy, making it a promising approach for mitigating exposure bias in agent training. However, most OPD studies focus on…
17

arXiv — NLP / Computation & Language · research · 11d ago
AtomMem: Building Simple and Effective Memory System for LLM Agents via Atomic Facts
arXiv:2606.19847v1 · Announce Type: new
Abstract: Large language models (LLMs) demonstrate strong reasoning and generation abilities, but their fixed context windows limit long-term information accumulation and reuse across multi-session interactions.
Existing memory-augmented…
32

arXiv — NLP / Computation & Language · research · 11d ago
Prompt, Plan, Extract: Zero-Shot Agentic LLMs Workflows for Lung Pathology Extraction from Clinical Narratives
arXiv:2606.19852v1 · Announce Type: new
Abstract: Information extraction from pathology reports is essential for cancer staging and tumor registry population. Yet key data remains embedded in narrative reports, making manual extraction labor-intensive and error-prone. Traditional…
26

arXiv — NLP / Computation & Language · research · 11d ago
When Does Streaming Tool Use Help? Characterizing Tool-Intent Stabilization in Streaming Retrieval-Augmented Generation
arXiv:2606.20113v1 · Announce Type: new
Abstract: Streaming Retrieval-Augmented Generation (Streaming RAG) reduces user-perceived latency by issuing tool queries in parallel with ongoing user input, before the utterance is complete. Reported gains are aggregate, yet the…
21

arXiv — NLP / Computation & Language · research · 11d ago
Beyond Global Replanning: Hierarchical Recovery for Cross-Device Agent Systems
arXiv:2606.20487v1 · Announce Type: new
Abstract: Real-world computer-use tasks often span multiple applications and devices, requiring agents to coordinate heterogeneous environments under dynamic runtime failures. Existing multi-device agent systems support task decomposition…
16

arXiv — NLP / Computation & Language · research · 11d ago
Beyond the GUI Paradigm: Do Mobile Agents Need the Phone Screen?
arXiv:2606.19388v1 · Announce Type: cross
Abstract: Recent advances in mobile agents are dominated by the GUI paradigm, in which agents perceive UI information and emit screen interactions. However, mobile platforms also expose a command-line interface (CLI) that provides direct…
31

arXiv — NLP / Computation & Language · research · 11d ago
DeXposure-Claw: An Agentic System for DeFi Risk Supervision
arXiv:2606.19501v1 · Announce Type: cross
Abstract: Decentralized finance exposes supervisors to fast-moving, networked credit risks.
General-purpose LLM agents fit this setting poorly: they over-read weak evidence and recommend high-stakes interventions, while existing…
14

arXiv — NLP / Computation & Language · research · 11d ago
Uncertainty Decomposition for Clarification Seeking in LLM Agents
arXiv:2606.19559v1 · Announce Type: cross
Abstract: Recent position papers argue that the classical aleatoric/epistemic uncertainty framework is insufficient for interactive large language model (LLM) agents and call for underspecification-aware, decomposed, and communicable…
9