Hugging Face Daily Papers · 8 min read

AsyncTool: Evaluating the Asynchronous Function Calling Capability under Multi-Task Scenarios

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2605.27995

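Published on May 27, 2026 · Submitted by Kou Shi on May 29, 2026
Authors: Kou Shi, Ziao Zhang, Shiting Huang, Avery Nie, Zhen Fang, Qiuchen Wang, Lin Chen, Huaian Chen, Zehui Chen, Feng Zhao
Code: https://github.com/StoKou/repo-asynctool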

Abstract

AI-generated summary: LLM-based agents face significant challenges in asynchronous tool calling due to delayed responses, requiring improved task coordination and temporal reasoning capabilities.

Large language model (LLM)-based agents have shown strong capabilities in using external tools to solve complex tasks. However, existing evaluations often overlook the temporal dimension of tool use, especially the impact of tool response latency, and are usually limited to single-task settings. In real-world applications, multiple tasks often need to be executed concurrently, and overall efficiency depends on whether an agent can use idle time while waiting for tool responses. We refer to this capability as asynchronous tool calling. To evaluate it, we propose AsyncTool, a benchmark for assessing LLM-based agents in interactive multi-task tool-use environments with delayed tool feedback. AsyncTool presents multiple heterogeneous tasks simultaneously and simulates realistic tool response latency during execution. Using a hybrid data evolution strategy, we construct a diverse asynchronous multitasking dataset that covers multiple scenarios and tool-use patterns. We evaluate models at the step, sub-task, and task levels, and introduce efficiency-oriented metrics to measure task coordination and completion efficiency. Extensive experiments show that delayed tool feedback poses substantial challenges to current agents and leads to clear performance degradation. Models that better coordinate task switching, dependency tracking, and state maintenance achieve stronger performance on AsyncTool. Our analysis identifies key failure modes of current tool-using agents and provides practical insights for designing future systems with stronger temporal reasoning and coordination capabilities.
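
To make the idea of using idle time concrete, here is a minimal sketch in Python's asyncio. It is not the AsyncTool benchmark or the authors' code: the tool names, latencies, and arguments are hypothetical, and the sketch only covers independent calls, not the dependency tracking or state maintenance the abstract mentions. It contrasts waiting for each tool response in turn with issuing all calls first and collecting the responses afterwards.

```python
import asyncio
import time

# Hypothetical tool latencies in seconds; these names and values are
# illustrative assumptions, not taken from the AsyncTool benchmark.
TOOL_LATENCY = {"search_flights": 0.10, "check_weather": 0.05, "query_inventory": 0.08}


async def call_tool(name: str, argument: str) -> str:
    """Simulate a tool whose response arrives only after a delay."""
    await asyncio.sleep(TOOL_LATENCY[name])
    return f"{name}({argument}) -> ok"


async def run_sequential(calls):
    """Blocking style: wait for each tool response before issuing the next call."""
    return [await call_tool(name, arg) for name, arg in calls]


async def run_concurrent(calls):
    """Asynchronous style: issue all independent calls, then collect the responses."""
    return await asyncio.gather(*(call_tool(name, arg) for name, arg in calls))


def timed(coro_fn, calls):
    """Run one strategy to completion and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(coro_fn(calls))
    return results, time.perf_counter() - start


# One hypothetical call per task, standing in for several heterogeneous tasks.
CALLS = [("search_flights", "PEK-SFO"), ("check_weather", "Hefei"), ("query_inventory", "sku-42")]

if __name__ == "__main__":
    seq_results, seq_time = timed(run_sequential, CALLS)
    con_results, con_time = timed(run_concurrent, CALLS)
    print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

Under these assumed latencies, the sequential run takes roughly the sum of the delays while the concurrent run takes roughly the longest single delay. That gap is the kind of waiting time an agent with asynchronous tool calling could put to use.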

Community

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

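The following papers were recommended by the Semantic Scholar API:

* [TIDE-Bench: Task-Aware and Diagnostic Evaluation of Tool-Integrated Reasoning](https://huggingface.co/papers/2605.09544) (2026)
* [UniToolCall: Unifying Tool-Use Representation, Data, and Evaluation for LLM Agents](https://huggingface.co/papers/2604.11557) (2026)
* [GTA-2: Benchmarking General Tool Agents from Atomic Tool-Use to Open-Ended Workflows](https://huggingface.co/papers/2604.15715) (2026)
* [Do Agents Know What They Can't Do? Evaluating Feasibility Awareness in Tool-Using Agents](https://huggingface.co/papers/2605.28532) (2026)
* [Concurrency without Model Changes: Future-based Asynchronous Function Calling for LLMs](https://huggingface.co/papers/2605.15077) (2026)
* [AgentEscapeBench: Evaluating Out-of-Domain Tool-Grounded Reasoning in LLM Agents](https://huggingface.co/papers/2605.07926) (2026)
* [UI-Copilot: Advancing Long-Horizon GUI Automation via Tool-Integrated Policy Optimization](https://huggingface.co/papers/2604.13822) (2026)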

Get this paper in your agent:

hf papers read 2605.27995
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
