Hugging Face Daily Papers · 5 min read

Evaluating Temporal Semantic Caching and Workflow Optimization in Agentic Plan-Execute Pipelines

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2605.20630

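Published on May 20 · Submitted by Dhaval Patel on May 21
Authors: Alimurtaza Mustafa Merchant, Krish Veera, Sajal Kumar Goyla, Shambhawi Bhure, Dhaval Patel, Kaoutar El Maghraoui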

Abstract

AI-generated summary: Industrial asset operations workflows face latency challenges due to complex coordination needs, addressed through novel caching and workflow optimization techniques that improve execution speed while maintaining correctness in parameter-rich environments.

Industrial asset operations workflows are latency-sensitive because a single user query may require coordination over sensor data, work orders, failure modes, forecasting tools, and domain-specific agents. We evaluate this problem on AssetOpsBench (AOB), an industrial agent benchmark whose plan-execute pipeline exposes repeated overhead from tool discovery, LLM planning, MCP tool execution, and final summarization. Existing LLM caching techniques such as KV-cache reuse and embedding-based semantic caching were designed for chatbot serving and break down when output validity depends on time, asset, or sensor parameters. We propose two complementary optimization layers for AOB plan-execute pipelines: a temporal semantic cache and a set of MCP workflow optimizations combining disk-backed tool-discovery caching and dependency-aware parallel step execution. MCP workflow optimizations corresponded to a 1.67x speedup and reduced median end-to-end latency by about 40.0% while the temporal-cache benchmark achieved a median of 30.6x speedup on cache hits. Beyond the speedup, our results expose a concrete failure mode of pure semantic caching for parameter-rich industrial queries, providing a critical analysis of how caching choices interact with evaluation correctness in MCP-backed agent benchmarks.
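The failure mode named in the abstract can be illustrated with a small sketch. The abstract does not describe how the temporal semantic cache is built, so everything below is an assumption made for illustration: the class `TemporalSemanticCache`, the toy bag-of-words `embed` function (a stand-in for a real sentence encoder), the `threshold`, `ttl_seconds`, and `check_params` settings, and the chiller queries are all hypothetical. The sketch contrasts a cache that matches on similarity alone with one that also requires matching structured parameters and a fresh entry.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real sentence encoder."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


@dataclass
class Entry:
    vector: Counter
    params: tuple      # hypothetical structured key: (asset, sensor, window)
    answer: str
    stored_at: float   # timestamp in seconds


@dataclass
class TemporalSemanticCache:
    threshold: float = 0.8      # assumed similarity cut-off
    ttl_seconds: float = 300.0  # assumed freshness window
    check_params: bool = True   # False behaves like a pure semantic cache
    entries: list = field(default_factory=list)

    def put(self, query: str, params: tuple, answer: str, now: float) -> None:
        self.entries.append(Entry(embed(query), params, answer, now))

    def get(self, query: str, params: tuple, now: float):
        q = embed(query)
        for e in self.entries:
            if now - e.stored_at > self.ttl_seconds:
                continue  # entry is too old to trust
            if self.check_params and e.params != params:
                continue  # similar wording, different asset/sensor/window
            if cosine(q, e.vector) >= self.threshold:
                return e.answer
        return None


# Two hypothetical queries that differ only in the asset number.
Q6 = "what is the average temperature of chiller 6 last week"
Q9 = "what is the average temperature of chiller 9 last week"
P6 = ("chiller-6", "temperature", "last-week")
P9 = ("chiller-9", "temperature", "last-week")


def demo():
    pure = TemporalSemanticCache(check_params=False)
    pure.put(Q6, P6, "answer-for-chiller-6", now=0.0)
    guarded = TemporalSemanticCache()
    guarded.put(Q6, P6, "answer-for-chiller-6", now=0.0)
    return (
        pure.get(Q9, P9, now=10.0),       # wrong-asset hit
        guarded.get(Q9, P9, now=10.0),    # miss: parameters differ
        guarded.get(Q6, P6, now=10.0),    # valid hit
        guarded.get(Q6, P6, now=1000.0),  # miss: entry expired
    )


if __name__ == "__main__":
    print(demo())
```

In this toy setup the two queries share nine of ten tokens, so a similarity-only lookup returns the chiller 6 answer for the chiller 9 query, while the parameter and freshness checks turn that lookup into a miss. Exact parameter matching and a fixed time-to-live are only one possible design; the paper's actual validity rules may differ.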

