Hugging Face Daily Papers · May 21, 2026 · 5 min read

Evaluating Temporal Semantic Caching and Workflow Optimization in Agentic Plan-Execute Pipelines

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2605.20630

Published on May 20, 2026 · Submitted by Dhaval Patel on May 21, 2026

Organization: IBM

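Authors: Alimurtaza Mustafa Merchant, Krish Veera, Sajal Kumar Goyla, Shambhawi Bhure, Dhaval Patel, Kaoutar El Maghraoui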

AI-generated summary

Industrial asset operations workflows face latency challenges due to complex coordination needs, addressed through novel caching and workflow optimization techniques that improve execution speed while maintaining correctness in parameter-rich environments.

Abstract

Industrial asset operations workflows are latency-sensitive because a single user query may require coordination over sensor data, work orders, failure modes, forecasting tools, and domain-specific agents. We evaluate this problem on AssetOpsBench (AOB), an industrial agent benchmark whose plan-execute pipeline exposes repeated overhead from tool discovery, LLM planning, MCP tool execution, and final summarization. Existing LLM caching techniques such as KV-cache reuse and embedding-based semantic caching were designed for chatbot serving and break down when output validity depends on time, asset, or sensor parameters. We propose two complementary optimization layers for AOB plan-execute pipelines: a temporal semantic cache and a set of MCP workflow optimizations combining disk-backed tool-discovery caching and dependency-aware parallel step execution. MCP workflow optimizations corresponded to a 1.67x speedup and reduced median end-to-end latency by about 40.0% while the temporal-cache benchmark achieved a median of 30.6x speedup on cache hits. Beyond the speedup, our results expose a concrete failure mode of pure semantic caching for parameter-rich industrial queries, providing a critical analysis of how caching choices interact with evaluation correctness in MCP-backed agent benchmarks.
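The failure mode the abstract describes, where a purely semantic cache returns an answer whose validity depends on time, asset, or sensor parameters, can be illustrated with a small sketch. This is not the authors' implementation: the class `TemporalSemanticCache`, the `(asset, sensor, time_window)` parameter tuple, the 0.8 similarity threshold, the 300-second TTL, and the bag-of-words `embed` function are all hypothetical choices made so the example runs with the standard library alone.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


def embed(text: str) -> Counter:
    """Toy bag-of-words stand-in for an embedding model (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


@dataclass
class Entry:
    vector: Counter
    params: tuple      # hypothetical (asset, sensor, time_window) key
    answer: str
    stored_at: float   # seconds; compared against the TTL on lookup


class TemporalSemanticCache:
    """Semantic lookup gated by exact parameter match and a TTL (sketch)."""

    def __init__(self, threshold: float = 0.8, ttl_seconds: float = 300.0):
        self.threshold = threshold
        self.ttl_seconds = ttl_seconds
        self.entries: list[Entry] = []

    def put(self, query: str, params: tuple, answer: str, now: float) -> None:
        self.entries.append(Entry(embed(query), tuple(params), answer, now))

    def get(self, query: str, params: tuple, now: float, check_params: bool = True):
        """Return the best matching answer, or None on a miss.

        check_params=False mimics a pure semantic cache for comparison.
        """
        vec = embed(query)
        best, best_score = None, self.threshold
        for entry in self.entries:
            if now - entry.stored_at > self.ttl_seconds:
                continue  # entry is too old to trust
            if check_params and entry.params != tuple(params):
                continue  # similar wording, but different asset/sensor/window
            score = cosine(vec, entry.vector)
            if score >= best_score:
                best, best_score = entry, score
        return best.answer if best else None


if __name__ == "__main__":
    cache = TemporalSemanticCache()
    cache.put(
        "What is the average temperature of Chiller 6 last week?",
        ("chiller_6", "temperature", "2026-W20"),
        "answer for chiller 6",
        now=0.0,
    )
    query = "What is the average temperature of Chiller 9 last week?"
    params = ("chiller_9", "temperature", "2026-W20")
    print(cache.get(query, params, now=10.0, check_params=False))
    print(cache.get(query, params, now=10.0))
```

In this sketch the two queries differ by a single token, so the similarity-only lookup returns the Chiller 6 answer for a Chiller 9 question, while the parameter-aware lookup reports a miss and would fall through to the full plan-execute pipeline. Requiring an exact match on structured parameters trades some hit rate for correctness; how the paper balances that trade-off is not stated in the abstract.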

Get this paper in your agent:

hf papers read 2605.20630

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash
