Hugging Face Daily Papers · 3 min read

Struct-Searcher: Agentic Structural Thinking Advances Multimodal Deep Information Seeking

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2606.07689

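Published on Jun 5, 2026 · Submitted by Fan Zhang on Jun 10, 2026
Authors: Fan Zhang, Vireo Zhang, Shengju Qian, Haoxuan Li, Zheng Lian, Hao Wu, Yuan Gao, Xinyu Geng, Xin Wang, Pheng-Ann Heng
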
Abstract

Struct-Searcher introduces a belief revision theory-based structural agentic workflow for multimodal information seeking that improves accuracy over existing vision-language models and deep research agents.

Deep research agents have attracted increasing attention for their ability to collect large-scale online information to acquire target knowledge, with recent efforts shifting from purely text-based information seeking to multimodal settings. However, existing agentic workflows are largely aligned with evidence accumulation models, which linearly aggregate evidence and lack principled mechanisms for handling contradictory information across heterogeneous modalities. To this end, we propose Struct-Searcher, a structural agentic workflow grounded in belief revision theory that explicitly maintains an evolving multimodal structural graph throughout the reasoning process, enabling effective conflict-aware multimodal deep information seeking. Extensive experiments across multiple benchmark datasets and backbone models demonstrate that Struct-Searcher is (1) plug-and-play and model-agnostic, yielding an average relative accuracy improvement of 17.2% on BrowseComp-VL across five different backbones; and (2) top-performing, consistently outperforming state-of-the-art vision-language models (VLMs) and deep research agents, with relative accuracy improvements of 3.7% on MM-BrowseComp, 1.5% on HLE-VL, and 0.7% on BrowseComp-VL over the second-best competing approach.
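
The abstract contrasts linear evidence accumulation with belief revision over a structural graph but does not spell out the mechanism. The toy sketch below is only an illustration of that contrast and is not the paper's implementation: the `Claim` and `BeliefGraph` classes, the `confidence` score, the modality labels, and the "keep the higher-confidence claim" rule are all assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    key: str           # what the claim is about (hypothetical naming scheme)
    value: str         # the asserted value
    modality: str      # e.g. "text" or "image" (hypothetical labels)
    confidence: float  # hypothetical reliability score in [0, 1]


def accumulate(claims):
    """Linear accumulation: every claim is appended, contradictions included."""
    return [(c.key, c.value) for c in claims]


@dataclass
class BeliefGraph:
    """Toy belief store: one held claim per key, plus recorded conflict edges."""
    beliefs: dict = field(default_factory=dict)    # key -> currently held Claim
    conflicts: list = field(default_factory=list)  # (kept, retracted) pairs

    def revise(self, new: Claim) -> None:
        old = self.beliefs.get(new.key)
        if old is None or old.value == new.value:
            # No contradiction: adopt the claim, or keep the stronger of two
            # agreeing claims.
            if old is None or new.confidence > old.confidence:
                self.beliefs[new.key] = new
            return
        # Contradiction: hold the higher-confidence claim and record the
        # retracted one instead of silently keeping both.
        kept, retracted = (new, old) if new.confidence > old.confidence else (old, new)
        self.beliefs[new.key] = kept
        self.conflicts.append((kept, retracted))


# Made-up evidence in which a text source and an image source disagree.
evidence = [
    Claim("landmark.opened", "1990", "text", 0.6),
    Claim("landmark.opened", "1995", "image", 0.9),
    Claim("landmark.city", "Lyon", "text", 0.8),
]

graph = BeliefGraph()
for claim in evidence:
    graph.revise(claim)
```

In this sketch `accumulate(evidence)` carries both conflicting values for `landmark.opened` forward, while `graph.beliefs` holds a single value per key and `graph.conflicts` records which claim was retracted and why it lost. A real system would need a far richer conflict-resolution policy than a single scalar score.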

Community

Paper submitter about 14 hours ago

Struct-Searcher is a training-free agentic workflow that advances multimodal deep research with structure-aware thinking mechanisms.
