Hugging Face Daily Papers · June 2, 2026 · 4 min read

MineExplorer: Evaluating Open-World Exploration of MLLM Agents in Minecraft

#model-release #multimodal #agents #benchmark

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

MineExplorer is a benchmark for evaluating the open-world exploration capabilities of MLLM agents in Minecraft. It uses a multi-agent synthesis workflow that jointly designs task graphs, sandbox scenes, and rule-based milestone evaluators. Experiments show that open-world exploration remains challenging: strong models handle many single-hop tasks but degrade sharply when hidden prerequisites must be coordinated over longer trajectories.

arxiv:2605.30931

Published on May 29 · Submitted by Tianjie Ju on Jun 2

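Authors: Tianjie Ju, Yueqing Sun, Zheng Wu, Wei Zhang, Yaqi Huo, Xi Su, Qi Gu, Xunliang Cai, Gongshen Liu, Zhuosheng Zhang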

Abstract

MineExplorer benchmark evaluates multimodal large language models' open-world exploration capabilities in Minecraft through atomic and multi-hop tasks designed via multi-agent synthesis.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Multimodal large language models (MLLMs) have shown strong capabilities in perception, reasoning, and action generation. However, their ability to sustain exploration in dynamic open worlds remains unclear. Existing embodied and game-based benchmarks often compress interaction into short-horizon tasks or entangle success with domain-specific game mechanics. In this paper, we introduce the MineExplorer benchmark for evaluating the open-world exploration capabilities of MLLM agents in Minecraft. We first filter out atomic tasks whose solutions rely heavily on Minecraft-specific knowledge, so that the benchmark better reflects general open-world reasoning. We then organize the benchmark around a ReAct-style capability formulation and compose atomic tasks into implicit multi-hop tasks. To construct reliable instances, MineExplorer uses a multi-agent synthesis workflow that jointly designs task graphs, sandbox scenes, and rule-based milestone evaluators. Human evaluation shows that the multi-agent synthesis workflow produces significantly more reliable instances than a single-agent baseline. Experiments with advanced MLLM agents show that open-world exploration remains challenging: strong models can handle many single-hop tasks but degrade sharply when hidden prerequisites must be coordinated over longer trajectories. Further analysis finds that task difficulty tracks agent completion, and that larger models or thinking modes do not consistently translate into better performance. Code and dataset are available at https://github.com/Jometeorie/MineExplorer.
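
To make the terms "task graph", "multi-hop task", and "rule-based milestone evaluator" concrete, here is a minimal Python sketch. It is not taken from the MineExplorer repository: the `Milestone` class, the `hops`, `evaluate`, and `completion_rate` functions, the inventory-style state, and the key/chest/river example are all hypothetical, chosen only to illustrate how prerequisites can gate milestones along a trajectory.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical world-state snapshot: a flag or item name mapped to a count.
State = Dict[str, int]


@dataclass
class Milestone:
    """One node of an illustrative task graph, checked by a plain rule."""
    name: str
    rule: Callable[[State], bool]  # rule-based check, no model in the loop
    prerequisites: List[str] = field(default_factory=list)


def hops(graph: Dict[str, Milestone], name: str) -> int:
    """Length of the longest prerequisite chain that ends at `name`."""
    node = graph[name]
    if not node.prerequisites:
        return 1
    return 1 + max(hops(graph, p) for p in node.prerequisites)


def evaluate(graph: Dict[str, Milestone], trajectory: List[State]) -> Dict[str, bool]:
    """Mark a milestone reached once its rule holds and its prerequisites are reached."""
    reached = {name: False for name in graph}
    for state in trajectory:
        changed = True
        while changed:  # sweep until no new milestone fires in this state
            changed = False
            for node in graph.values():
                ready = all(reached[p] for p in node.prerequisites)
                if not reached[node.name] and ready and node.rule(state):
                    reached[node.name] = True
                    changed = True
    return reached


def completion_rate(reached: Dict[str, bool]) -> float:
    """Fraction of milestones reached."""
    return sum(reached.values()) / len(reached)


# Illustrative 3-hop task: the prompt would only mention crossing the river,
# so the key and the chest act as hidden prerequisites.
GRAPH = {
    "get_key": Milestone("get_key", lambda s: s.get("key", 0) >= 1),
    "open_chest": Milestone("open_chest", lambda s: s.get("chest_opened", 0) >= 1, ["get_key"]),
    "cross_river": Milestone("cross_river", lambda s: s.get("across", 0) >= 1, ["open_chest"]),
}

# Illustrative trajectory in which the agent stops after opening the chest.
TRAJECTORY = [{"key": 0}, {"key": 1}, {"key": 1, "chest_opened": 1}]

RESULT = evaluate(GRAPH, TRAJECTORY)
print(hops(GRAPH, "cross_river"), RESULT, completion_rate(RESULT))
```

In this toy setup a single-hop task corresponds to a milestone with no prerequisites, while the final goal sits at the end of a three-step chain. Scoring by milestones rather than by final success alone gives partial credit to an agent that stalls partway through the chain.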


Get this paper in your agent:

hf papers read 2605.30931

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash
