Spreadsheet-RL: Advancing Large Language Model Agents on Realistic Spreadsheet Tasks via Reinforcement Learning
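Authors: Banghao Chi, Yining Xie, Mingyuan Wu, Jingcheng Yang, Jize Jiang, Zhaoheng Li, Shengyi Qian, Minjia Zhang, Klara Nahrstedt, Rui Hou, Xiangjun Fan, Hanchao Yu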
AI-generated summary
Spreadsheet-RL is a reinforcement learning framework that trains specialized spreadsheet agents in realistic Excel environments, improving AI agent performance on both general and domain-specific spreadsheet tasks through automated data collection and domain-specific benchmarks.
Abstract
Spreadsheet systems (e.g., Microsoft Excel, Google Sheets) play a central role in modern data-centric workflows. As AI agents grow increasingly capable of automating complex tasks, such as controlling computers and generating presentations, building an AI-driven spreadsheet agent has emerged as a promising research direction. Most existing spreadsheet agents rely on specialized prompting over general-purpose LLMs; while this design shows promise on simple spreadsheet operations, it struggles to manage the complex, multi-step workflows typical of real-world applications.
We introduce Spreadsheet-RL, a reinforcement learning (RL) fine-tuning framework designed to train specialized spreadsheet agents within a realistic Microsoft Excel environment. Spreadsheet-RL features an automated pipeline for scalable collection of paired start-goal spreadsheets from online forums, as well as domain-specific evaluation tasks in areas such as finance and supply chain management, which we compile into the new Domain-Spreadsheet benchmark dataset. It also includes a Spreadsheet Gym environment designed for multi-turn RL: Spreadsheet Gym exposes extensive Excel functionality through a Python sandbox, along with a refined harness that incorporates a comprehensive tool set and carefully designed tool-routing rules for spreadsheet tasks. Through comprehensive experiments, we show that Spreadsheet-RL substantially enhances AI agents' performance on both general and domain-specific spreadsheet tasks: it improves Qwen3-4B-Thinking-2507's Pass@1 on SpreadsheetBench from 12.0% to 23.4%, and raises Pass@1 from 8.4% to 17.2% on our curated Domain-Spreadsheet dataset. These results highlight Spreadsheet-RL's strong potential for generalization and real-world adoption in spreadsheet automation and, more broadly, its promise for advancing LLM-based interactions with data interfaces in everyday work.
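To put the reported Pass@1 numbers in perspective, the following is a minimal sketch, not the authors' evaluation code. It assumes Pass@1 is the fraction of tasks solved by a single sampled attempt per task (the benchmarks' actual scoring rules may be stricter, for example requiring several test cases per task to pass). The helper names `pass_at_1` and `gain` are hypothetical and introduced only for illustration; the four percentages are the ones quoted in the abstract.

```python
from typing import Sequence


def pass_at_1(outcomes: Sequence[bool]) -> float:
    """Fraction of tasks solved, given one pass/fail outcome per task.

    Assumption: one sampled attempt per task, scored as pass or fail.
    """
    if not outcomes:
        raise ValueError("need at least one task outcome")
    return sum(outcomes) / len(outcomes)


def gain(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, after/before ratio)."""
    return after_pct - before_pct, after_pct / before_pct


# Pass@1 scores quoted in the abstract, in percent: (before RL, after RL).
reported = {
    "SpreadsheetBench": (12.0, 23.4),
    "Domain-Spreadsheet": (8.4, 17.2),
}

for name, (before, after) in reported.items():
    points, ratio = gain(before, after)
    print(f"{name}: +{points:.1f} points, {ratio:.2f}x")
```

Under these assumptions, the gains work out to roughly 11.4 percentage points (about 1.95x) on SpreadsheetBench and 8.8 percentage points (about 2.05x) on Domain-Spreadsheet, so the score roughly doubles on both benchmarks.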
Community
Spreadsheet-RL is an RL fine-tuning framework and benchmarking environment designed to improve LLM agent performance on complex, multi-step spreadsheet tasks within Microsoft Excel.
Paper link: arxiv.org/abs/2605.22642