Hugging Face Daily Papers · 6 min read

Agents' Last Exam

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2606.05405

Published on Jun 3, 2026 · Submitted by Han (XinyangDavidHan) on Jun 9, 2026
Organization: UC Berkeley
Project page: https://agents-last-exam.org/
Code: https://github.com/rdi-berkeley/agents-last-exam
Authors: Yiyou Sun, Xinyang Han, Weichen Zhang, Yuanbo Pang, Tianyu Wang, Yuhan Cao, Yixiao Huang, Chris Duroiu, Haoyun Zhang, Jeffrey Lin, Weishu Zhang, Tyler Zeng, Ying Yan, and others (the full author list is on the original page)

Abstract

Agents' Last Exam (ALE) is a benchmark for evaluating AI agents on long-horizon, economically valuable real-world tasks. It spans 1K+ tasks across 13 industry clusters and reveals a significant gap between benchmark performance and practical deployment.

Recent AI systems have achieved strong results on a wide range of benchmarks, yet these gains have not translated into economically meaningful deployment across many professional domains. We argue that this gap is largely an evaluation problem: widely used benchmarks lack sustained performance measurement on real and economically valuable workflows. This paper introduces Agents' Last Exam (ALE), a benchmark designed to evaluate AI agents on long-horizon, economically valuable, real-world tasks with verifiable outcomes. Developed in collaboration with 250+ industry experts, ALE covers non-physical industries defined with reference to O*NET / SOC 2018 (the U.S. federal occupational taxonomy). It is organized around a task taxonomy with 55 subfields grouped into 13 industry clusters covering 1K+ tasks. Current results show that the hardest tier remains far from saturated: across mainstream harness and backbone configurations, the average full pass rate is 2.6%. ALE is designed as a living benchmark: its task pool grows continuously as new workflows and industries are onboarded. More broadly, ALE is intended not merely as another leaderboard, but as an instrument for closing the gap between benchmark success and GDP-relevant impact.
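
To make the headline metric concrete, here is a minimal sketch of how an "average full pass rate" across harness and backbone configurations could be computed. The abstract does not spell out the aggregation, so the unweighted mean over configurations, the function names, and the toy data below are all assumptions for illustration, not the paper's implementation or results.

```python
from statistics import mean


def full_pass_rate(task_results):
    """Fraction of tasks that were fully passed (True) in one configuration."""
    if not task_results:
        raise ValueError("no task results")
    return sum(1 for passed in task_results if passed) / len(task_results)


def average_full_pass_rate(results_by_config):
    """Unweighted mean of per-configuration full pass rates (an assumption)."""
    return mean(full_pass_rate(r) for r in results_by_config.values())


# Toy data only: hypothetical (harness, backbone) pairs, not results from the paper.
toy_results = {
    ("harness_a", "backbone_x"): [True, False, False, False],
    ("harness_b", "backbone_y"): [False, False, False, False],
}

if __name__ == "__main__":
    print(f"{average_full_pass_rate(toy_results):.1%}")
```

A task-weighted average (pooling all tasks before dividing) would give a different number when configurations cover different task counts, which is why the aggregation choice is worth stating explicitly.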

Community

Han (paper author, paper submitter) · Jun 9, 2026

Agents' Last Exam (ALE): can AI agents genuinely do the work of human experts in real-world settings?

A living benchmark built with 300+ experts across 55 industries, yielding 1,500+ real-world tasks. Three things set it apart:

  1. Real origins: every task comes from actual projects experts completed on the job, mapped to the U.S. occupational taxonomy (O*NET).
  2. Unconstrained: generalist computer-use agents get full GUI + CLI and solve tasks however they want, judged on results, not method.
  3. Objective: scored by reproducible, deterministic code evaluators, with no human judge.

Frontier agents reach a full pass rate of only 2.6% on the hardest "last-exam" tier, a sobering reality check on the timeline for AI workplace automation. We call it the "Last Exam" because saturating it would mean agents can genuinely power real industries.
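
The idea of judging results rather than method with a deterministic code evaluator can be sketched as follows. This is a hypothetical illustration, not ALE's evaluator API: the function name, the dictionary-based output format, and the tolerance are assumptions. The evaluator looks only at the agent's final outputs, uses no randomness and no human or model judge, and so returns the same verdict on every run.

```python
import math


def evaluate_outputs(produced, expected, rel_tol=1e-6):
    """Compare final outputs with expected values, one check per expected key.

    Hypothetical sketch: only outcomes are inspected, never how the agent
    arrived at them.
    """
    if not expected:
        raise ValueError("expected must define at least one check")
    checks = {}
    for key, want in expected.items():
        got = produced.get(key)
        is_number = isinstance(got, (int, float)) and not isinstance(got, bool)
        if isinstance(want, float) and is_number:
            # Numeric outputs are compared with a fixed relative tolerance.
            checks[key] = math.isclose(got, want, rel_tol=rel_tol)
        else:
            # Everything else must match exactly; missing keys fail.
            checks[key] = got == want
    return {"checks": checks, "full_pass": all(checks.values())}


# Toy task: the values are made up for illustration.
expected = {"total_revenue": 1250.0, "currency": "USD"}
good = evaluate_outputs({"total_revenue": 1250.0, "currency": "USD"}, expected)
bad = evaluate_outputs({"total_revenue": 1250.0}, expected)
```

Here `good["full_pass"]` is true because every check holds, while `bad["full_pass"]` is false because the `currency` output is missing; a task counts as a full pass only when all of its checks pass.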

Get this paper in your agent:

hf papers read 2606.05405
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
