Hugging Face Daily Papers · June 5, 2026 · 3 min read

VideoKR: Towards Knowledge- and Reasoning-Intensive Video Understanding

#model-release #reasoning #paper #benchmark

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

VideoKR presents a large-scale video reasoning dataset and benchmark designed to enhance knowledge-intensive video understanding through expert-domain content and human-in-the-loop example generation.

Code: https://github.com/Fu-Fu-Fu-Fu/VideoKR

arxiv:2606.05259

Published on Jun 3 · Submitted by Tingyu Song on Jun 5

Authors: Lin Fu, Zheyuan Yang, Yang Wang, Tingyu Song, Arman Cohan, Yilun Zhao

Abstract

The one-sentence summary at the top of this page was generated by Qwen/Qwen2.5-Coder-32B-Instruct; the abstract below is the authors' own.

We introduce VideoKR, the first large-scale training corpus specifically designed to strengthen knowledge- and reasoning-intensive video understanding. It comprises 315K video reasoning examples over 145K newly collected, CC-licensed, expert-domain videos. We develop a human-in-the-loop, skill-oriented example generation pipeline that targets progressively deeper video reasoning capabilities while ensuring the difficulty, diversity, and reliability of both the examples and their CoT rationales. We also curate VideoKR-Eval, a new expert-annotated benchmark where questions require genuine video understanding and knowledge-intensive reasoning rather than textual shortcuts. Our experiments show that, under a standard SFT→GRPO pipeline, models post-trained on VideoKR outperform prior post-training approaches on knowledge-intensive video reasoning while remaining competitive on general video reasoning, highlighting data design as a key driver of progress in video reasoning. We further conduct comprehensive ablations to isolate the contributions of VideoKR, providing actionable insights for future work.
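
The abstract names a standard SFT→GRPO pipeline without giving its settings. For readers unfamiliar with GRPO, the sketch below illustrates only its core idea: each sampled answer is scored relative to the other answers drawn for the same prompt. The function name, the 0/1 reward values, the `eps` constant, and the use of the population standard deviation are assumptions for illustration, not details taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its own group.

    `rewards` holds one scalar reward per answer sampled for a single
    prompt. `eps` (an assumed value) avoids division by zero when every
    answer in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one video question
# (1.0 = correct final answer, 0.0 = incorrect).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Answers that beat their group's average receive a positive advantage and the rest a negative one, so no separately trained value model is needed to provide a baseline. When all answers in a group score the same, every advantage is zero and that prompt contributes no learning signal.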