Hugging Face Daily Papers · 3 min read

PaddleOCR-VL-1.6: Expanding the Frontier of Document Parsing with Under-Optimized Region Refinement and Progressive Post-Training

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Paper: https://arxiv.org/pdf/2606.03264
Project page: https://www.paddleocr.com
Code: https://github.com/PaddlePaddle/PaddleOCR

arxiv:2606.03264

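Published on Jun 2, 2026 · Submitted by cuicheng on Jun 3, 2026
Authors: Zelun Zhang, Hongen Liu, Suyin Liang, Yubo Zhang, Yiqing Xiang, Jiaxuan Liu, Ting Sun, Manhui Lin, Yue Zhang, Changda Zhou, Tingquan Gao, Cheng Cui, Yi Liu, Dianhai Yu, Yanjun Ma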

Abstract

PaddleOCR-VL-1.6 enhances document parsing performance through targeted data optimization and progressive post-training techniques, achieving state-of-the-art results on OmniDocBench v1.6.

We introduce PaddleOCR-VL-1.6, an upgraded compact document parsing model built upon PaddleOCR-VL-1.5. Although PaddleOCR-VL-1.5 establishes a strong 0.9B baseline, its remaining errors concentrate in under-optimized regions where model behavior is unstable, data coverage is sparse, or supervision is unreliable. Rather than expanding the training corpus indiscriminately, PaddleOCR-VL-1.6 introduces a region-aware data optimization framework that identifies weak regions from the previous model, applies targeted enhancement to these regions, and improves the reliability of supervision signals. It further adopts a progressive post-training recipe based on curated data selection and reinforcement learning, pushing model performance to a higher level through staged optimization. PaddleOCR-VL-1.6 achieves a new state-of-the-art score of 96.33% on OmniDocBench v1.6, demonstrates strong competitiveness against top-tier VLMs, and provides a practical post-training recipe for the PaddleOCR-VL series.
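
The abstract does not spell out how weak regions are identified, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes per-sample evaluation scores from the previous model are available as `(region_label, score)` pairs, and it flags a region when coverage is sparse, mean accuracy is low, or scores are unstable. The function names, the region labels, the thresholds, and the `boost` factor are all hypothetical; unreliable supervision, the third failure mode the abstract mentions, is not modeled here.

```python
from collections import defaultdict
from statistics import mean, pstdev


def find_weak_regions(records, min_mean=0.90, max_std=0.15, min_count=3):
    """Group per-sample scores by region label and flag under-optimized groups.

    The record layout and every threshold are illustrative assumptions.
    """
    by_region = defaultdict(list)
    for region, score in records:
        by_region[region].append(score)

    weak = {}
    for region, scores in by_region.items():
        reasons = []
        if len(scores) < min_count:      # too few samples: sparse coverage
            reasons.append("sparse")
        if mean(scores) < min_mean:      # consistently poor results
            reasons.append("low_accuracy")
        if pstdev(scores) > max_std:     # scores swing widely: unstable behavior
            reasons.append("unstable")
        if reasons:
            weak[region] = reasons
    return weak


def sampling_weights(all_regions, weak, boost=3.0):
    """Upweight flagged regions, then normalize so the weights sum to 1."""
    raw = {r: (boost if r in weak else 1.0) for r in all_regions}
    total = sum(raw.values())
    return {r: w / total for r, w in raw.items()}


# Hypothetical scores from a previous model, one pair per evaluated sample.
records = [
    ("text", 0.98), ("text", 0.97), ("text", 0.99),
    ("table", 0.95), ("table", 0.55), ("table", 0.92),
    ("formula", 0.80), ("formula", 0.82), ("formula", 0.78),
    ("seal", 0.96),
]
weak = find_weak_regions(records)
weights = sampling_weights({r for r, _ in records}, weak)
```

In this toy data, `table`, `formula`, and `seal` are flagged while `text` is not, so a data pipeline built on these weights would draw more training samples from the flagged regions instead of growing the corpus uniformly.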

Get this paper in your agent:

hf papers read 2606.03264
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 2

Spaces citing this paper 1
