r/LocalLLaMA · · 1 min read

Unlimited-OCR is now on ModelScope! A 3.3B multilingual OCR model for one-shot parsing across single images, multi-page documents, and PDFs. License: MIT

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

Unlimited-OCR is now on ModelScope! A 3.3B multilingual OCR model for one-shot parsing across single images, multi-page documents, and PDFs. License: MIT

Full-document parsing instead of cropped-region OCR

32K output length for long OCR sequences

Base and gundam image modes for different document layouts

Transformers inference + SGLang serving with OpenAI-compatible streaming requests

Built to push DeepSeek-OCR-style document parsing further.

source: https://x.com/ModelScope2022/status/2069335055965491525

https://github.com/baidu/Unlimited-OCR

submitted by /u/Sporeboss
[link] [comments]

Discussion (0)

Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.

Sign in →

No comments yet. Sign in and be the first to say something.

More from r/LocalLLaMA