r/LocalLLaMA · June 5, 2026 · 1 min read

model: Granite4 Vision by gabe-l-hart · Pull Request #23545 · ggml-org/llama.cpp

Model Summary: Granite Vision 4.1 4B is a vision-language model (VLM) that delivers frontier-level performance on structured document extraction tasks — chart extraction, table extraction, and semantic key-value pair extraction — in a compact 4B parameter footprint, providing a lightweight alternative to much larger frontier models for these tasks:

Chart extraction: Converting charts into structured, machine-readable formats (Chart2CSV, Chart2Summary, and Chart2Code)
Table extraction: Accurately extracting tables with complex layouts from document images to JSON, HTML, or OTSL
Semantic Key-Value Pair (KVP) extraction: Extracting values based on key names and descriptions across diverse document layouts

submitted by /u/jacek2023