r/LocalLLaMA · · 1 min read

model: Granite4 Vision by gabe-l-hart · Pull Request #23545 · ggml-org/llama.cpp

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

model: Granite4 Vision by gabe-l-hart · Pull Request #23545 · ggml-org/llama.cpp

Model Summary: Granite Vision 4.1 4B is a vision-language model (VLM) that delivers frontier-level performance on structured document extraction tasks — chart extraction, table extraction, and semantic key-value pair extraction — in a compact 4B parameter footprint, providing a lightweight alternative to much larger frontier models for these tasks:

  • Chart extraction: Converting charts into structured, machine-readable formats (Chart2CSV, Chart2Summary, and Chart2Code)
  • Table extraction: Accurately extracting tables with complex layouts from document images to JSON, HTML, or OTSL
  • Semantic Key-Value Pair (KVP) extraction: Extracting values based on key names and descriptions across diverse document layouts
submitted by /u/jacek2023
[link] [comments]

Discussion (0)

Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.

Sign in →

No comments yet. Sign in and be the first to say something.

More from r/LocalLLaMA