ChartArena: Benchmarking Chart Parsing across Languages, Scenarios, and Formats
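Authors: Shangpin Peng, Gengluo Li, Xingyu Wan, Chengquan Zhang, Hao Feng, Binghong Wu, Huawen Shen, Weinong Wang, Ziyi Cai, Zhuotao Tian, Han Hu, Can Ma, Yu Zhou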
AI-generated summary
ChartArena presents a comprehensive bilingual benchmark for chart parsing that evaluates models across diverse chart types and visual conditions while providing a unified evaluation framework for fair comparison.
Abstract
Charts are a primary medium for conveying quantitative and relational information, yet systematically evaluating chart parsing models remains difficult. Existing benchmarks focus on narrow chart types and leave diagrammatic structures such as flowcharts and mind maps largely unaddressed, while models produce outputs in incompatible formats, and datasets rarely include the printed or hand-drawn images encountered in practice. To address these issues, we introduce ChartArena, a comprehensive bilingual benchmark covering eight chart families spanning both numeric charts and diagrammatic structures, each evaluated across three visual scenarios: digital renderings, printed photos, and hand-drawn photos. The dataset is built via a human-agent collaborative annotation pipeline with multi-stage human verification to ensure annotation reliability. To enable fair cross-model comparison, we further design a format-agnostic evaluation protocol that maps heterogeneous outputs into two canonical semantic spaces, a normalized triple view and a directed graph view, and scores them with structure-aware metrics. Through extensive evaluation of 26 leading MLLMs, we observe three consistent findings: (i) frontier proprietary models such as Gemini 3.1 Pro lead overall, yet the strongest open-source systems are rapidly closing the gap; (ii) document parsing models handle numeric charts reasonably but fall sharply behind on diagrammatic structures; and (iii) expert chart parsers remain limited to narrow chart families. Across all models, radar charts and hand-drawn scenarios stay especially challenging. These findings show that ChartArena exposes clear capability gaps and provides a unified foundation for future progress. ChartArena is publicly available at https://github.com/pspdada/ChartArena.
Community
ChartArena is a comprehensive bilingual benchmark for evaluating the chart parsing capabilities of vision-language models, spanning the full difficulty spectrum of charts encountered in practice. It covers eight chart families: both numeric charts (bar, line, pie, radar, box plot, combination) and diagrammatic structures (flowchart, mind map), each presented across three visual scenarios (digital renderings, printed photos, and hand-drawn photos) and two languages (Chinese and English).
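The coverage described above can be laid out as a simple grid. The sketch below is only an illustration: the dictionary keys and the `cells` name are hypothetical, and it assumes every family appears in every scenario and language, as the description suggests. It does not reflect the repository's actual data layout or per-cell sample counts.

```python
from itertools import product

# Names taken from the description above; the record layout is an assumption.
NUMERIC = ["bar", "line", "pie", "radar", "box plot", "combination"]
DIAGRAMMATIC = ["flowchart", "mind map"]
SCENARIOS = ["digital rendering", "printed photo", "hand-drawn photo"]
LANGUAGES = ["Chinese", "English"]

# One record per (family, scenario, language) combination.
cells = [
    {
        "family": family,
        "group": "diagrammatic" if family in DIAGRAMMATIC else "numeric",
        "scenario": scenario,
        "language": language,
    }
    for family, scenario, language in product(NUMERIC + DIAGRAMMATIC, SCENARIOS, LANGUAGES)
]

print(len(cells))  # 8 families x 3 scenarios x 2 languages = 48
```

Under that assumption the benchmark spans 48 family/scenario/language cells, 12 of which are diagrammatic.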
To enable fair comparison across models that produce mutually incompatible output formats, ChartArena adopts a format-agnostic evaluation protocol. Heterogeneous predictions are first normalized into one of two canonical semantic spaces (a triple view for numeric charts, a directed graph view for diagrammatic charts) and then scored with structure-aware metrics.
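To make the two views concrete, here is a minimal sketch of how such scoring could work once predictions have been normalized. Everything in it is an assumption for illustration: the `(series, category, value)` triple layout, the label normalization in `norm`, the 5% relative tolerance, and the use of F1 are placeholders, not the metrics ChartArena actually defines.

```python
Triple = tuple[str, str, float]  # hypothetical layout: (series, category, value)
Edge = tuple[str, str]           # hypothetical layout: (source label, target label)


def norm(text: str) -> str:
    """Lowercase and collapse whitespace so label formatting does not matter."""
    return " ".join(text.lower().split())


def f1(matched: int, n_pred: int, n_gold: int) -> float:
    """Harmonic mean of precision and recall; 0.0 if nothing matched."""
    if matched == 0 or n_pred == 0 or n_gold == 0:
        return 0.0
    precision, recall = matched / n_pred, matched / n_gold
    return 2 * precision * recall / (precision + recall)


def triple_f1(pred: list[Triple], gold: list[Triple], rel_tol: float = 0.05) -> float:
    """Match each predicted triple to at most one gold triple with the same
    normalized labels and a value within a relative tolerance (assumed 5%)."""
    remaining = list(gold)
    matched = 0
    for series, category, value in pred:
        for i, (g_series, g_category, g_value) in enumerate(remaining):
            same_key = norm(series) == norm(g_series) and norm(category) == norm(g_category)
            close = abs(value - g_value) <= rel_tol * max(abs(g_value), 1e-9)
            if same_key and close:
                matched += 1
                del remaining[i]  # each gold triple can be matched only once
                break
    return f1(matched, len(pred), len(gold))


def graph_f1(pred: list[Edge], gold: list[Edge]) -> float:
    """Compare directed edges as ordered pairs, so a reversed edge does not match."""
    pred_set = {(norm(a), norm(b)) for a, b in pred}
    gold_set = {(norm(a), norm(b)) for a, b in gold}
    return f1(len(pred_set & gold_set), len(pred_set), len(gold_set))


gold_triples = [("Sales", "Q1", 100.0), ("Sales", "Q2", 120.0)]
pred_triples = [("sales", "q1", 102.0), ("sales", "q2", 150.0)]
print(triple_f1(pred_triples, gold_triples))  # 0.5: Q1 is within tolerance, Q2 is not

gold_edges = [("Start", "Check"), ("Check", "End")]
pred_edges = [("start", "check"), ("End", "Check")]
print(graph_f1(pred_edges, gold_edges))  # 0.5: the second edge is reversed
```

The point of the sketch is that once every model's output is reduced to triples or edges, the same scoring function applies regardless of whether the model originally emitted a table, JSON, or diagram markup.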
Cite arxiv.org/abs/2606.01348 in a model README.md to link it from this page.
Cite arxiv.org/abs/2606.01348 in a Space README.md to link it from this page.