Hugging Face Daily Papers · 6 min read

ACL-Verbatim: hallucination-free question answering for research

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2605.21102

Published on May 20 · Submitted by Adam Kovacs on Jun 2
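Authors: Gábor Recski, Szilveszter Tóth, Nadia Verdha, István Boros, Ádám Kovács
Project page: https://verbatim.krlabs.eu
Code: https://github.com/KRLabsOrg/acl-verbatim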

Abstract

Researchers build a VerbatimRAG-based extractive question answering system, with a novel ground truth dataset and a ModernBERT model, to retrieve accurate information from research papers.

Academic researchers need efficient and reliable methods for collecting high-quality information from trusted sources, but modern tools for AI-assisted research still suffer from the tendency of Large Language Models (LLMs) to produce factually inaccurate or nonsensical output, commonly referred to as hallucinations. We apply the extractive question answering system VerbatimRAG to research papers in the ACL Anthology, directly mapping user queries to verbatim text spans in retrieved documents. We contribute a novel ground truth dataset for the task of mapping user queries to relevant text spans in research papers, and use it to train and evaluate a variety of extractive models. Human annotation is performed by NLP researchers and is based on synthetic user queries generated using a custom pipeline based on the ScIRGen methodology, paired with chunks of research papers retrieved by VerbatimRAG. On this benchmark, a 150M-parameter ModernBERT token classifier trained on silver supervision from our pipeline achieves the best word-level F1 (53.6), ahead of the strongest evaluated LLM extractor (48.7).
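
The abstract reports word-level F1 but this page does not spell out how it is computed. The following is a minimal sketch of one plausible reading, assuming words are whitespace-separated tokens of the chunk and that a word counts as selected when any character span overlaps it. The function names `word_positions` and `word_f1` are hypothetical and are not taken from the paper or its code.

```python
import re


def word_positions(chunk, spans):
    """Return the indices of whitespace-separated words in `chunk`
    that overlap at least one (start, end) character span."""
    covered = set()
    for index, match in enumerate(re.finditer(r"\S+", chunk)):
        for start, end in spans:
            # Half-open intervals overlap when each starts before the other ends.
            if match.start() < end and match.end() > start:
                covered.add(index)
                break
    return covered


def word_f1(chunk, predicted_spans, gold_spans):
    """Word-level F1 between predicted and gold spans (assumed definition)."""
    predicted = word_positions(chunk, predicted_spans)
    gold = word_positions(chunk, gold_spans)
    if not predicted and not gold:
        return 1.0  # Assumption: agreeing on "nothing relevant" is a perfect score.
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)
```

Scoring word positions rather than word strings keeps repeated words in a chunk from being matched to the wrong occurrence. The actual benchmark may tokenize or aggregate scores differently.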

Community

Paper submitter about 13 hours ago

Today we are releasing a new family of lightweight SOTA extractive models for grounded RAG.

Two 150M-parameter ModernBERT span extractors, trained as token classifiers, beat public extractive baselines (Zilliz Semantic Highlight, Provence) across ACL, RAGBench, Squeez, and QASPER, and outperform LLM-based extractors 100x their size on our ACL-Verbatim benchmark.

Given a query and a retrieved chunk, the extractor returns the exact text spans that support the answer.

Rather than generating an answer with an LLM, you get verbatim evidence directly from the source: paragraphs, table captions, code blocks, or other relevant text.
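
To illustrate how a token classifier can yield verbatim evidence, here is a minimal sketch of the post-processing step, assuming the model emits one binary label per token together with that token's character offsets in the chunk. The helpers `whitespace_offsets` and `labels_to_spans` are hypothetical, and the whitespace tokenization stands in for a real subword tokenizer; this is not the released models' API.

```python
import re


def whitespace_offsets(chunk):
    """Stand-in tokenizer: (start, end) character offsets of each word."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", chunk)]


def labels_to_spans(chunk, offsets, labels):
    """Merge runs of consecutively selected tokens and slice them from the chunk.

    Every returned string is a substring of `chunk`, so the output is
    verbatim by construction.
    """
    spans = []
    previous_selected = False
    for (start, end), selected in zip(offsets, labels):
        if selected:
            if previous_selected:
                spans[-1][1] = end  # Extend the current run.
            else:
                spans.append([start, end])  # Start a new run.
        previous_selected = bool(selected)
    return [chunk[start:end] for start, end in spans]


chunk = "ModernBERT has 150M parameters. It is trained as a token classifier."
offsets = whitespace_offsets(chunk)
# Hypothetical per-token predictions (1 = supports the answer).
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
evidence = labels_to_spans(chunk, offsets, labels)
```

Because the spans are sliced from the source text rather than generated, the extractor cannot introduce wording that is absent from the retrieved chunk.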

Get this paper in your agent:

hf papers read 2605.21102
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
