Hugging Face Daily Papers · 3 min read

NSF-SciFy: Mining the NSF Awards Database for Scientific Claims

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

To be presented at ACL 2026 main (oral).
arxiv:2503.08600

Published on May 25 · Submitted by Delip Rao on May 27
Authors: Delip Rao, Weiqiu You, Eric Wong, Chris Callison-Burch

AI-generated summary

NSF-SciFy is a large-scale dataset of scientific claims and investigation proposals extracted from NSF award abstracts, enabling improved language model fine-tuning for claim verification and scientific discovery tracking.

Abstract

We introduce NSF-SciFy, a comprehensive dataset of scientific claims and investigation proposals extracted from National Science Foundation award abstracts. While previous scientific claim verification datasets have been limited in size and scope, NSF-SciFy represents a significant advance with 2.8 million claims from 400,000 abstracts spanning all science and mathematics disciplines. We present two focused subsets: NSF-SciFy-MatSci with 114,000 claims from materials science awards, and NSF-SciFy-20K with 135,000 claims across five NSF directorates. Using zero-shot prompting, we develop a scalable approach for joint extraction of scientific claims and investigation proposals. We demonstrate the dataset's utility through three downstream tasks: non-technical abstract generation, claim extraction, and investigation proposal extraction. Fine-tuning language models on our dataset yields substantial improvements, with relative gains often exceeding 100%, particularly for claim and proposal extraction tasks. Our error analysis reveals that extracted claims exhibit high precision but lower recall, suggesting opportunities for further methodological refinement. NSF-SciFy enables new research directions in large-scale claim verification, scientific discovery tracking, and meta-scientific analysis. Code and data are available at https://github.com/darpa-scify/NSFSciFy.
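Two quantitative statements in the abstract, relative gains "often exceeding 100%" and claims with "high precision but lower recall", can be made concrete with a small Python sketch. It assumes relative gain is computed as (fine-tuned score - baseline score) / baseline score and that extracted claims are compared to reference claims by exact string match; the paper's actual metrics and matching procedure may differ. The function names and all numbers below are hypothetical illustrations, not results from NSF-SciFy.

```python
from __future__ import annotations


def relative_gain(baseline: float, fine_tuned: float) -> float:
    """Relative improvement of a fine-tuned score over a baseline score.

    Assumed definition: (fine_tuned - baseline) / baseline. A value above 1.0
    means a relative gain exceeding 100%, i.e. the score more than doubled.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive to compute a relative gain")
    return (fine_tuned - baseline) / baseline


def precision_recall(extracted: set[str], reference: set[str]) -> tuple[float, float]:
    """Set-based precision and recall of extracted claims against reference claims.

    Exact string matching is a simplifying assumption; comparing real claims
    would need a softer notion of equivalence (paraphrase, entailment, etc.).
    """
    true_positives = len(extracted & reference)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical numbers for illustration only; they are not results from the paper.
    print(relative_gain(12.0, 30.0))  # 1.5, i.e. a 150% relative gain

    extracted = {"claim A", "claim B"}
    reference = {"claim A", "claim B", "claim C", "claim D"}
    # Every extracted claim is correct, but half of the reference claims are missed.
    print(precision_recall(extracted, reference))  # (1.0, 0.5)
```

Under this definition, a relative gain above 100% only says that the fine-tuned score is more than twice the baseline, which is easier to reach when the baseline is low; the precision/recall example shows the "high precision, lower recall" pattern in its simplest form.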

Community

Comment from the paper submitter, Delip Rao (May 27):

To be presented at ACL 2026 main (oral).

Get this paper in your agent:

hf papers read 2503.08600
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

