Hugging Face Daily Papers · 4 min read

Speaking the Language of Science: Toward a General-Purpose Generative Foundation Model for the Natural Sciences

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arXiv:2606.16905

Published on Jun 15, 2026 · Submitted by Yurou Liu on Jun 17, 2026
Authors: Mingyang Li, Yurou Liu, Jieping Ye, Bing Su, Ji-Rong Wen, Zheng Wang
GitHub: https://github.com/LOGOS-Hub/LOGOS

Abstract

A unified scientific generative language model encodes diverse scientific objects and spatial interactions as token sequences, demonstrating strong performance across multiple domains through autoregressive next-token prediction.

In this report, we present LOGOS (Language Of Generative Objects in Science), a scientific generative language model that unifies heterogeneous tasks across the natural sciences within a single autoregressive framework based on a shared scientific grammar. It encodes diverse scientific objects and their spatial interactions as token sequences over a common vocabulary. By representing spatial contact and constraint patterns as discrete tokens, the model captures complex structural interactions in a purely sequential manner, without relying on explicit coordinates or geometric neural networks. This unified representation enables a wide range of downstream tasks to be formulated consistently as next-token prediction in the same grammar space, creating strong alignment between continued multi-domain pre-training and downstream objectives. Across diverse tasks, LOGOS consistently matches or outperforms domain-specific baselines, providing preliminary evidence for the feasibility of "one model fits all" in the natural sciences. We train LOGOS models at different scales (1B, 3B, and 8B parameters) and find a consistent positive correlation between model size and performance. This suggests that the future of AI for Science (AI4S) may not lie in building an independent technical stack that is separated from large language models (LLMs). Instead, it may depend on deeply aligning scientific foundation models with LLMs through shared architectures, shared training paradigms, and shared inference infrastructure, so that LLMs can truly become a new entry point for AI4S. We release the model weights and associated resources to facilitate further research.
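To make the idea of "spatial interactions as discrete tokens" concrete, here is a minimal sketch in Python. It is not the LOGOS grammar: the tag names (`<obj>`, `<contacts>`, `<pos_i>`) and the helper functions `serialize` and `next_token_pairs` are hypothetical, invented only to illustrate how an object and its contact pattern could be flattened into one sequence and then turned into next-token prediction examples, with no coordinates involved.

```python
from typing import List, Tuple


def serialize(symbols: List[str], contacts: List[Tuple[int, int]]) -> List[str]:
    """Flatten an object and its contact pattern into one token sequence.

    Hypothetical toy grammar (NOT the LOGOS vocabulary):
      <obj> s_0 s_1 ... </obj> <contacts> <pos_i> <pos_j> ... </contacts>
    """
    tokens = ["<obj>", *symbols, "</obj>", "<contacts>"]
    # Normalise each pair to (low, high) and sort for a canonical order.
    for i, j in sorted((min(a, b), max(a, b)) for a, b in contacts):
        if not (0 <= i < j < len(symbols)):
            raise ValueError(f"invalid contact pair ({i}, {j})")
        # Each contact is written as two discrete position tokens.
        tokens += [f"<pos_{i}>", f"<pos_{j}>"]
    tokens.append("</contacts>")
    return tokens


def next_token_pairs(tokens: List[str]) -> List[Tuple[List[str], str]]:
    """Turn one sequence into (prefix, next-token) training examples."""
    return [(tokens[:k], tokens[k]) for k in range(1, len(tokens))]


# A four-symbol object with two contacts, given in arbitrary order.
tokens = serialize(["M", "K", "V", "L"], [(3, 0), (1, 3)])
pairs = next_token_pairs(tokens)
```

Under this toy scheme, predicting the contact section from the object section, or the reverse, is the same next-token task over one vocabulary, which is the kind of alignment between pre-training and downstream objectives the abstract describes.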

Community

Paper submitter about 1 hour ago

AI for Science has produced powerful models for proteins, molecules, materials, reactions, and other scientific objects. Yet a new task often requires a new model, a new training pipeline, and a new set of assumptions, so knowledge stays siloed. This raises a question: can scientific data be modeled like language, through a shared generative framework? To investigate this, we introduce LOGOS, a general-purpose generative foundation model for the natural sciences.



