Hugging Face Daily Papers · June 18, 2026 · 4 min read

Speaking the Language of Science: Toward a General-Purpose Generative Foundation Model for the Natural Sciences

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

AI for Science has produced powerful models for proteins, molecules, materials, reactions, and other scientific objects. Yet each new task often requires a new model, a new training pipeline, and a new set of assumptions, so knowledge stays siloed. This raises a question: can scientific data be modeled like language, through a shared generative framework? To investigate this, we introduce LOGOS, a general-purpose generative foundation model for the natural sciences.

arxiv:2606.16905

Published on Jun 15 · Submitted by Yurou Liu on Jun 17 · LOGOS-Hub

Authors: Mingyang Li, Yurou Liu, Jieping Ye, Bing Su, Ji-Rong Wen, Zheng Wang

Abstract

A unified scientific generative language model encodes diverse scientific objects and spatial interactions as token sequences, demonstrating strong performance across multiple domains through autoregressive next-token prediction.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

In this report, we present LOGOS (Language Of Generative Objects in Science), a scientific generative language model that unifies heterogeneous tasks across the natural sciences within a single autoregressive framework based on a shared scientific grammar. It encodes diverse scientific objects and their spatial interactions as token sequences over a common vocabulary. By representing spatial contact and constraint patterns as discrete tokens, the model captures complex structural interactions in a purely sequential manner, without relying on explicit coordinates or geometric neural networks. This unified representation enables a wide range of downstream tasks to be formulated consistently as next-token prediction in the same grammar space, creating strong alignment between continued multi-domain pre-training and downstream objectives. Across diverse tasks, LOGOS consistently matches or outperforms domain-specific baselines, providing preliminary evidence for the feasibility of "one model fits all" in the natural sciences. We train LOGOS models at different scales (1B, 3B, and 8B parameters) and find a consistent positive correlation between model size and performance. This suggests that the future of AI for Science (AI4S) may not lie in building an independent technical stack that is separated from large language models (LLMs). Instead, it may depend on deeply aligning scientific foundation models with LLMs through shared architectures, shared training paradigms, and shared inference infrastructure, so that LLMs can truly become a new entry point for AI4S. We release the model weights and associated resources to facilitate further research.
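To make the idea of encoding spatial contacts as discrete tokens more concrete, the following is a minimal Python sketch under stated assumptions. The token names (`<seq>`, `<contact>`, `<pos_i>`, `<eos>`), the serialization order, and both function names are hypothetical and chosen only for illustration; the abstract does not specify the actual LOGOS grammar or vocabulary. The sketch shows how an object and its pairwise contacts could be flattened into one sequence, and how such a sequence turns into next-token prediction examples.

```python
from typing import List, Tuple


def encode_with_contacts(residues: str, contacts: List[Tuple[int, int]]) -> List[str]:
    """Serialize a sequence plus pairwise contacts into one flat token list.

    Hypothetical grammar, for illustration only (not the LOGOS vocabulary):
    object tokens first, then one "<contact> <pos_i> <pos_j>" triple per
    contacting pair, closed by "<eos>". No coordinates are stored.
    """
    tokens = ["<seq>"] + list(residues) + ["</seq>"]
    for i, j in sorted(contacts):
        if not (0 <= i < j < len(residues)):
            raise ValueError(f"invalid contact pair: {(i, j)}")
        tokens += ["<contact>", f"<pos_{i}>", f"<pos_{j}>"]
    tokens.append("<eos>")
    return tokens


def next_token_pairs(tokens: List[str]) -> List[Tuple[List[str], str]]:
    """Return (prefix, target) pairs, the shape of a next-token objective."""
    return [(tokens[:k], tokens[k]) for k in range(1, len(tokens))]


# Toy example: a four-element object with two contacts.
tokens = encode_with_contacts("ACDE", [(0, 3), (1, 2)])
pairs = next_token_pairs(tokens)
print(tokens)
```

In this toy format, a task such as contact prediction would amount to giving the model everything up to `</seq>` as the prefix and letting it generate the contact triples, which is one way to read the abstract's claim that downstream tasks share the pre-training objective.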

GitHub: https://github.com/LOGOS-Hub/LOGOS (19 stars)

Get this paper in your agent:

hf papers read 2606.16905

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 4
