Darwin Family: MRI-Trust-Weighted Evolutionary Merging for Training-Free Scaling of Language-Model Reasoning
Authors: Taebong Kim, Youngsik Hong, Minsik Kim, Sunyoung Choi, Jaewon Jang, Junghoon Shin, Minseo Kim
Abstract
We present Darwin Family, a framework for training-free evolutionary merging of large language models via gradient-free weight-space recombination. We ask whether frontier-level reasoning performance can be improved without additional training, by reorganizing latent capabilities already encoded in existing checkpoints. Darwin introduces three key ideas: (i) a 14-dimensional adaptive merge genome enabling fine-grained component- and block-level recombination; (ii) MRI-Trust Fusion, which adaptively balances diagnostic layer-importance signals with evolutionary search through a learnable trust parameter; and (iii) an Architecture Mapper that enables cross-architecture breeding between heterogeneous model families. Empirically, the flagship Darwin-27B-Opus achieves 86.9% on GPQA Diamond, ranking #6 among 1,252 evaluated models, and outperforming its fully trained foundation model without any gradient-based training. Across scales from 4B to 35B parameters, Darwin models consistently improve over their parents, support recursive multi-generation evolution, and enable a training-free evolutionary merge that combines Transformer- and Mamba-based components. Together, the Darwin Family demonstrates that diagnostic-guided evolutionary merging is a practical and reproducible alternative to costly post-training pipelines for reasoning-centric language models.
AI-generated summary
The Darwin Family framework enables training-free evolutionary merging of large language models through gradient-free weight-space recombination, achieving superior reasoning performance without additional training.
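The abstract describes MRI-Trust Fusion as balancing diagnostic layer-importance signals against evolutionary search through a learnable trust parameter. A minimal sketch of that balancing step is below, assuming merging reduces to a per-layer convex combination of two parents; all names (`mri_trust_fusion`, `mri_scores`, `genes`, `tau`) are hypothetical, and the paper's 14-dimensional merge genome and search loop are not reproduced here.

```python
import numpy as np

def mri_trust_fusion(parent_a, parent_b, mri_scores, genes, tau):
    """Merge two checkpoints layer by layer (illustrative sketch).

    parent_a, parent_b : dict[str, np.ndarray]  per-layer weight tensors
    mri_scores         : dict[str, float]       diagnostic layer importance in [0, 1]
    genes              : dict[str, float]       evolved per-layer mixing coefficients in [0, 1]
    tau                : float                  trust parameter in [0, 1]
    """
    merged = {}
    for name, w_a in parent_a.items():
        w_b = parent_b[name]
        # Trust parameter blends the diagnostic signal with the evolved gene.
        alpha = tau * mri_scores[name] + (1.0 - tau) * genes[name]
        # Convex combination of the parents' weights for this layer.
        merged[name] = alpha * w_a + (1.0 - alpha) * w_b
    return merged
```

Under this reading, tau = 1 would follow the diagnostics alone and tau = 0 the evolutionary search alone; the paper's learnable tau interpolates between the two regimes.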
Community
FINAL Bench introduces a new evaluation paradigm for LLMs:
functional metacognitive reasoning — not just "can the model solve it,"
but "does the model know when, why, and how it solves it."
- 100 tasks across 15 domains, built on the TICOS framework
(Task / Introspection / Calibration / Output / Self-correction; see the task-record sketch below)
- Already #5 globally in HF Datasets popularity
- Officially endorsed by the HF Evaluation Team (Nathan Habib)
We believe metacognition is the missing axis in current LLM benchmarks.
Feedback welcome.
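To make the TICOS axes concrete, here is a hypothetical task record; the field names and contents are illustrative assumptions, not the actual FINAL Bench schema.

```python
# A hypothetical FINAL Bench item organized along the five TICOS axes.
# Field names and contents are assumptions for illustration only.
task = {
    "task": "Differentiate f(x) = x**3 * sin(x).",
    "introspection": "Does the model state which rule (product rule) the task requires?",
    "calibration": "Does the model's stated confidence match its measured accuracy?",
    "output": "f'(x) = 3*x**2*sin(x) + x**3*cos(x)",
    "self_correction": "Given a seeded sign error, does the model detect and repair it?",
}

for axis, probe in task.items():
    print(f"{axis:>16}: {probe}")
```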
Darwin Family — Architecture Overview

[Darwin Family Diagram: https://huggingface.co/FINAL-Bench/Darwin-36B-Opus/resolve/main/DARWIN.png]

Flagship update: Darwin-36B-Opus achieves 88.4% on GPQA Diamond,
matching Qwen3.5-397B-A17B with ~10× fewer parameters, training-free.

- Model: https://huggingface.co/FINAL-Bench/Darwin-36B-Opus
- Paper: https://arxiv.org/abs/2605.14386
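For readers who want to try the flagship checkpoint, a minimal loading sketch follows, under the unverified assumption that the repo hosts a standard transformers-compatible causal-LM checkpoint.

```python
# Minimal sketch: load and query the checkpoint with Hugging Face transformers.
# Assumes a standard causal-LM repo layout; requires `accelerate` for device_map.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "FINAL-Bench/Darwin-36B-Opus"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

prompt = "Question: Why is the sky blue?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```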