FeatCal: Feature Calibration for Post-Merging Models

Yanggan Gu, Shuo Cai, Zihao Wang, Wenjun Wang, Yuanyi Wang, Pengkai Wang, Sirui Huang, Su Lu, Jianmin Wu, Hongxia Yang
The Hong Kong Polytechnic University · arXiv:2605.13030 · Code: https://github.com/egangu/featcal
Abstract

Model merging combines task experts into one model and avoids joint training, retraining, or deploying many expert models, but the merged model often still underperforms task experts. We study this performance gap through feature drift, the difference between features produced by the merged model and by the expert on the same input. Our theory decomposes this drift into upstream propagation and local mismatch, tracks how it propagates and combines through later layers in forward order, and links final feature drift to output drift. This view motivates FeatCal, which uses a small calibration set to calibrate the merged model weights layer by layer in forward order, reducing feature drift while staying close to merged weights and preserving the benefits of model merging. FeatCal uses an efficient closed-form solution to update model weights, with no gradient descent, iterative optimization, or extra modules. On the main CLIP and GLUE benchmarks, FeatCal beats Surgery and ProbSurgery, the closest post-merging calibration baselines: 85.5% vs. 77.0%/78.8% on CLIP-ViT-B/32 Task Arithmetic (TA) and 85.2% vs. 83.7%/82.2% on FLAN-T5-base GLUE. On CLIP-ViT-B/32, 8 examples per task reach 82.9%, and 256 examples per task take 53 seconds, about 4x faster than both baselines, showing better sample efficiency and lower calibration cost.

AI-generated summary

Feature drift analysis in model merging leads to FeatCal, a calibration method that reduces performance gaps through layer-wise weight updates without gradient descent, achieving superior benchmark results and efficiency.
Community
Model merging combines task experts into a single model, but the merged model can still underperform the experts. FeatCal studies this gap through feature drift: the difference between features produced by the merged model and by the task expert on the same input. It then calibrates the merged model layer by layer in forward order using a small calibration set.
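To make the layer-by-layer idea concrete, here is a minimal sketch of one calibration step under simplifying assumptions: the layer is a plain linear map, and the calibration objective is a ridge-regularized least squares that pulls the layer's outputs on calibration inputs toward the expert's features while staying close to the merged weights. This is an illustration of the general closed-form recipe the abstract describes, not the paper's exact formulation; the function name, shapes, and regularizer `lam` are all hypothetical.

```python
import numpy as np

def calibrate_layer(X, Y, W_merged, lam=1e-3):
    """Closed-form calibration of one linear layer.

    Solves  min_W ||X @ W - Y||^2 + lam * ||W - W_merged||^2,
    i.e. match expert features Y on calibration inputs X while
    staying near the merged weights. No gradient descent needed.
    """
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)       # regularized Gram matrix
    B = X.T @ Y + lam * W_merged        # targets plus anchor term
    return np.linalg.solve(A, B)

# Toy demonstration with synthetic weights (hypothetical shapes).
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 8))            # calibration activations
W_expert = rng.normal(size=(8, 8))      # expert layer weights
W_merged = W_expert + 0.1 * rng.normal(size=(8, 8))  # merged = drifted
Y = X @ W_expert                        # expert features on same inputs

W_cal = calibrate_layer(X, Y, W_merged)
drift_before = np.linalg.norm(X @ W_merged - Y)
drift_after = np.linalg.norm(X @ W_cal - Y)
print(drift_after < drift_before)       # calibration reduces feature drift
```

In the full method this update would be applied to each layer in forward order, so that later layers are calibrated against inputs already produced by the earlier calibrated layers, matching the abstract's account of how drift propagates downstream.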