QG-MIL: A Gated Transformer Aggregator for Domain-Agnostic Multiple Instance Learning in Medical Imaging
Authors: Luca Zedda, Davide Antonio Mura, Cecilia Di Ruberto, Maurizio Atzori, Muhammed Furkan Dasdelen, Carsten Marr, Andrea Loddo
Published: 2026-06-18, arXiv: 2606.20027
Abstract
QG-MIL introduces a gated transformer aggregator for multiple instance learning in medical imaging that stabilizes attention distribution and improves prediction consistency across different medical domains.
Attention-based Multiple Instance Learning aggregators in medical imaging are prone to attention concentration, producing overconfident and unstable predictions. We introduce QG-MIL, a gated transformer aggregator that addresses this through four synergistic architectural components: RMSNorm-based pre-normalization, per-head QK normalization, fine-grained attention output gating, and SwiGLU-style feed-forward modules. Together, these design choices stabilize training and distribute attention more uniformly across instances without auxiliary losses, masking, or multi-stage regularization. We evaluate QG-MIL across six benchmarks spanning whole-slide pathology and cell-level hematology, covering two fundamentally different MIL scales. The best-performing QG-MIL variants outperform leading baselines on all six benchmarks, with an average improvement of +6.1 mean macro F1 points. Attention overlays and attention mass analysis confirm more distributed instance weighting. Ablation studies show that while individual components can match the full model on specific datasets, the QG-MIL design provides the most consistent cross-domain performance and tightest variance when compared to selected baselines. We release a configurable implementation to support reproducibility at: https://github.com/unica-visual-intelligence-lab/QG-MIL
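To make "attention concentration" concrete, here is a minimal sketch of two common ways to quantify how much of a bag's attention sits on a few instances. The abstract does not define its attention mass analysis, so this is not the paper's metric: the function names `top_k_mass` and `normalized_entropy` and the example weights are illustrative assumptions.

```python
import math

def top_k_mass(weights, k):
    """Fraction of the total attention carried by the k largest weights."""
    total = sum(weights)
    return sum(sorted(weights, reverse=True)[:k]) / total

def normalized_entropy(weights):
    """Shannon entropy divided by log(n): near 1 is uniform, near 0 is one-hot."""
    n = len(weights)
    if n < 2:
        return 0.0
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(n)

# Hypothetical attention weights over a bag of five instances.
collapsed = [0.94, 0.02, 0.02, 0.01, 0.01]  # one instance dominates
uniform = [0.2] * 5                         # attention spread evenly

print(top_k_mass(collapsed, 1), normalized_entropy(collapsed))
print(top_k_mass(uniform, 1), normalized_entropy(uniform))
```

A high top-k mass together with a low normalized entropy indicates that the aggregator is relying on very few instances, which is the behavior the abstract describes as attention concentration.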
Community
Hi everyone, thanks for checking out our paper!
In QG-MIL, we study the attention concentration problem in medical Multiple Instance Learning, where attention-based aggregators can collapse onto a small subset of instances and produce unstable or overconfident predictions.
Our goal was to design a simple drop-in MIL aggregator that mitigates this behavior architecturally, without auxiliary losses, masking strategies, or multi-stage training. QG-MIL combines RMSNorm pre-normalization, per-head QK normalization, attention-output gating, and SwiGLU-style feed-forward layers, and we evaluate it across pathology and hematology benchmarks with different bag sizes and feature extractors.
Happy to discuss the method, limitations, ablations, or possible extensions!
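For readers who want a picture of how the four components named above might fit together, here is a rough NumPy sketch of one block applied to a single bag. It is a reading of the description on this page, not the released implementation: the weight names, the unscaled RMSNorm, the sigmoid gate computed from the normalized input, the gate's position before the output projection, and the mean pooling at the end are all assumptions. The repository linked in the abstract is the reference for the actual design.

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    # RMSNorm without a learned scale (assumption): divide by the RMS of the last axis.
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def silu(x):
    return x * sigmoid(x)

def init_params(dim, heads, hidden, rng):
    # Hypothetical parameter layout; random weights stand in for trained ones.
    w = lambda *shape: rng.normal(0.0, shape[0] ** -0.5, size=shape)
    return {"heads": heads,
            "wq": w(dim, dim), "wk": w(dim, dim), "wv": w(dim, dim),
            "wg": w(dim, dim), "wo": w(dim, dim),
            "w1": w(dim, hidden), "w3": w(dim, hidden), "w2": w(hidden, dim)}

def gated_block(x, p):
    """x: (n_instances, dim) bag of instance features."""
    n, dim = x.shape
    h = p["heads"]
    d = dim // h
    split = lambda t: t.reshape(n, h, d).transpose(1, 0, 2)  # -> (h, n, d)

    z = rms_norm(x)                              # RMSNorm pre-normalization
    q = rms_norm(split(z @ p["wq"]))             # per-head QK normalization
    k = rms_norm(split(z @ p["wk"]))
    v = split(z @ p["wv"])
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))    # (h, n, n)
    out = (attn @ v).transpose(1, 0, 2).reshape(n, dim)
    out = out * sigmoid(z @ p["wg"])             # elementwise attention-output gate
    x = x + out @ p["wo"]                        # residual connection

    z = rms_norm(x)
    ff = (silu(z @ p["w1"]) * (z @ p["w3"])) @ p["w2"]       # SwiGLU feed-forward
    return x + ff, attn

rng = np.random.default_rng(0)
bag = rng.normal(size=(8, 16))                   # 8 instances, 16-dim features
params = init_params(dim=16, heads=4, hidden=32, rng=rng)
out, attn = gated_block(bag, params)
bag_embedding = out.mean(axis=0)                 # mean pooling is an assumption
print(out.shape, attn.shape, bag_embedding.shape)
```

Because queries and keys are normalized per head, the attention logits are bounded in magnitude, which is one plausible reason this kind of normalization limits how sharply the softmax can peak.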