SynIB: Informational Bottleneck for Maximizing Synergy in Multimodal Learning
Abstract: A central objective in multimodal learning is to capture synergy: task-relevant information that arises only from the joint use of multiple modalities and is not available from any single modality alone. While most approaches operate at the architectural level through larger or more complex fusion models, we propose a complementary axis: shaping the training objective itself. Standard training often emphasizes unimodal or redundant information and falls short on examples that require cross-modal reasoning. We formalize multimodal synergy through information theory and introduce the Synergistic Information Bottleneck (SynIB), a scalable objective that targets synergy directly. To prioritize learning synergy, SynIB encourages the model to predict accurately from all modalities while penalizing confidence when information from any modality is withheld. Alongside the standard task loss, the model runs forward passes with one modality masked at a time and is penalized for remaining confident, since such confidence would indicate reliance on unimodal cues rather than cross-modal interactions. We validate SynIB in two regimes. On synthetic XOR tasks, where the ground-truth synergy is known by construction, standard training fails to recover it while SynIB does. On five real-world benchmarks, including three MultiBench affective tasks, Hateful Memes with CLIP-ViT and DeBERTa backbones, and a controllable irony extension of CREMA-D that we introduce, SynIB improves accuracy on synergy-dependent examples by up to 7.8% and overall accuracy by up to 3.8%.
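The two ideas in the abstract, XOR as a task that is synergistic by construction and a penalty on confidence under modality masking, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract does not give the exact form of the penalty, so the KL-divergence-to-uniform confidence measure, the `weight` parameter, the use of `None` as a mask, and the two toy predictors (`joint_model`, `shortcut_model`) are all assumptions made for illustration.

```python
import math
from itertools import product


def mutual_information(pairs):
    """I(A;B) in bits, treating the given (a, b) samples as equally likely."""
    n = len(pairs)
    p_ab, p_a, p_b = {}, {}, {}
    for a, b in pairs:
        p_ab[(a, b)] = p_ab.get((a, b), 0.0) + 1.0 / n
        p_a[a] = p_a.get(a, 0.0) + 1.0 / n
        p_b[b] = p_b.get(b, 0.0) + 1.0 / n
    return sum(p * math.log2(p / (p_a[a] * p_b[b])) for (a, b), p in p_ab.items())


# XOR: the label depends on both inputs jointly and on neither one alone.
data = [(x1, x2, x1 ^ x2) for x1, x2 in product((0, 1), repeat=2)]
mi_x1 = mutual_information([(x1, y) for x1, _, y in data])            # 0.0 bits
mi_x2 = mutual_information([(x2, y) for _, x2, y in data])            # 0.0 bits
mi_joint = mutual_information([((x1, x2), y) for x1, x2, y in data])  # 1.0 bit


def cross_entropy(probs, label):
    """Standard task loss for one example (in nats)."""
    return -math.log(probs[label])


def confidence(probs):
    """Assumed confidence measure: KL(probs || uniform), zero for a uniform output."""
    k = len(probs)
    return sum(p * math.log(p * k) for p in probs if p > 0)


def synergy_style_loss(predict, x1, x2, label, weight=1.0):
    """Task loss on the full input plus a confidence penalty with one modality masked at a time."""
    task = cross_entropy(predict(x1, x2), label)
    penalty = confidence(predict(None, x2)) + confidence(predict(x1, None))
    return task + weight * penalty


def joint_model(x1, x2):
    """Hypothetical predictor that is confident only when both modalities are present."""
    if x1 is None or x2 is None:
        return [0.5, 0.5]
    probs = [0.01, 0.01]
    probs[x1 ^ x2] = 0.99
    return probs


def shortcut_model(x1, x2):
    """Hypothetical predictor that reads the label off the first modality only."""
    if x1 is None:
        return [0.5, 0.5]
    probs = [0.01, 0.01]
    probs[x1] = 0.99
    return probs


# On the example (x1=1, x2=0, y=1) both predictors have the same task loss,
# but only the shortcut predictor stays confident when x2 is masked.
joint_loss = synergy_style_loss(joint_model, 1, 0, 1)
shortcut_loss = synergy_style_loss(shortcut_model, 1, 0, 1)
print(mi_x1, mi_x2, mi_joint)
print(joint_loss, shortcut_loss)
```

In this toy setting the task loss alone cannot separate the two predictors on the chosen example, whereas the masked-pass penalty is zero for the joint predictor and positive for the shortcut predictor, which is the behavior the abstract describes at a high level.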
| Subjects: | Machine Learning (cs.LG); Information Theory (cs.IT) |
| Cite as: | arXiv:2606.09853 [cs.LG] (or arXiv:2606.09853v1 [cs.LG] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2606.09853 (arXiv-issued DOI via DataCite) |
Submission history
From: Konstantinos Kontras
[v1] Tue, 12 May 2026 19:42:19 UTC (5,210 KB)