SemBridge: Language Transfer in Sparse Encoders via Multilingual Semantic Bridges
Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.
Abstract
SemBridge enhances cross-lingual sparse encoder adaptation by using multilingual bridge models to establish semantic alignments and improve retrieval performance across multiple languages.
Sparse encoders offer high-precision retrieval by representing term importance within a vocabulary space, yet their English-centric structures are a critical impediment to transferring them to non-English languages. To overcome this structural limitation, we propose SemBridge, a novel embedding initialization method for cross-lingual adaptation of sparse encoders that leverages multilingual bridge models. SemBridge establishes semantic alignments between source and target vocabularies using multilingual dense embeddings as a bridge. Rather than relying on all source tokens, SemBridge selects a small set of semantically related source-language tokens and uses them to initialize each target-language token, filtering out semantic noise and reconstructing each target token as a linear combination of its core synonyms. This accelerates convergence during fine-tuning and improves training efficiency. Extensive experiments across five languages and four sparse architectures demonstrate that SemBridge achieves superior zero-shot retrieval performance and consistently improves retrieval performance after fine-tuning compared to existing baselines. These results validate SemBridge as a practical solution for deploying high-performance sparse retrieval systems in diverse linguistic environments.
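The initialization idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `init_target_embeddings`, the top-k selection by cosine similarity, the softmax weighting, and the `k` and `temperature` parameters are all assumptions made for illustration, since the abstract only states that a small set of related source tokens is selected and combined linearly.

```python
import numpy as np


def init_target_embeddings(bridge_src, bridge_tgt, src_emb, k=3, temperature=0.1):
    """Initialize target-token embeddings from related source tokens.

    bridge_src: (n_src, d_bridge) multilingual dense embeddings of source tokens
    bridge_tgt: (n_tgt, d_bridge) multilingual dense embeddings of target tokens
    src_emb:    (n_src, d_model)  trained source-token embeddings of the encoder
    Returns:    (n_tgt, d_model)  initial target-token embeddings

    Assumption: top-k cosine neighbors with softmax weights; the paper's
    actual selection and weighting rules may differ.
    """
    # Normalize rows so that dot products are cosine similarities.
    src_n = bridge_src / np.linalg.norm(bridge_src, axis=1, keepdims=True)
    tgt_n = bridge_tgt / np.linalg.norm(bridge_tgt, axis=1, keepdims=True)
    sim = tgt_n @ src_n.T  # shape (n_tgt, n_src)

    # Keep only the k most similar source tokens for each target token.
    top_idx = np.argpartition(-sim, k - 1, axis=1)[:, :k]
    top_sim = np.take_along_axis(sim, top_idx, axis=1)

    # Turn similarities into weights that sum to 1 per target token.
    logits = top_sim / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # Each target embedding is a weighted sum of its selected source embeddings.
    return np.einsum("tk,tkd->td", weights, src_emb[top_idx])


# Toy usage with random stand-ins for real embedding matrices.
rng = np.random.default_rng(0)
bridge_src = rng.normal(size=(100, 16))
bridge_tgt = rng.normal(size=(20, 16))
src_emb = rng.normal(size=(100, 32))
tgt_emb = init_target_embeddings(bridge_src, bridge_tgt, src_emb, k=5)
print(tgt_emb.shape)
```

Restricting the combination to a few neighbors, rather than weighting every source token, is what the abstract describes as filtering out semantic noise; the specific value of `k` here is arbitrary.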
Get this paper in your agent:
hf papers read 2605.26002
curl -LsSf https://hf.co/cli/install.sh | bash