r/LocalLLaMA · June 18, 2026 · 1 min read

LFM2.5-Embedding-350M & LFM2.5-ColBERT-350M

LFM2.5-Embedding-350M is a dense bi-encoder for fast multilingual retrieval. It produces a single vector per document, which keeps the index as small and fast as possible, for reliable cross-lingual search across 11 languages.

Best-in-class multilingual accuracy for a dense embedder of its size.
Inference speed is on par with much smaller models, thanks to the efficient LFM2 backbone.
You can use it as a drop-in replacement in your current RAG pipelines.
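
For anyone new to dense retrieval, here is a minimal sketch of what "a single vector per document" means at search time: the query and each document are one vector each, and documents are ranked by cosine similarity. The 3-dimensional vectors, the document ids, and the `rank` helper are made up for illustration. They are not output from LFM2.5-Embedding-350M, and loading the model itself is not shown.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank(query_vec, doc_vecs):
    """Return document ids sorted by descending cosine similarity (hypothetical helper)."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored]

# Hypothetical toy embeddings: one vector per document, one for the query.
doc_vecs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
}
query_vec = [1.0, 0.0, 0.0]

ranking = rank(query_vec, doc_vecs)
print(ranking)
```

In a real pipeline the vectors would come from the embedding model and the brute-force loop would usually be replaced by a vector index, but the scoring idea is the same.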

https://huggingface.co/LiquidAI/LFM2.5-Embedding-350M-GGUF

LFM2.5-ColBERT-350M is a late interaction retriever with best-in-class multilingual performance. It stores one vector per token and matches queries to documents with MaxSim, so you can store documents in one language (for example, a product description in English) and retrieve them in many languages with high accuracy.

LFM2.5-ColBERT-350M offers best-in-class accuracy across 11 languages.
Inference speed is on par with much smaller models, thanks to the efficient LFM2 backbone.
You can use it as a drop-in replacement in your current RAG pipelines to improve performance.
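
To make the MaxSim part concrete, here is a rough sketch of late interaction scoring as it is usually defined for ColBERT-style models: each query token vector is matched to its best-scoring document token vector, and those maxima are summed. The 2-dimensional token vectors and the `maxsim` helper are illustrative assumptions, not output from LFM2.5-ColBERT-350M and not its actual inference code.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def maxsim(query_tokens, doc_tokens):
    """Sum, over query tokens, of the best dot product against any document token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

# Hypothetical toy token embeddings (one vector per token).
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]  # has an exact match for each query token
doc_b = [[0.6, 0.8]]                          # only a partial match for each query token

score_a = maxsim(query_tokens, doc_a)
score_b = maxsim(query_tokens, doc_b)
print(score_a, score_b)
```

Because every document token keeps its own vector, the index is larger than with a single dense vector per document, which is the trade-off against the embedding model described earlier in the post.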

https://huggingface.co/LiquidAI/LFM2.5-ColBERT-350M-GGUF

submitted by /u/pmttyji