Tag: #rag
61 articles archived under #rag

r/MachineLearning · community · 1h ago
Scenema Audio: Zero-shot expressive voice cloning and speech generation [N]
We've been building Scenema Audio as part of our video production platform at scenema.ai, and we're releasing the model weights and inference code. The core idea: emotional performance and voice identity are independent. You describe how the speech should be performed (rage,… (37)

Hugging Face Daily Papers · research · 8h ago
ORBIT: Preserving Foundational Language Capabilities in GenRetrieval via Origin-Regulated Merging
Abstract: ORBIT addresses catastrophic forgetting in large language model fine-tuning for generative retrieval by tracking parameter distances and employing weight averaging to maintain model performance. AI-generated summary: Despite the rapid advancements in large language model… (7)

r/MachineLearning · community · 11h ago
Training a number-aware embedding model + Text JEPA doesn't work too well + Text auto-encoders have a strange frequency bias [R][P]
Hi guys! I've spent 1y trying to predict company growth from the full text of their 10-K filings. It completely failed. But I've had a lot of fun playing with encoder transformers and making them good at numbers (bypassing the tokenizer/prediction head for numbers). I've… (22)

Hugging Face Daily Papers · research · 15h ago
FaithfulFaces: Pose-Faithful Facial Identity Preservation for Text-to-Video Generation
Abstract: FaithfulFaces is a pose-faithful facial identity preservation framework that improves identity consistency in text-to-video generation through pose-shared alignment and explicit Euler angle embeddings. AI-generated summary: Identity-preserving text-to-video generation… (38)

Hugging Face Daily Papers · research · 18h ago
L2P: Unlocking Latent Potential for Pixel Generation
Abstract: The Latent-to-Pixel transfer paradigm efficiently leverages pre-trained latent diffusion models to create pixel-space models with minimal training overhead and high-resolution generation capabilities. AI-generated summary: Pixel diffusion models have recently regained… (14)

Hugging Face Daily Papers · research · 18h ago
PASA: A Principled Embedding-Space Watermarking Approach for LLM-Generated Text under Semantic-Invariant Attacks
Abstract: PASA is a robust watermarking algorithm for large language models that operates at the semantic level, using latent embedding spaces and shared randomness for secure text detection. AI-generated summary: Watermarking for large language models (LLMs) is a promising… (16)

arXiv — Machine Learning · research · 19h ago
Interpretable EEG Microstate Discovery via Variational Deep Embedding: A Systematic Architecture Search with Multi-Quadrant Evaluation
arXiv:2605.10947v1 · Announce Type: new · Abstract: EEG microstate analysis segments continuous brain electrical activity into brief, quasi-stable topographic configurations that reflect discrete functional brain states. Conventional approaches such as Modified K-Means operate… (22)

arXiv — Machine Learning · research · 19h ago
When and How to Canonize: A Generalization Perspective
arXiv:2605.11008v1 · Announce Type: new · Abstract: While invariant architectures are standard for processing symmetric data, there is growing interest in achieving invariance by applying group averaging or canonization to non-invariant backbones. However, the theoretical… (12)

arXiv — Machine Learning · research · 19h ago
Rank Is Not Capacity: Spectral Occupancy for Latent Graph Models
arXiv:2605.11142v1 · Announce Type: new · Abstract: Graph representation learning has become a standard approach for analyzing networked data, with latent embeddings widely used for link prediction, community detection, and related tasks. Yet a basic design choice, the latent… (36)

arXiv — Machine Learning · research · 19h ago
CORE: Cyclic Orthotope Relation Embedding for Knowledge Graph Completion
arXiv:2605.11159v1 · Announce Type: new · Abstract: Knowledge graph completion (KGC) aims to automatically infer missing facts in multi-relational data by mapping entities and relations into continuous representation spaces. Recent region-based embedding models have shown great… (16)

arXiv — Machine Learning · research · 19h ago
Unlearning with Asymmetric Sources: Improved Unlearning-Utility Trade-off with Public Data
arXiv:2605.11170v1 · Announce Type: new · Abstract: Noise-based certified machine unlearning currently faces a hard ceiling: the noise magnitude required to certify unlearning typically destroys model utility, particularly for large-scale deletion requests. While leveraging public… (12)

arXiv — Machine Learning · research · 19h ago
Optimistic Dual Averaging Unifies Modern Optimizers
arXiv:2605.11172v1 · Announce Type: new · Abstract: We introduce SODA, a generalization of Optimistic Dual Averaging, which provides a common perspective on state-of-the-art optimizers like Muon, Lion, AdEMAMix and NAdam, showing that they can all be viewed as optimistic instances… (31)

arXiv — Machine Learning · research · 19h ago
The Scaling Law of Evaluation Failure: Why Simple Averaging Collapses Under Data Sparsity and Item Difficulty Gaps, and How Item Response Theory Recovers Ground Truth Across Domains
arXiv:2605.11205v1 · Announce Type: new · Abstract: Benchmark evaluation across AI and safety-critical domains overwhelmingly relies on simple averaging. We demonstrate that this practice produces substantially misleading rankings when two conditions co-occur: (1) the evaluation… (34)

arXiv — Machine Learning · research · 19h ago
Leveraging RAG for Training-Free Alignment of LLMs
arXiv:2605.11217v1 · Announce Type: new · Abstract: Large language model (LLM) alignment algorithms typically consist of post-training over preference pairs. While such algorithms are widely used to enable safety guardrails and align LLMs with general human preferences, we show that… (36)

arXiv — Machine Learning · research · 19h ago
ADMM-Q: An Improved Hessian-based Weight Quantizer for Post-Training Quantization of Large Language Models
arXiv:2605.11222v1 · Announce Type: new · Abstract: Quantization is an effective strategy to reduce the storage and computation footprint of large language models (LLMs). Post-training quantization (PTQ) is a leading approach for compressing LLMs. Popular weight quantization… (5)

arXiv — Machine Learning · research · 19h ago
Quotient-Categorical Representations for Bellman-Compatible Average-Reward Distributional Reinforcement Learning
arXiv:2605.11289v1 · Announce Type: new · Abstract: Average-reward reinforcement learning requires estimating the gain and the bias, which is defined only up to an additive constant. This makes direct distributional analogues ill-posed on the real line. We introduce a quotient-space… (27)

arXiv — Machine Learning · research · 19h ago
FastUMAP: Scalable Dimensionality Reduction via Bipartite Landmark Sampling
arXiv:2605.11428v1 · Announce Type: new · Abstract: Exploratory analysis of high-dimensional data rarely stops at a single embedding. In practice, analysts rerun dimensionality reduction after changing preprocessing, subsets, or hyperparameters, and standard nonlinear methods can… (26)

arXiv — NLP / Computation & Language · research · 19h ago
Caraman at SemEval-2026 Task 8: Three-Stage Multi-Turn Retrieval with Query Rewriting, Hybrid Search, and Cross-Encoder Reranking
arXiv:2605.12028v1 · Announce Type: new · Abstract: We describe our system for SemEval-2026 Task 8 (MTRAGEval), participating in Task A (Retrieval) across four English-language domains. Our approach employs a three-stage pipeline: (1) query rewriting via a LoRA-fine-tuned Qwen 2.5… (30)

arXiv — NLP / Computation & Language · research · 19h ago
Correcting Selection Bias in Sparse User Feedback for Large Language Model Quality Estimation: A Multi-Agent Hierarchical Bayesian Approach
arXiv:2605.12177v1 · Announce Type: new · Abstract: [Abridged] Production LLM deployments receive feedback from a non-random fraction of users: thumbs sit mostly in the tails of the satisfaction distribution, and a naive average over them can land 40-50 percentage points away from… (6)

arXiv — NLP / Computation & Language · research · 19h ago
Geometric Factual Recall in Transformers
arXiv:2605.12426v1 · Announce Type: new · Abstract: How do transformer language models memorize factual associations? A common view casts internal weight matrices as associative memories over pairs of embeddings, requiring parameter counts that scale linearly with the number of… (5)

arXiv — NLP / Computation & Language · research · 19h ago
Task-Adaptive Embedding Refinement via Test-time LLM Guidance
arXiv:2605.12487v1 · Announce Type: new · Abstract: We explore the effectiveness of an LLM-guided query refinement paradigm for extending the usability of embedding models to challenging zero-shot search and classification tasks. Our approach refines the embedding representation of… (36)

arXiv — NLP / Computation & Language · research · 19h ago
On Problems of Implicit Context Compression for Software Engineering Agents
arXiv:2605.11051v1 · Announce Type: cross · Abstract: LLM-based Software Engineering agents face a critical bottleneck: context length limitations cause failures on complex, long-horizon tasks. One promising solution is to encode context as continuous embeddings rather than discrete… (27)

arXiv — NLP / Computation & Language · research · 19h ago
Test-Time Compute for Dense Retrieval: Agentic Program Generation with Frozen Embedding Models
arXiv:2605.11374v1 · Announce Type: cross · Abstract: Test-time compute is widely believed to benefit only large reasoning models. We show it also helps small embedding models. Most modern embedding checkpoints are distilled from large LLM backbones and inherit their representation… (21)

Hugging Face Daily Papers · research · 19h ago
Geometric Factual Recall in Transformers
Abstract: Transformer language models use geometric memorization, where embeddings encode linear superpositions of attributes and MLPs act as relation-conditioned selectors rather than associative key-value mappings. AI-generated summary: How do transformer language models memorize… (6)

r/LocalLLaMA · community · 20h ago
I've seen a lot of folks ask "can local LLMs actually do anything useful?"
And I'm here to share my experience. The answer is resoundingly 'yes'. Let me start with the local model I use every day in my AI harness: embedding models. I'm using an embedding model to give my AI's persistent memory system a semantic search protocol that makes its memory… (37)

Vercel — AI · dev-tools · 3d ago
How Superset built the IDE for AI agents on Vercel
Superset on Vercel: 1,000–1,400 deployments per week · ~600 preview deployments per day · ~30-second average build time · 57–64% week-over-week DAU growth. Software development with AI started as a single engineer chatting with a single agent about a local repo. Today, developers direct… (5)

Ars Technica — AI · news-outlet · 5d ago
Chrome's 4GB AI model isn't new, but you're not wrong for being confused
You can stop Chrome from taking up 4GB of storage for local AI, but that shouldn't be your problem. (33)

Simon Willison · community · 7d ago
Vibe coding and agentic engineering are getting closer than I'd like
I recently talked with Joseph Ruscio about AI coding tools for Heavybit's High Leverage podcast: Ep. #9, The AI Coding Paradigm Shift with Simon Willison. Here are some of my highlights, including my disturbing realization that vibe coding and agentic engineering have started… (25)

LangChain releases · dev-tools · 8d ago
langchain-fireworks==1.3.1
Changes since langchain-fireworks==1.3.0: fix(fireworks): require api_key in FireworksEmbeddings (#37193); release(fireworks): 1.3.1 (#37189); fix(fireworks): strip non-wire keys from ToolMessage text content blocks (#37187). (33)

Vercel — AI · dev-tools · 8d ago
Query observability metrics using the Vercel CLI
You can now access Observability Plus metrics in the Vercel CLI. Query observability data for any Vercel team or project using the new vercel metrics command. Coding agents can also leverage this new command to better analyze the performance, reliability, or security of… (31)

Don't Worry About the Vase · community · 23d ago
Opus 4.7 Part 1: The Model Card
Less than a week after completing coverage of Claude Mythos, here we are again as Anthropic gives us Claude Opus 4.7. (28)

Smol AI News · news-outlet · 28d ago
not much happened today
**OpenAI** expanded its Agents SDK by separating the agent harness from compute/storage, enabling long-running, durable agents with features like file/computer use, skills, memory, and compaction. The harness is now open-source and supports execution via partner sandboxes,… (37)

Stack Overflow Blog · news · 1mo ago
The messy truth of your AI strategies
Ryan welcomes Hema Raghavan, co-founder and head of engineering at Kumo.ai, to dive into all the messy stuff that comes with implementing AI, from pipeline sprawl to shadow AI. (25)

NVIDIA Developer Blog · official-blog · 1mo ago
Cut Checkpoint Costs with About 30 Lines of Python and NVIDIA nvCOMP
Training LLMs requires periodic checkpoints. These full snapshots of model weights, optimizer states, and gradients are saved to storage so training can resume... (38)

OpenAI · news · 1mo ago
CyberAgent moves faster with ChatGPT Enterprise and Codex
CyberAgent uses ChatGPT Enterprise and Codex to securely scale AI adoption, improve quality, and accelerate decisions across advertising, media, and gaming. (10)

Vercel — AI · dev-tools · 1mo ago
Zero Data Retention on AI Gateway
Building with multiple AI models means wrestling with fragmented data policies. With many different model providers, it's not just fragmented, it's just too much time spent on the wrong things. You have to read through different terms of service, track which providers comply… (13)

MIT News — AI · research · 1mo ago
Helping data centers deliver higher performance with less hardware
Researchers developed a system that intelligently balances workloads to improve the efficiency of flash storage hardware in a data center. (33)

MIT News — AI · research · 1mo ago
MIT researchers use AI to uncover atomic defects in materials
A new model measures defects that can be leveraged to improve materials' mechanical strength, heat transfer, and energy-conversion efficiency. (16)

OpenAI Python SDK releases · dev-tools · 1mo ago
v2.30.0
2.30.0 (2026-03-25) · Full Changelog: v2.29.0...v2.30.0. Features: api: add keys field to Click/DoubleClick/Drag/Move/Scroll computer actions (ee1bbed). Bug Fixes: api: align SDK response types with expanded item schemas (f3f258a); sanitize endpoint path params (89f6698); types:… (11)

NVIDIA Developer Blog · official-blog · 1mo ago
Building NVIDIA Nemotron 3 Agents for Reasoning, Multimodal RAG, Voice, and Safety
Agentic AI is an ecosystem where specialized models work together to handle planning, reasoning, retrieval, and safety guardrailing. As these systems scale,... (37)

Hugging Face · official-blog · 1mo ago
Build a Domain-Specific Embedding Model in Under a Day
Published March 20, 2026 · By Steve Han, Rucha Apte, Sean Sodha, and Oliver Holworthy (NVIDIA). If you are… (9)

Vercel — AI · dev-tools · 1mo ago
Build knowledge agents without embeddings
Most knowledge agents start the same way. You pick a vector database, then build a chunking pipeline. You choose an embedding model, then tune retrieval parameters. Weeks later, your agent answers a question incorrectly, and you have no idea which chunk it retrieved or why that… (38)

Vercel — AI · dev-tools · 1mo ago
360 billion tokens, 3 million customers, 6 engineers
Impact at a glance: Durable ships new production agents to customers in a single day · AI features and agents serve ~1.1B tokens per day (360B per year) · 10x leverage for every engineer, product manager, and designer · 3-4x lower infra cost compared to self-hosting. Durable began with… (5)

NVIDIA Developer Blog · official-blog · 1mo ago
Introducing NVIDIA BlueField-4-Powered CMX Context Memory Storage Platform for the Next Frontier of AI
AI-native organizations increasingly face scaling challenges as agentic AI workflows drive context windows to millions of tokens and models scale toward... (27)

NVIDIA Developer Blog · official-blog · 1mo ago
Design, Simulate, and Scale AI Factory Infrastructure with NVIDIA DSX Air
Building AI factories is complex and requires efficient integration across compute, networking, security, and storage systems. To achieve rapid Time to AI and... (10)

Hugging Face · official-blog · 2mo ago
Introducing Storage Buckets on the Hugging Face Hub
Published March 10, 2026 · By Lucain Pouget, Eliott Coyac, Adrien Carreira, Victor Mustar, Julien Chaumond, Quentin Lhoest, Pierric Cistac… (16)

NVIDIA Developer Blog · official-blog · 2mo ago
Maximizing GPU Utilization with NVIDIA Run:ai and NVIDIA NIM
Organizations deploying LLMs are challenged by inference workloads with different resource requirements. A small embedding model might use only a few gigabytes... (27)

MIT News — AI · research · 2mo ago
New method could increase LLM training efficiency
By leveraging idle computing time, researchers can double the speed of model training while preserving accuracy. (13)

NVIDIA Developer Blog · official-blog · 2mo ago
Build AI-Ready Knowledge Systems Using 5 Essential Multimodal RAG Capabilities
Enterprise data is inherently complex: real-world documents are multimodal, spanning text, tables, charts and graphs, images, diagrams, scanned pages, forms,... (6)

Maarten Grootendorst · research · 3mo ago
The Story Behind the "RAG Pack"
It all started with a course... (18)

Smol AI News · news-outlet · 3mo ago
not much happened today
**AI News for 1/16/2026-1/19/2026** covers new architectures for scaling Transformer memory and context, including **STEM** from **Carnegie Mellon** and **Meta AI**, which replaces part of the FFN with a token-indexed embedding lookup, enabling CPU offload and asynchronous… (37)

Google DeepMind · official-blog · 6mo ago
How AI is giving Northern Ireland teachers time back
A six-month pilot program with the Northern Ireland Education Authority's C2k initiative found that integrating Gemini and other generative AI tools saved participating teachers an average of 10 hours per week. (38)

Google DeepMind · official-blog · 6mo ago
Aeneas transforms how historians connect the past
Introducing the first model for contextualizing ancient inscriptions, designed to help historians better interpret, attribute, and restore fragmentary texts. (10)

Google DeepMind · official-blog · 6mo ago
Discovering new solutions to century-old problems in fluid dynamics
Our new method could help mathematicians leverage AI techniques to tackle long-standing challenges in mathematics, physics, and engineering. (35)

Eugene Yan · research · 31mo ago
AI Engineer 2023 Keynote - Building Blocks for LLM Systems
Evals, retrieval-augmented generation, guardrails, and collecting feedback; all that good stuff. (37)

Eugene Yan · research · 33mo ago
Patterns for Building LLM-based Systems & Products
Evals, RAG, fine-tuning, caching, guardrails, defensive UX, and collecting user feedback. (22)

Eugene Yan · research · 35mo ago
Obsidian-Copilot: An Assistant for Writing & Reflecting
Writing drafts via retrieval-augmented generation. Also reflecting on the week's journal entries. (15)

Eugene Yan · research · 59mo ago
Patterns for Personalization in Recommendations and Search
A whirlwind tour of bandits, embedding+MLP, sequences, graph, and user embeddings. (33)

Lil'Log (Lilian Weng) · research · 60mo ago
Contrastive Representation Learning
The goal of contrastive representation learning is to learn an embedding space in which similar sample pairs stay close to each other while dissimilar ones are far apart. Contrastive learning can be applied to both supervised and unsupervised settings. When working with… (37)

Eugene Yan · research · 61mo ago
Search: Query Matching via Lexical, Graph, and Embedding Methods
An overview and comparison of the various approaches, with examples from industry search systems. (12)

Lil'Log (Lilian Weng) · research · 104mo ago
Learning Word Embedding
Human vocabulary comes in free text. To make a machine learning model understand and process natural language, we need to transform the free-text words into numeric values. One of the simplest transformation approaches is one-hot encoding, in which each… (20)
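The final entry's teaser breaks off mid-definition of one-hot encoding. As a quick illustration (a minimal sketch, not code from the linked post): each word in a fixed vocabulary maps to a vector of zeros with a single 1 at that word's index, so the vocabulary here is a made-up toy example.

```python
# One-hot encoding sketch: each vocabulary word becomes a vector that is
# all zeros except for a 1 at the word's index. Toy vocabulary for illustration.
vocab = ["cat", "dog", "fish"]
index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    vec = [0] * len(vocab)
    vec[index[word]] = 1
    return vec

print(one_hot("dog"))  # [0, 1, 0]
```

The vectors are orthogonal and as long as the vocabulary, which is why learned dense embeddings (the subject of the linked post) are the usual next step.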