Tag

Rag

500 articles archived under #rag · RSS

arXiv — Machine Learning research 19d ago

Fourier Features Let Agents Learn High Precision Policies with Imitation Learning

arXiv:2606.12334v1 Announce Type: new Abstract: High-precision robotic manipulation requires fine-grained spatial reasoning that is often difficult to achieve with RGB-only policies due to depth ambiguity and perspective scale issues. Policies that leverage 3D information…

14
arXiv — NLP / Computation & Language research 19d ago

The Structural Attention Tax: How Retrieval Format Hijacks In-Context Learning Independent of Content

arXiv:2606.11198v1 Announce Type: new Abstract: Retrieval-augmented generation (RAG) systems inject external knowledge to improve LLM outputs, yet the format of injected content -- distinct from its semantic relevance -- can independently distort the model's attention…

6
arXiv — NLP / Computation & Language research 19d ago

NightFeats @ MMU-RAGent NeurIPS 2025: A Context-Optimized Multi-Agent RAG System for the Text-to-Text Track

arXiv:2606.11199v1 Announce Type: new Abstract: We present NightFeats, a structured multi-agent retrieval-augmented generation (RAG) system submitted to the MMU-RAGent competition at NeurIPS 2025, where it was awarded Best Dynamic Evaluation in the text-to-text track. Rather…

24
arXiv — NLP / Computation & Language research 19d ago

EverydayGPT: Confidence-Gated Routing for Efficient and Safe Hybrid GPT-RAG Conversational QA

arXiv:2606.11212v1 Announce Type: new Abstract: Standard Retrieval-Augmented Generation (RAG) pipelines route every query through retrieval and generation unconditionally, incurring unnecessary computation and propagating low-quality context to the generator. We introduce…

12
arXiv — NLP / Computation & Language research 19d ago

Energy-Efficient On-Device RAG on a Mobile NPU: System Design and Benchmark on Snapdragon X Elite

arXiv:2606.11257v1 Announce Type: new Abstract: Retrieval-Augmented Generation (RAG) pipelines are compute-intensive, combining embedding, retrieval, reranking, and large language model (LLM) generation. Running them entirely on-device benefits privacy, latency, and offline use,…

35
arXiv — NLP / Computation & Language research 19d ago

When More Documents Hurt RAG: Mitigating Vector Search Dilution with Domain-Scoped, Model-Agnostic Retrieval

arXiv:2606.11350v1 Announce Type: new Abstract: Retrieval-augmented generation degrades when scaled to large, heterogeneous document collections, where dense similarity loses discriminative power, and top-k retrieval increasingly returns semantically similar but contextually…

13
arXiv — NLP / Computation & Language research 19d ago

When Probing Accuracy Saturates, Fragility Resolves: A Complementary Metric for LLM Pre-Training Analysis

arXiv:2606.11375v1 Announce Type: new Abstract: Standard linear probing declares a property "encoded" when a classifier on hidden states achieves high accuracy. The protocol works well on a snapshot but breaks across pre-training: probe accuracy saturates within the first few…

17
arXiv — NLP / Computation & Language research 19d ago

An Ontology-Guided Multi-Anchor Graph Retrieval Framework for Traffic Legal Liability Determination

arXiv:2606.11910v1 Announce Type: new Abstract: Traffic law liability determination is critical for assigning legal penalties, requiring the simultaneous identification of interdependent statutory provisions across multiple legal dimensions. However, existing retrieval-augmented…

28
arXiv — NLP / Computation & Language research 19d ago

uva-irlab-conv at SemEval-2026 Task 8: Multi-Turn RAG with Learned Sparse Retrieval and Listwise Reranking

arXiv:2606.11945v1 Announce Type: new Abstract: This report describes our participation in SemEval-2026 Task 8 on multi-turn retrieval and question answering. The task evaluates conversational systems across four domains (finance, cloud documentation, government, Wikipedia), and…

28
arXiv — NLP / Computation & Language research 19d ago

Augmenting Molecular Language Models with Local $n$-gram Memory

arXiv:2606.12113v1 Announce Type: new Abstract: Transformer-based language models for SMILES strings suffer from a locality gap: standard character-level tokenization fragments chemically meaningful motifs, forcing models to repeatedly learn local syntax at the expense of…

34
arXiv — NLP / Computation & Language research 19d ago

Measuring Epistemic Resilience of LLMs Under Misleading Medical Context

arXiv:2606.12291v1 Announce Type: new Abstract: Large language models (LLMs) now reach expert-level scores on medical licensing exams, encouraging the assumption that high scores imply safe medical judgment while patients increasingly use them for health advice. We show this…

14
TechCrunch — AI news-outlet 19d ago

How memory tools can make AI models worse

New research suggests that AI memory systems can degrade model performance and encourage sycophantic tendencies.

28
NVIDIA Developer Blog official-blog 19d ago

Designing Production-Ready Battery Energy Storage Systems for AI Factories

AI factories are changing what data-center infrastructure must do. Unlike traditional data centers, AI factories are built to manufacture intelligence at scale....

29
llama.cpp releases dev-tools 19d ago

b9589

CUDA: Fix ssm_scan_f32 data-races ( #24360 ) Add missing syncthreads before reusing cub_temp_storage __syncthreads() is required before being allowed to reuse TempStorage smem:…

32
arXiv — Machine Learning research 20d ago

Conformal Risk Prediction for Non-Alcoholic Fatty Liver Disease Using Gradient Boosting with Distribution-Free Coverages

arXiv:2606.09860v1 Announce Type: new Abstract: Non-alcoholic fatty liver disease (NAFLD) affects roughly 25% of global adults, posing substantial hepatic and cardiovascular risks. Yet, population-level screening tools remain inadequate. We present Method, a machine-learning…

4
arXiv — Machine Learning research 20d ago

Hyperparameter Learning for Latent Factorization of Tensors for Representation Learning to Large-scale Dynamic Weighted Directed Network

arXiv:2606.09880v1 Announce Type: new Abstract: Large-scale dynamic weighted directed networks (DWDNs) are widely used to model time-varying interactions among nodes. Latent factorization of tensors (LFT) extracts target knowledge from DWDNs via low-rank embedding. However,…

27
arXiv — Machine Learning research 20d ago

Integrating Out, Twice: The Open-System Case That Neural-Network Ensemble Theory Is Missing

arXiv:2606.09950v1 Announce Type: new Abstract: Averaging a neural network over its random parameters and marginalizing a Gaussian sector are the same operation, the Schur complement of the eliminated block, and when that block is closed it returns a covariance and its inverse.…

25
arXiv — Machine Learning research 20d ago

Compositional Generative Modeling from Decentralized Data

arXiv:2606.10153v1 Announce Type: new Abstract: Learning the compositional nature of the physical world requires joint observation of interacting factors. However, because practical data is often decentralized, these factors are fragmented across isolated silos. Existing…

33
arXiv — Machine Learning research 20d ago

DUET -- Dual User Embedding Transformers for Offsite Conversion Prediction

arXiv:2606.10243v1 Announce Type: new Abstract: Offsite conversion rate (OCVR) prediction is an important ranking problem in computational recommendation systems. This task presents a modeling challenge: click signals are abundant and exhibit short temporal horizons, whereas…

25
arXiv — NLP / Computation & Language research 20d ago

MIRAGE: A Polarity-Flipping Encoding Subspace in LLM Agents

arXiv:2606.10304v1 Announce Type: new Abstract: When LLM agents are coerced into covertly encoding sensitive data (Base64, ROT13, acrostic, synonym chains, and beyond), the resulting outputs evade output-side detection but the underlying computation does not. Across nine…

37
arXiv — NLP / Computation & Language research 20d ago

Attention Expansion: Enhancing Keyphrase Extraction from Long Documents with Attention-Augmented Contextualized Embeddings

arXiv:2606.10716v1 Announce Type: new Abstract: Pre-trained language models (PLMs) have achieved strong performance in keyphrase extraction (KPE), largely due to their ability to generate rich contextualized representations. However, long-document KPE remains challenging because…

30
arXiv — NLP / Computation & Language research 20d ago

Attention-Discounted Adaptive Sampler for Masked Diffusion Language Models

arXiv:2606.10829v1 Announce Type: new Abstract: Masked diffusion language models can reduce inference steps by revealing multiple tokens per denoising iteration, but this parallelism is fragile: positions that are individually confident may be unsafe to commit together when…

18
arXiv — NLP / Computation & Language research 20d ago

Agentic Hybrid RAG for Evidence-Grounded Muon Collider Analysis

arXiv:2606.10381v1 Announce Type: cross Abstract: Muon collider research spans accelerator physics, detector instrumentation, and high-energy phenomenology, with relevant evidence scattered across a rapidly expanding and heterogeneous body of scientific literature. As…

37
arXiv — NLP / Computation & Language research 20d ago

Leveraging Social Media Data for COVID-19 Studies

arXiv:2606.10459v1 Announce Type: cross Abstract: Nowadays, social media networks have become widely preferred sources of information. Especially during the time of the Coronavirus disease 2019 (COVID-19) pandemic, social media has been one of the most used platforms to get the…

20
arXiv — NLP / Computation & Language research 20d ago

Infini Memory: Maintainable Topic Documents for Long-Term LLM Agent Memory

arXiv:2606.10677v1 Announce Type: cross Abstract: Long-term LLM agents need persistent memory that can track changing facts and provide relevant evidence across sessions. Existing memory systems often store observations as isolated records, summaries, or indexed fragments, which…

20
Hugging Face Daily Papers research 20d ago

One Token per Multimodal Evidence: Latent Memory for Resource-Constrained QA

Abstract Latent Memory introduces a compressed representation approach for external memory in question answering, reducing token consumption and storage requirements while maintaining competitive performance across text-only and multimodal benchmarks. Generated by…

28
Hugging Face Daily Papers research 20d ago

Precision Is Not Faithfulness: Coverage-Aware Evaluation of Grounded Generation with a Complete Oracle

Abstract Reference-free faithfulness metrics suffer from a blind spot measuring only precision, leading to rewards for abstention; completeness in deterministic domains enables measurement of both precision and recall, revealing that high-precision models often have poor fact…

34
llama.cpp releases dev-tools 20d ago

b9585

graph: Fix granite speech model inference by applying embedding scale when deepstack is not used ( #24357 ) llama-graph : apply embedding scale when deepstack is not used nits: remove non-existent hunyuan-vl from the tests apply suggestion from @gabe-l-hart Co-authored-by: Xuan…

25
Hugging Face Daily Papers research 20d ago

SDR: Set-Distance Rewards for Radiology Report Generation

Abstract Set-based rewards using embedding distances improve chest X-ray report generation by enabling effective post-training and test-time selection without requiring causal reasoning structures. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Reinforcement learning with…

14
r/LocalLLaMA community 20d ago

Still a VERY lightweight open web-search tool for smaller local LLMs - now with SearXNG support

Hey everyone, TinySearch v0.2.0 (first stable beta) is out. The first version used DuckDuckGo directly, which worked well enough to prove the idea, but yeah.. relying on one search source was way too fragile lol. DDG started throwing limits/CAPTCHAs more often in the last 2…

25
Hugging Face Daily Papers research 21d ago

Text-to-Image Models Need Less from Text Encoders Than You Think

Abstract Text-to-image models primarily utilize basic text representation aspects like word merging and order rather than complex contextual information encoded in full text embeddings. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Text-to-image models rely on text prompts as…

36
Hugging Face Daily Papers research 21d ago

Answer Presence Drives RAG Rewriting Gains

Abstract Controlled interventions reveal that gold answer presence in rewritten contexts significantly boosts QA performance, with removal causing substantial F1 drops and injection improving results, while conventional probing methods show fragility to sentinel changes.…

35
Hugging Face Daily Papers research 21d ago

Trajectory-Refined Distillation

Abstract On-policy distillation suffers from prefix failure where dense token-level supervision creates fragmented gradients; trajectory-refined distillation addresses this by correcting student rollouts at the trajectory level before distillation. Generated by…

37
arXiv — Machine Learning research 21d ago

UNIQ: Conformal Calibration for Adaptive Conservatism in Offline Reinforcement Learning

arXiv:2606.07592v1 Announce Type: new Abstract: Offline reinforcement learning requires careful conservatism to mitigate distribution shift, yet most existing methods apply a fixed penalty uniformly across all states regardless of local data coverage. We present UNIQ…

5
arXiv — Machine Learning research 21d ago

A Topological Characterization of Graph Neural Networks via Stochastic Block Model Embeddings on the n-Sphere

arXiv:2606.07598v1 Announce Type: new Abstract: We propose a topological framework for comparing trained Graph Neural Networks (GNNs) by mapping the Stochastic Block Models (SBMs) induced on the graphon-signal space of a Message Passing Neural Network (MPNN) onto the unit…

15
arXiv — Machine Learning research 21d ago

Large Language Models Should Learn Personalized Rather Than Aggregated Human Preferences

arXiv:2606.07629v1 Announce Type: new Abstract: Current approaches to aligning large language models (LLMs) aggregate diverse human preferences into a single reward signal, effectively optimizing for a hypothetical ``average user'' who represents no real person particularly…

10
arXiv — Machine Learning research 21d ago

Temporal Coverage over Density: Parsimonious Training-Set Design for ML Climate Downscaling

arXiv:2606.07898v1 Announce Type: new Abstract: High-resolution regional climate simulations provide critical information for climate impacts assessments but remain computationally expensive, motivating the development of machine-learning downscalers and emulators. A key…

16
arXiv — Machine Learning research 21d ago

Minibatch Selection via Partition Matroid Constrained Gradient Matching

arXiv:2606.07954v1 Announce Type: new Abstract: Training large language models (LLMs) on heterogeneous data requires selecting minibatches that balance convergence speed with coverage across domains. Existing methods either select samples independently within each domain or rely…

5
arXiv — Machine Learning research 21d ago

CausShield: Sample Reconstruction-Resilient Vertical FL via Causal Representation Learning

arXiv:2606.08027v1 Announce Type: new Abstract: Vertical federated learning (VFL) is a distributed learning paradigm that leverages vertically partitioned features across isolated parties without sharing raw samples; however, it remains vulnerable to active sample reconstruction…

38
llama.cpp releases dev-tools 21d ago

b9568

mtp: support for gemma-4 E2B and E4B assistants ( #24282 ) models: update converter to support smaller assistants models: add masked_embd tensors to gemma4-assist arch gemma-4: remove temp debug for conversion gemma-4-mtp: filter out masked_embedding tensors during conversion…

23
r/MachineLearning community 21d ago

Why I stopped using semantic embeddings for tool selection and switched back to BM25 [D]

I've been building agents for about a year and recently shipped one for a client running ~140 MCP-exposed tools at peak. Along the way I made the canonical mistake. I used cosine similarity over tool description embeddings to pick which tools the model could see per turn. Worked…

9
Hugging Face Daily Papers research 21d ago

GENEB: Why Genomic Models Are Hard to Compare

Abstract GENEB presents a comprehensive benchmark for evaluating genomic foundation models across diverse tasks and architectures under a unified protocol. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Progress in genomic foundation models is difficult to assess due to fragmented…

25
arXiv — Machine Learning research 22d ago

FAIR-Calib: Frontier-Aware Instability-Reweighted Calibration for Post-Training Quantization of Diffusion Large Language Models

arXiv:2606.06547v1 Announce Type: new Abstract: Diffusion Large Language Models (dLLMs) refine tokens iteratively but commit them irreversibly, leading to a "stability lag" where early decisions remain fragile even after being written. We reveal that Post-Training Quantization…

21
arXiv — Machine Learning research 22d ago

PandaAI: A Practical Agent CQ2 for Neuro-symbolic Data Analysis And Integrated Decision-Making in Quantitative Finance

arXiv:2606.06823v1 Announce Type: new Abstract: While deep learning has excelled in various domains, its application to sequential decision-making in finance remains challenging due to the low Signal-to-Noise Ratio (SNR) and non-stationarity of financial data. Leveraging the…

20
arXiv — Machine Learning research 22d ago

Accelerating Multi-Objective Bayesian Optimisation via Predictive-Gradient Catalysts

arXiv:2606.06984v1 Announce Type: new Abstract: This paper presents a general acceleration mechanism for multi-objective Bayesian optimisation (MOBO) that leverages Gaussian process predictive gradients as auxiliary signals. Rather than replacing existing Pareto-compliant…

7
arXiv — Machine Learning research 22d ago

Closed-Form Spectral Regularization for Multi-Task Model Merging

arXiv:2606.07289v1 Announce Type: new Abstract: Model merging combines several independently fine-tuned experts into a single multi-task model without any training data, reducing the storage, serving, and decentralized-development costs of large foundation models.…

38
arXiv — Machine Learning research 22d ago

Bootstrap Theory of Representational Emergence: Explanatory Insufficiency as a Driver of Representation Learning and World Models

arXiv:2606.07303v1 Announce Type: new Abstract: Representation learning is central to modern machine learning, enabling transitions from handcrafted features to learned embeddings, latent spaces, foundation models, world models, and digital twins. Yet most research examines how…

29
arXiv — Machine Learning research 22d ago

Graph Neural Network leveraging Higher-order Class Label Connectivity for Heterophilous Graphs

arXiv:2606.07475v1 Announce Type: new Abstract: Node classification in graph neural networks (GNNs) has been widely applied in various fields of graph analysis. GNNs achieve high-accuracy node classification in homophilous graphs, where nodes with the same class label tend to be…

32
arXiv — NLP / Computation & Language research 22d ago

Evidence Graph Consistency in Retrieval-Augmented Generation: A Model-Dependent Analysis of Hallucination Detection

arXiv:2606.06748v1 Announce Type: new Abstract: Retrieval-Augmented Generation (RAG) reduces but does not eliminate hallucination in large language models. Existing detection methods rely on flat similarity between generated answers and retrieved passages, ignoring structural…

36
arXiv — NLP / Computation & Language research 22d ago

A Four-Condition Diagnostic Protocol for Evidence Utilization in Long-Context and Retrieval-Augmented Language Models

arXiv:2606.06758v1 Announce Type: new Abstract: Final-answer accuracy, retrieval recall, and citation overlap do not by themselves identify whether a long-context or retrieval-augmented language model used the evidence it was given. A model can answer from parametric memory,…

21
