Tag

Rag

500 articles archived under #rag

arXiv — NLP / Computation & Language research 28d ago

Graph-Augmented Retrieval for Cross-Entity Financial Sentiment Analysis: A Comparative Study

arXiv:2606.00062v1 Announce Type: new Abstract: Retrieval-Augmented Generation (RAG) has become foundational for grounding large language models in domain-specific corpora, yet conventional vector-based RAG systems are fundamentally limited in their ability to capture the…

23
arXiv — NLP / Computation & Language research 28d ago

DLLM-JEPA: Joint Embedding Predictive Architectures for Masked Diffusion Language Models

arXiv:2606.00091v1 Announce Type: new Abstract: Joint Embedding Predictive Architectures (JEPAs) have reshaped self-supervised representation learning in vision. The recent LLM-JEPA ported JEPA to autoregressive language models but inherited two steep costs from the…

38
arXiv — NLP / Computation & Language research 28d ago

OCC-RAG: Optimal Cognitive Core for Faithful Question Answering

arXiv:2606.00683v1 Announce Type: new Abstract: Recent progress in the development of language models has been defined by scale, with each generation absorbing more of the world's knowledge into its weights. However, many practical applications benefit more from robust reasoning…

25
arXiv — NLP / Computation & Language research 28d ago

Chunking Methods on Retrieval-Augmented Generation - Effectiveness Evaluation Against Computational Cost and Limitations

arXiv:2606.00881v1 Announce Type: new Abstract: Retrieval-Augmented Generation (RAG) has demonstrated significant capabilities in enhancing the performance of Large Language Models (LLMs). One of the key tasks in RAG systems is the chunking process. Traditionally, fixed-size…

38
arXiv — NLP / Computation & Language research 28d ago

ExpWeaver: LLM Agents Learn from Experience via Latent RAG

arXiv:2606.01041v1 Announce Type: new Abstract: Experience learning has achieved promising results in enhancing LLM agent planning and reasoning by integrating past interactions as reusable knowledge. However, existing methods remain confined to explicit text space, retrieving…

28
arXiv — NLP / Computation & Language research 28d ago

When Is 0.1% Enough? Analyzing the Combined Effects of Dimensionality Reduction and Quantization on Text Embedding Compression

arXiv:2606.01074v1 Announce Type: new Abstract: Recent high-performing text embedding models often output high-dimensional real-valued vectors, resulting in substantial storage and computational costs. To address this issue, compression methods based on dimensionality reduction…

18
arXiv — NLP / Computation & Language research 28d ago

DiscourseFlip: An Oblique Discourse-Level Opinion Manipulation Attack against Black-box Retrieval-Augmented Generation

arXiv:2606.01212v1 Announce Type: new Abstract: Retrieval-Augmented Generation (RAG) systems are widely deployed and increasingly influential, but their reliance on external corpora exposes new security risks from poisoned retrieval content. Existing RAG attacks are largely…

23
arXiv — NLP / Computation & Language research 28d ago

Connecting the Dots: Benchmarking Reflective Memory in Long-Horizon Dialogue

arXiv:2606.01223v1 Announce Type: new Abstract: Despite substantial progress in long-context modeling, existing benchmarks remain confined to factual memory for explicit recall, failing to measure the reflective memory required to synthesize fragmented, multimodal cues into…

10
arXiv — NLP / Computation & Language research 28d ago

Efficient RAG with Intent-Aware Retrieval and Semantics-Preserving Chunking

arXiv:2606.01240v1 Announce Type: new Abstract: The demand for powerful instruction following and reasoning capability of large language models (LLMs) has promoted rapid development of retrieval-augmented generation (RAG). The RAG system assists LLM generation by retrieving…

36
Hugging Face Daily Papers research 28d ago

Speculative Pipeline Decoding: Higher-Accuracy and Zero-Bubble Speculation via Pipeline Parallelism

Abstract Speculative Pipeline Decoding introduces a novel framework that leverages pipeline parallelism to accelerate large language model inference by enabling parallel token processing and reducing decoding latency. AI-generated summary Speculative Decoding (SD) accelerates…

17
Ollama releases dev-tools 28d ago

v0.30.0-rc32: llama-server followups (#16353)

llama-server followups Misc fixes for #16031 Add back dropped ROCm build flag for multi-GPU support on windows Fix amdhip64_*.dll version detection for "latest" selection Fix embeddings API for consistent normalize behavior with prior versions ci: set up for automated llama.cpp…

19
r/MachineLearning community 28d ago

[D] Simple Questions Thread

Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead! Thread will stay alive until next one so keep posting after the date in the title. Thanks to everyone for answering questions in the…

32
Hugging Face Daily Papers research 28d ago

A Topology-Aware Spatiotemporal Handover Framework for Continuous Multi-UAV Tracking

Abstract A real-time multi-camera multi-vehicle tracking system addresses trajectory fragmentation in UAV-based traffic monitoring through a topology-based spatiotemporal handover mechanism and deterministic queue-based matching algorithm. AI-generated summary The integration of…

21
Hugging Face Daily Papers research 28d ago

One Click per Cell Type Suffices: Training-free Group Interaction for Cell Instance Segmentation

Abstract Group Prompting enables efficient cell instance segmentation by leveraging per-type prompting through a training-free framework that uses multi-scale encoder features and recursive prompt expansion. AI-generated summary Cell instance segmentation models trained on…

32
Hugging Face Daily Papers research 28d ago

How can embedding models bind concepts?

Abstract Vision-language models like CLIP struggle with concept binding despite recognizing individual concepts, but controlled transformer models can learn low-complexity binding functions that generalize better through multiplicative interactions. AI-generated summary Humans…

11
arXiv — Machine Learning research 29d ago

Counterfactual Evaluation Reveals Hidden Capability Profiles in Clinical LLMs and Agents

arXiv:2605.30590v1 Announce Type: new Abstract: Two clinical AI systems can score nearly identically on coverage-based rubrics yet behave radically differently when their patient inputs change: one updates its recommendations to match the new clinical signal, while the other…

23
arXiv — Machine Learning research 29d ago

ScaleMAP: Preserving Local Density and Neighborhood Structure in Low-Dimensional Embeddings

arXiv:2605.30597v1 Announce Type: new Abstract: Nonlinear dimensionality-reduction methods such as UMAP and PaCMAP adaptively normalize local distances during graph construction, erasing neighborhood scale from the data. This distorts more than relative cluster sizes: sparse…

9
arXiv — Machine Learning research 29d ago

TASER: Task-Aware Stein Regularisation for Geometry-Driven Robustness

arXiv:2605.30601v1 Announce Type: new Abstract: Modern deep networks remain fragile under distribution shift and adversarial perturbations, often due to excessive or poorly structured input sensitivity. We introduce TASER (Task-Aware Stein Regularisation), a training-time…

20
arXiv — Machine Learning research 29d ago

SemStruct: Contextualizing Semantic Embeddings with Structural Information for Schema Matching

arXiv:2605.30729v1 Announce Type: new Abstract: Schema matching is a fundamental step in integrating heterogeneous data sources. While Pre-trained Language Models (PLMs) have revolutionized this task by capturing linguistic semantics, they typically process tabular data as…

35
arXiv — Machine Learning research 29d ago

Efficient and Uncertainty-Aware Diffusion Framework for Offline-to-Online Reinforcement Learning

arXiv:2605.30776v1 Announce Type: new Abstract: Offline-to-Online Reinforcement Learning (O2O-RL) leverages an offline, pre-trained policy to minimize costly online interactions. Although data-efficient, O2O-RL is susceptible to shifts between offline and online distributions.…

8
arXiv — Machine Learning research 29d ago

Federated Variational Preference Alignment with Gumbel-Softmax Prior for Personalized User Preferences

arXiv:2605.30873v1 Announce Type: new Abstract: Federated Learning (FL) offers a privacy-preserving pathway for aligning Large Language Models (LLMs); however, existing frameworks typically enforce a monolithic reward model, inevitably averaging out inherently conflicting user…

35
arXiv — NLP / Computation & Language research 29d ago

Protocol for evaluating ChatGPT in biomedical association generation and verification using a RAG-enabled, cross-model majority voting workflow

arXiv:2605.30400v1 Announce Type: new Abstract: We present a protocol to evaluate ChatGPT's ability to generate disease-centric biomedical associations. It outlines how we generate the associations, validate the biological entities using biomedical ontologies, and verify…

26
arXiv — NLP / Computation & Language research 29d ago

CanLegalRAGBench: Evaluating Retrieval-Augmented Generation on Canadian Case Law

arXiv:2605.30497v1 Announce Type: new Abstract: RAG-based legal assistants have been growing in popularity, but LLM hallucinations remain a key issue and potentially undermine justice. While benchmarks have been developed to evaluate progress, many rely on synthetic queries…

37
arXiv — NLP / Computation & Language research 29d ago

Linear Ensembles Wash Away Watermarks: On the Fragility of Distributional Perturbations in LLMs

arXiv:2605.30501v1 Announce Type: new Abstract: Watermarking embeds statistical signatures in AI-generated text for detection and attribution. We reveal a fundamental vulnerability: when users access multiple models (today's reality), watermarks trivially fail. Watermarks…

21
arXiv — NLP / Computation & Language research 29d ago

Generalistic or Specific Embeddings, Which is Better? An Empirical Study on Search for Clinical Coding in Non-English Languages

arXiv:2605.30529v1 Announce Type: new Abstract: Sentence-embedding models for semantic search are overwhelmingly developed and evaluated on English corpora. When applied to clinical retrieval in other languages -- particularly retrieval of ICD-10-CM / CIE-10 codes -- recall…

26
arXiv — NLP / Computation & Language research 29d ago

SAGE: A Novelty Gate for Efficient Memory Evolution in Agentic LLMs

arXiv:2605.30711v1 Announce Type: new Abstract: Agentic LLMs must continuously decide whether newly extracted facts should be added, merged with existing memories, or ignored, yet prior work has focused more on retrieval and storage than on principled write-side control. We…

38
arXiv — NLP / Computation & Language research 29d ago

MoG: Mixture of Experts for Graph-based Retrieval-Augmented Generation

arXiv:2605.31010v1 Announce Type: new Abstract: Retrieval-augmented generation is intensively studied to ground large language models on external evidence. However, retrieving from a unified knowledge base could inevitably introduce irrelevant information that may mislead…

23
arXiv — NLP / Computation & Language research 29d ago

On the Robustness of Multilingual Text Embedding Rankings Across Learning Tasks, Languages, and Benchmark Datasets

arXiv:2605.31142v1 Announce Type: new Abstract: Large-scale multilingual text embedding models play a crucial role in both research and industry, yet their behavior in language-specific, multi-task settings remains insufficiently understood. Although benchmarking platforms such as…

30
arXiv — NLP / Computation & Language research 29d ago

Learning Whom to Trust: Market-Feedback Adaptive Retrieval for Frozen LLMs in Event-Driven Financial RAG

arXiv:2605.31201v1 Announce Type: new Abstract: Financial retrieval-augmented generation (RAG) systems typically rank evidence by textual relevance, but in financial markets the useful evidence source depends on event type, forecast horizon, and market context. We study…

20
Hugging Face Daily Papers research 29d ago

From Prompt Injection to Persistent Control: Defending Agentic Harness Against Trojan Backdoors

Abstract Multi-step trojan attacks in local LLM agents can bypass existing defenses by embedding malicious prompts across multiple operations, requiring new detection methods like DASGuard for effective protection. AI-generated summary LLM agents are evolving from conversational…

20
The Information — AI news-outlet 29d ago

Why Forward Deployed Engineers Are the Rage

AI researchers may have the hottest job in tech, but forward-deployed engineers who put the AI to good use are becoming indispensable too. The military-inspired job title, which Palantir began using in the context of business software more than a decade ago, has spread to all…

29
llama.cpp releases dev-tools 1mo ago

b9442

vocab : add tokenizer support for jina-embeddings-v2-base-zh (#18756) vocab : add jina-embeddings-v2-base-zh (whitespace tokenizer) lowercase defaults to true type fix Co-authored-by: Sigbjørn Skjæret macOS/iOS: macOS Apple Silicon (arm64) macOS…

12
r/LocalLLaMA community 1mo ago

Why does Thinking Output More Tokens Than a Response?

I was too lazy to use a vector DB + Embedding + Clustering for this list of 1000 items I wanted to categorize. I was hoping to use a local LLM to do it, but it would only respond with a list of about 100 items or so and their categories. It confused me because when I saw the…

22
r/MachineLearning community 1mo ago

Why do the output layer weights become word vectors in Word2Vec? [D]

I'm trying to understand the intuition behind Word2Vec training using a neural network. In Word2Vec (CBOW or Skip-gram), we often hear that the weight matrices learned during training contain the vector representations (embeddings) of words. However, I don't understand why the…

31
Hugging Face Daily Papers research 1mo ago

CONF-KV: Confidence-Aware KV Cache Eviction with Mixed-Precision Storage for Long-Horizon LLM

Abstract CONF-KV is a KV-cache management system that dynamically adjusts cache retention based on model uncertainty, improving memory efficiency and performance for long-sequence language model inference. AI-generated summary Long-horizon LLM inference turns the key--value (KV)…

12
The Information — AI news-outlet 1mo ago

Kalshi, Coinbase Approved to Offer Crypto Perpetuals in U.S.

Prediction market Kalshi won approval from U.S. regulators to offer crypto trading through bitcoin perpetuals, a type of highly-leveraged derivatives product, confirming an April report by The Information. Coinbase also won a greenlight from the U.S. Commodity Futures Trading…

14
Hugging Face Daily Papers research 1mo ago

Xetrieval: Mechanistically Explaining Dense Retrieval

Abstract Xetrieval is a mechanistic framework that explains dense retrieval by enhancing sentence embeddings with reasoning information and decomposing them into interpretable sparse features for retrieval decision explanations. AI-generated summary Explaining why dense…

32
llama.cpp releases dev-tools 1mo ago

b9406

llama: add llm_graph_input_mtp (#23643) llama: add llm_graph_input_mtp rename input_mtp -> input_token_embd add TODO about mtmd embedding cont : clean-up Co-authored-by: Georgi Gerganov macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64,…

38
MIT Technology Review — AI news-outlet 1mo ago

How the Pope’s Magnifica Humanitas offers a template for individuals to meet the AI moment

Pope Leo XIV’s new encyclical on artificial intelligence includes a statement that warrants serious attention from technologists and policymakers: “Technology is never neutral.” Magnifica Humanitas (“Magnificent Humanity”) is a clarion call to all people to act with courage and…

8
Hugging Face Daily Papers research 1mo ago

PRISM: A Multi-Dimensional Benchmark for Evaluating LLM Peer Reviewers

Abstract PRISM evaluates automated peer review systems across multiple dimensions using argument mining and retrieval-augmented verification, revealing that while LLMs match human performance in specific areas, no system consistently equals human reviewers across all evaluation…

19
Hugging Face Daily Papers research 1mo ago

RUBRIC-ARROW: Alternating Pointwise Rubric Reward Modeling for LLM Post-training in Non-verifiable Domains

Abstract RUBRIC-ARROW presents an alternating framework for reward modeling that improves upon rubric-based methods by reducing ties and leveraging pairwise preference data for training. AI-generated summary Pointwise reward modeling offers critical signals for LLM…

4
arXiv — Machine Learning research 1mo ago

TaxDistill: Improving Metagenomic Taxonomic Annotation via Distilled Genomic Foundation Models

arXiv:2605.28868v1 Announce Type: new Abstract: Metagenomic taxonomic annotation aims to identify the microbial origins of DNA fragments in environmental samples. Traditional methods that rely on sequence similarity are often constrained by the high microbial diversity and the…

11
arXiv — Machine Learning research 1mo ago

Spectral Guidance for Flexible and Efficient Control of Diffusion Models

arXiv:2605.28900v1 Announce Type: new Abstract: We introduce Spectral Guidance, a framework for controlling diffusion models by leveraging the intrinsic geometry of the generative process. As data is progressively corrupted by noise, only a small number of features remain…

38
arXiv — Machine Learning research 1mo ago

Cycle-Space Informed Detection of Autoencoded Blind False Data Injection Attacks on Power Systems

arXiv:2605.28912v1 Announce Type: new Abstract: The rapid growth of AI-driven data centers and large-scale energy storage systems is increasing the reliance of power system operation on real-time measurement data and automated decision-making. However, many existing detection…

28
arXiv — Machine Learning research 1mo ago

FedQHD: Closed-Form Function-Space Federated Reinforcement Learning

arXiv:2605.29002v1 Announce Type: new Abstract: Federated reinforcement learning enables decentralized agents to collaboratively improve policies or value estimates without exchanging raw trajectories. However, FedAvg-style parameter averaging is not function-space consistent:…

24
arXiv — Machine Learning research 1mo ago

Do Physics Foundation Models Learn Generalizable Physics? A Bias-Aware Benchmark Across Physical Regimes and Distribution Shifts

arXiv:2605.29283v1 Announce Type: new Abstract: Recent physics foundation models claim general spatiotemporal forecasting ability, yet their evaluations often collapse performance into a single average score under a fixed training distribution. This makes it difficult to…

22
arXiv — Machine Learning research 1mo ago

K-FinHallu: A Hallucination Detection Benchmark for Multi-Turn RAG in Korean Finance

arXiv:2605.29523v1 Announce Type: new Abstract: Large Language Models (LLMs) have advanced financial automation through Retrieval-Augmented Generation (RAG), yet hallucinations remain a critical barrier to deployment in high-stakes environments. Existing benchmarks focus on…

38
arXiv — NLP / Computation & Language research 1mo ago

What are They Thinking? Delineation, Probing and Tracking of Concepts in LLMs

arXiv:2605.28823v1 Announce Type: new Abstract: As the influence of LLMs expands, it is imperative to gain insight into their decisions. One way to do that is to develop probes that detect the presence or absence of a broad set of concepts within the embeddings computed in an…

33
arXiv — NLP / Computation & Language research 1mo ago

A comparative study of transformer-based embeddings for topic coherence

arXiv:2605.28832v1 Announce Type: new Abstract: Topic modeling is a branch of Natural Language Processing (NLP) that aims to organize large collections of texts into coherent groups according to word co-occurrence patterns, with Latent Dirichlet Allocation (LDA) remaining one of…

32
arXiv — NLP / Computation & Language research 1mo ago

GenesisFunc: Multi-Agent Data Generation for Accurate and Generalizable Function-Calling

arXiv:2605.28835v1 Announce Type: new Abstract: Large Language Models (LLMs) extend their capabilities through function-calling (FC), which relies on training data with high quality, diversity, and broad coverage of scenario. However, obtaining and annotating real…

15
