Tag

Security

500 articles archived under #security · RSS

Dwarkesh Podcast news-outlet 25d ago

Alex Imas and Phil Trammell – What remains scarce after AGI?

“One robot now turns into many robots next year, but the number of ballerinas is the same.”

37
r/MachineLearning community 25d ago

On-policy distillation: one of the hottest terms on PapersWithCode [R]

Hi, Niels here from the open-source team at Hugging Face. At paperswithcode.co I am trying to make it easier for people to learn about the newest techniques used across AI papers. One of the hottest terms in AI research that I've recently added is On-policy distillation, also…

27
Hugging Face Daily Papers research 25d ago

Semi-Supervised Noise Adaptation: Transferring Knowledge from Noise Domain

Abstract: A novel semi-supervised learning approach uses synthetic noise domains as source domains to improve target domain generalization through a proposed noise adaptation framework. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Transfer learning aims to facilitate the learning…

8
llama.cpp releases dev-tools 25d ago

b9504

cmake: skip cvector-generator and export-lora when CPU backend is disabled (#24053). macOS/iOS: macOS Apple Silicon (arm64), macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED, macOS Intel (x64), iOS XCFramework. Linux: Ubuntu x64 (CPU), Ubuntu arm64 (CPU), Ubuntu s390x (CPU)…

4
Hacker News — AI on Front Page community 26d ago

Show HN: Uruky (EU-based Kagi alternative) now has Image Search and URL Rewrites

You can get a 2h free trial by solving a proof-of-work captcha when topping up your account for the first time. If you'd like to learn more, an independent interview was posted a couple of weeks ago [1], and the FAQ [2] has a lot of information as well. For the source code…

25
Hugging Face Daily Papers research 26d ago

Reproducing, Analyzing, and Detecting Reward Hacking in Rubric-Based Reinforcement Learning

Abstract: CHERRL is a controlled environment for studying reward hacking in rubric-based reinforcement learning with LLM judges, enabling detection and analysis of subtle bias exploitation patterns. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Rubric-based reinforcement learning…

17
arXiv — Machine Learning research 26d ago

Self-Distilled Policy Gradient

arXiv:2606.04036v1 Announce Type: new Abstract: On-policy self-distillation, where a language model conditions on privileged context to supervise its own generations, is a promising source of dense supervision for sparse-reward reinforcement learning. Actually, it can be…

33
arXiv — Machine Learning research 26d ago

Spectral Scaling Laws of Muon

arXiv:2606.04058v1 Announce Type: new Abstract: Orthonormalized update rules have rapidly become a leading choice of optimizer for training large language models, with recent open-source state-of-the-art models adopting Muon. To keep these updates tractable, Muon performs the…

13
arXiv — Machine Learning research 26d ago

Physics-Informed Machine Learning for Short-Term Flood Prediction

arXiv:2606.04143v1 Announce Type: new Abstract: Accurate flood forecasting is essential for mitigating disaster risks and protecting communities. However, purely data-driven machine learning models often struggle in data-scarce environments and may violate fundamental…

32
arXiv — Machine Learning research 26d ago

Why Muon Outperforms Adam: A Curvature Perspective

arXiv:2606.04662v1 Announce Type: new Abstract: Muon improves training efficiency over Adam in large language-model training by about two times, but the local geometric source of this advantage remains unclear. Our work takes a first step toward demystifying Muon's superiority…

17
arXiv — Machine Learning research 26d ago

Towards Accurate Model Selection in Deep Unsupervised Domain Adaptation

arXiv:2606.04665v1 Announce Type: new Abstract: Deep unsupervised domain adaptation (Deep UDA) methods successfully leverage rich labeled data in a source domain to boost the performance on related but unlabeled data in a target domain. However, algorithm comparison is…

23
arXiv — NLP / Computation & Language research 26d ago

GlossAssist -- A Tool to Simplify Corpus Creation and Study the Effect of NLP Models in Low-Resource Documentation Settings

arXiv:2606.04367v1 Announce Type: new Abstract: Interlinear glossed text (IGT) is the standard format for linguistic annotation in language documentation. Producing it manually, however, is often slow and costly. Automated glossing systems have improved substantially in recent…

33
arXiv — NLP / Computation & Language research 26d ago

Dynamic Infilling Anchors for Format-Constrained Generation in Diffusion Large Language Models

arXiv:2606.04535v1 Announce Type: new Abstract: Diffusion large language models (dLLMs) offer bidirectional attention and parallel generation, enabling them to exploit global context and naturally support format-constrained tasks like parseable JSON or reasoning templates. While…

16
arXiv — NLP / Computation & Language research 26d ago

Can Crowdsourcing Survive the LLM Era? A Community Survey on Human Data Collection

arXiv:2606.04924v1 Announce Type: new Abstract: The widespread use of Large Language Models (LLMs) as writing tools challenges the validity of crowdsourced data, as crowdworkers may outsource tasks to models. To better understand how this is addressed, we surveyed 155…

15
TechCrunch — AI news-outlet 26d ago

Lovable signs multi-year deal with Google Cloud to up usage 5x, source says

Lovable and Google signed an expanded multi-year deal that involves a 5x expansion of Lovable's footprint on Google Cloud and expanded access to Anthropic Claude.

21
r/MachineLearning community 26d ago

NeurIPS Reciprocal Reviewers be careful in reviewing with LLMs [D]

As the title says. I am not a reciprocal reviewer, but I just noticed a clever prompt injection, like they did in ICML, for our submission. Submitted by /u/Massive-Bobcat-5363.

18
Hugging Face Daily Papers research 26d ago

Bootstrap Your Generator: Unpaired Visual Editing with Flow Matching

Abstract: The Bootstrap Your Generator framework enables unpaired training of flow matching editing models by leveraging base model knowledge and gradient routing for improved generalization in data-scarce scenarios. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Modern generative…

18
OpenAI official-blog 27d ago

OpenAI public policy agenda

OpenAI outlines its public policy agenda for AI, including safety, youth protection, workforce transition, and global standards to ensure AI benefits society.

10
arXiv — NLP / Computation & Language research 27d ago

Hallucination Is Linearly Decodable from Mid-Layer Hidden States in Quantized LLMs

arXiv:2606.02628v1 Announce Type: cross Abstract: We investigate whether open-source LLMs encode a linearly separable truthfulness signal in their hidden states, and at which network depth this signal is strongest. Across three 7B to 8B instruction-tuned models (Llama-3.1-8B,…

26
arXiv — Machine Learning research 27d ago

Libra: Efficient Resource Management for Agentic RL Post-Training

arXiv:2606.03077v1 Announce Type: new Abstract: Reinforcement learning (RL) has become a standard post-training paradigm for large language models (LLMs), extending beyond preference alignment to complex reasoning and multi-turn agentic behaviors. In agentic RL, the rollout…

23
arXiv — Machine Learning research 27d ago

Synthetic Hallucinations, Real Gains: Hard Negatives from Frontier Models for FIM Hallucination Mitigation

arXiv:2606.03130v1 Announce Type: new Abstract: Small open-source code models that power IDE autocomplete still emit hallucinated Fill-in-the-Middle (FIM) completions: syntactically natural calls to methods, parameters, variables, and imports that do not exist in the surrounding…

8
arXiv — Machine Learning research 27d ago

Learning Temporal Causal Structure via Smooth Differentiable Optimization

arXiv:2606.03227v1 Announce Type: new Abstract: Causal discovery with instantaneous effects in multivariate time series is challenging, as the instantaneous structure must be acyclic. Prior methods enforce this by either separating instantaneous and lagged estimation into…

4
arXiv — Machine Learning research 27d ago

GFFMERGE: Efficient Merging of Graph Neural Force Fields and Beyond

arXiv:2606.03232v1 Announce Type: new Abstract: Graph Neural Networks (GNNs) have revolutionized Neural Force Fields for atomistic simulations, achieving near-quantum accuracy at reduced cost, yet adapting these models to new chemical systems requires expensive retraining of…

7
arXiv — Machine Learning research 27d ago

Calibration Data Trade-offs Across Capability Dimensions: Why Multi-Source Mixing Matters for High-Sparsity LLM Pruning

arXiv:2606.03328v1 Announce Type: new Abstract: Post-training pruning compresses large language models to high sparsity using a small unlabelled calibration set, and recent work has concluded that the choice of calibration source has only modest impact on averaged post-pruning…

27
arXiv — NLP / Computation & Language research 27d ago

Regret Pre-training: Bridging Prior and Posterior Views for Enhanced Knowledge Grounding

arXiv:2606.03080v1 Announce Type: new Abstract: Causal language models factorize sequence probabilities using only preceding context, leaving future information unexploited during training despite its availability in the training data. This paper introduces Regret Pre-training,…

31
arXiv — NLP / Computation & Language research 27d ago

A cross-domain tropical species dataset with Chinese vernacular names and CITES source links

arXiv:2606.03156v1 Announce Type: new Abstract: We describe a versioned cross-domain dataset of 410,499 active tropical species (working snapshot 2026-04-20) spanning three applied subdomains -- tropical_plants, tropical_aquatic, and tropical_pets -- that share a commercial and…

36
arXiv — NLP / Computation & Language research 27d ago

Beyond "To whom it may concern": Tailoring Machine Translation to Audience and Intent

arXiv:2606.03259v1 Announce Type: new Abstract: Translation quality depends on purpose: the same source text demands different translations depending on audience, tone, and communicative intent. Yet MT models and metrics treat translation as a fixed mapping from source to…

11
arXiv — NLP / Computation & Language research 27d ago

From Script to Semantics: Prompting Strategies for African NLI

arXiv:2606.03304v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly evaluated in multilingual settings, yet their inference behavior in low-resource African languages remains underexplored, especially under pure prompting without fine-tuning. We present…

38
arXiv — NLP / Computation & Language research 27d ago

Reasoning over Grammar: Can Synthetic Linguistic Reasoning Traces Enhance Low-Resource Machine Translation?

arXiv:2606.03782v1 Announce Type: new Abstract: Large language models (LLMs) offer a promising approach to machine translation (MT) for extremely low-resource languages by incorporating linguistic resources through in-context learning. However, LLMs often struggle to apply…

14
arXiv — NLP / Computation & Language research 27d ago

Efficient ASR Training with Conversations that Never Happened

arXiv:2606.03957v1 Announce Type: new Abstract: Conversational ASR for lower-resource languages and niche domains is limited by the scarcity of domain-matched multi-speaker training data. We propose an augmentation pipeline that generates scenario-level dialogues with…

21
TechCrunch — AI news-outlet 27d ago

New Microsoft tool lets devs spin up AI behavior tests using text descriptions

Microsoft on Tuesday took the wraps off Adaptive Spec-driven Scoring for Evaluation and Regression Testing, an open source framework for spinning up AI evaluations.

38
TechCrunch — AI news-outlet 27d ago

Anthropic scales Claude Mythos to critical infrastructure in 15+ countries

Anthropic is expanding Project Glasswing, its security vulnerability program, and access to Mythos to 150 organizations across 15 countries — targeting critical infrastructure in power, water, healthcare, and communications where a cyberattack could affect 100 million people.

21
Hugging Face Daily Papers research 27d ago

Same Question, Different Source, Different Answer: Auditing Source-Dependence in Medical Multi-Source RAG

Abstract Retrieval-augmented generation systems exhibit source-dependent responses to identical queries, necessitating a shift from traditional correctness evaluation to analyzing inter-source relationships for multi-source NLP systems. Generated by…

17
Hugging Face Daily Papers research 27d ago

Lost in Translation? Exploring the Shift in Grammatical Gender from Latin to Occitan

Abstract: A deep learning framework is developed to analyze the grammatical gender system evolution from Latin to Romance languages, examining both lexical and contextual factors in a low-resource historical setting. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) The diachronic…

32
r/MachineLearning community 28d ago

LLM agents patch security bugs, pass all tests, but still leave the vulnerability open [R]

I built CVE-Bench: 20 real-world CVEs across 18 Python projects (Pillow, GitPython, yt-dlp, urllib3, others), 5 frontier models, 3 prompt conditions, 300 runs total. Each agent runs in a sandboxed container and is scored against a hidden test_security.py derived from the…

4
r/MachineLearning community 28d ago

Browse CVPR 2026 papers on PapersWithCode [P]

Hi, Niels here from the open-source team at Hugging Face. It's been 2 weeks since I launched paperswithcode.co, a revival of the website we all loved. It allows…

11
arXiv — Machine Learning research 28d ago

Exploiting weight-space symmetries for approximating curvature

arXiv:2606.00442v1 Announce Type: new Abstract: Many machine learning techniques rely on approximating a loss function's curvature, but this is notoriously hard to do at the scale of modern deep networks. Surprisingly, no previous work has exploited the curvature constraints…

36
arXiv — Machine Learning research 28d ago

Semi-Supervised Noise Adaptation: Transferring Knowledge from Noise Domain

arXiv:2606.00558v1 Announce Type: new Abstract: Transfer learning aims to facilitate the learning of a target domain by transferring knowledge from a source domain. The source domain typically contains semantically meaningful samples (e.g., images) to facilitate effective…

30
arXiv — NLP / Computation & Language research 28d ago

Enhancing BiGRU with a KAN Block for Legal Document Classification and Summarization

arXiv:2606.00116v1 Announce Type: new Abstract: This study introduces a novel architecture of KAN-based BiGRU model for the task of classification and summarization of legal documents in a low-resource multilingual setup. In order to tackle problems associated with domain…

23
arXiv — NLP / Computation & Language research 28d ago

Lost in Delusion: Examining LLM Safety Under User Delusions and Distress

arXiv:2606.00975v1 Announce Type: new Abstract: LLM chatbots increasingly serve as a first source of support for people in psychological distress, including those whose distress is entangled with delusional beliefs. Prior work on LLM mental-health safety largely evaluates…

8
arXiv — NLP / Computation & Language research 28d ago

MENTIS: What Belief Changes Under Alignment? Measuring Multi-Scale Latent Torsion in Language Models

arXiv:2606.01060v1 Announce Type: new Abstract: Preference alignment has substantially improved the observable behavior of large language models, yet it remains unclear what alignment changes internally. Aligned systems still fail under jailbreaks, prompt injection, and…

28
arXiv — NLP / Computation & Language research 28d ago

Low-Resource Safety Failures Are Action Failures, Not Representation Failures

arXiv:2606.01196v1 Announce Type: new Abstract: Safety alignment learned in high-resource languages transfers poorly to low-resource languages. Models refuse harmful prompts in English but fail to refuse when the same prompts are translated into Swahili or Burmese. Adaptive…

34
arXiv — NLP / Computation & Language research 28d ago

Understanding LLM Behavior in Multi-Target Cross-Lingual Summarization

arXiv:2606.01252v1 Announce Type: new Abstract: Multi-target cross-lingual text summarization (MTXLS), which summarizes a source document into multiple target languages, is increasingly important as users consume content in diverse languages, but remains underexplored. To…

26
arXiv — NLP / Computation & Language research 28d ago

TukaBench: A Culturally Grounded Jailbreak Benchmark for African Languages

arXiv:2606.01322v1 Announce Type: new Abstract: Safety evaluation of Large Language Models (LLMs) remains heavily English-centric, leaving Low-Resource Languages (LRLs), particularly African ones, critically underexplored. We introduce TUKABENCH, a jailbreak benchmark for seven…

19
r/MachineLearning community 28d ago

MeshFlow: An open-source orchestrator for governed, cost-optimized multi-agent workflows [D]

Hey ML community, we’ve just open-sourced MeshFlow, a code-first, framework-agnostic runtime designed for governing and optimizing multi-agent systems in production. Most agent frameworks focus on rapid prototyping, but ML and platform engineering teams usually run into…

23
The Information — AI news-outlet 28d ago

Salesforce Stake in Anthropic Reaches $5 Billion

Salesforce’s stake in frontier AI model provider Anthropic has reached roughly $5 billion following multiple investments and Anthropic’s latest funding round, according to a person with direct knowledge of the matter. Bloomberg first reported the value of Salesforce’s stake in…

19
The Information — AI news-outlet 28d ago

Salesforce Acquires Contentful in Latest Move to Boost AI

Salesforce is acquiring Contentful, a provider of content management software for businesses, in the latest sign of its efforts to make its customer data more easily accessible to AI agents. Terms of the deal weren’t disclosed, but Salesforce paid between $1 billion and $1.5…

26
Ars Technica — AI news-outlet 28d ago

Hackers duped Meta AI support chatbot to steal celebrity Instagram accounts

Pricey Instagram handles were stolen and resold before Meta patched the exploit.

27
The Information — AI news-outlet 28d ago

OpenAI Taps Salesforce Executive to Lead Global Partnerships

Brian Landsman, a 14-year Salesforce executive vice president who leads its global partnerships and app store business, has joined OpenAI as vice president of global partnerships, he announced on LinkedIn. This is an important appointment for OpenAI, which has previously hired…

17
Hacker News — AI on Front Page community 28d ago

The newest Instagram “exploit” is the goofiest I've seen

Article URL: https://www.0xsid.com/blog/meta-account-takeover-fiasco Comments URL: https://news.ycombinator.com/item?id=48359102 Points: 371, Comments: 76

18
