Tag: Security · 500 articles archived under #security r/MachineLearning community 7d ago Non-deterministic Vulnerability Detection Benchmark System [P] I work in firmware adjacent to AI, so not an ML guy exactly, so that's why I've come here. For work we got a bit concerned about Mythos and all the hype made me explore some benchmarking work. I now have this pretty cool benchmark that's about 80% done sitting around and haven't… 26 r/MachineLearning community 7d ago About ML research collab group post [D] Hi, I'm thinking of building a small community of 10-15 people where we can help each other to learn something new. The primary focus will be on ML research and open-source projects. If you're interested, DM me. Knowledge of machine learning is a plus, as want to keep this a… 16 TechCrunch — AI news-outlet 7d ago SpaceX inks compute deal with Reflection AI, an open-source AI lab Reflection AI will pay $150 million a month beginning July 1, 2026 through 2029 for immediate access to Nvidia's latest GB300 AI chips and supporting hardware across SpaceX's Colossus 2 data center near Memphis, Tennessee. 33 r/MachineLearning community 7d ago Some new updates to Papers with Code [P] Hi folks, Niels here from the open-source team at Hugging Face. I continue working on a revival of paperswithcode.co as we're back to the "age of research" per Ilya Sutskever! Hence, it's important to discover each other's research and build on each other's work, so we can… 38 OpenAI official-blog 7d ago Patch the Planet: a Daybreak initiative to support open source maintainers OpenAI introduces Patch the Planet, a Daybreak initiative helping open-source maintainers find, validate, and fix vulnerabilities with AI and expert review.
23 llama.cpp releases dev-tools 8d ago b9752 server: refactor batch construction ( #24843 ) server: refactor batch construction wip wip 2 wip 3 wip 4 add abort_all_slots handle batch full more carefully fix assert rm debug log small nits (debug) add timings debug: force llama_synchronize for accurate timings address… 5 r/LocalLLaMA community 8d ago Qwen is never going to open source Qwen 3.7, aren't they? Well, this was predictable. After Qwen fired Junyang Lin, the next models are no longer open source. Labs that have released open source models more recently than Qwen: GLM-5.2, 2026-06-17 Kimi-K2.7-Code, 2026-06-12 MiniMax-M3, 2026-06-11 Step-3.7-Flash, 2026-05-29… 15 r/LocalLLaMA community 9d ago It’s time to decentralize model distribution! Introducing Noema Atlas TL;DR: Noema Atlas is a peer-to-peer network software using Iroh for local LLM weights, free and open source (Apache-2.0). Models come from whichever peers have them, with Hugging Face and mirrors as fallback (opt-in). Every file is identified by its content hash and a signed… 38 r/LocalLLaMA community 9d ago I wrote a free 15-part series on LLM internals — real math, real tensor shapes, real hardware constraints. All grounded in Gemma 4 12B's actual config. If you run open-source models and want to understand what's actually happening under the hood — I spent the last few months writing a 15-part series that covers the full stack from tokenization to production serving. Most articles are grounded in Gemma 4 12B as the running… 19 r/LocalLLaMA community 9d ago Board where every tile is an agent I've been hacking a project which I find extremely useful and wanted to share. Imagine a board where every tile is an agent whose job is to maintain the tile. I tried to illustrate the idea with a video here.
The project is open source on GitHub and you can also try it out here… 36 r/MachineLearning community 9d ago Studying FLUX in diffusers library was hard, so I built a smaller open-source version [P] If you've tried to study modern diffusion models by digging through the official diffusers library, you know it can be overwhelming with its complexity and abstractions. I wanted to simplify FLUX diffusion models, so I built minFLUX : a PyTorch implementation focused on its core… 38 r/LocalLLaMA community 9d ago z.AI as the number 2 gives praise to the number 1 open source model submitted by /u/Charuru 27 Stratechery (Ben Thompson) community 10d ago 2026.25: The Stuff of Myth(os) The best Stratechery content from the week of June 15, 2026, including Anthropic, e-commerce in the age of AI, and the NBA Finals being a perfect 10. 32 r/LocalLLaMA community 10d ago Commission selects EUROPA consortium as the winner of the Frontier AI Grand Challenge, a project to build European open-source frontier AI model in all 24 EU languages The European Commission has selected EUROPA, a European consortium led by the Italian company Domyn, as the winner of its Frontier AI Grand Challenge. Commission selects EUROPA consortium as the winner of the Frontier AI Grand Challenge, a project to build European open-source… 11 Interconnects (Nathan Lambert) research 10d ago Banning Open Source AI Would Be A Mistake This post was originally an op-ed co-authored with Kevin Xu of Interconnected for a general, non-technical audience. 20 Hugging Face Daily Papers research 10d ago No Resource, No Benchmarks, No Problem?
Evaluating and Improving LLMs for Code Generation in No-Resource Languages Abstract Research addresses code generation challenges for no-resource programming languages by developing benchmarks and proposing a method that combines further pre-training with weight difference transfer to create specialized instruction-following models at reduced… 27 r/LocalLLaMA community 10d ago Researchers trained a Deep Research agent with 32 H100s and open-sourced everything Ohio State University's NLP team released QUEST-35B, an open-source Deep Research agent trained using ~32 H100s and ~8K synthetic samples. The team open-sourced the training recipe, code, weights and datasets. Benchmark results show competitive performance against several… 13 arXiv — Machine Learning research 11d ago Performance Analysis and Optimization of 3D Generative Diffusion Models across GPU Architectures arXiv:2606.19365v1 Announce Type: new Abstract: Diffusion models have become essential for high-fidelity 3D MRI synthesis, yet their deployment remains constrained by substantial GPU resource demands arising from hundreds of U-Net evaluations per sample and a highly… 35 arXiv — Machine Learning research 11d ago FlexLAM: Resolving the Bottleneck Trade-off in Latent Action Learning arXiv:2606.19408v1 Announce Type: new Abstract: Latent actions provide a compact interface between action-free video and downstream decision-making, yet existing Latent Action Models (LAMs) force every transition through a fixed-capacity bottleneck. We identify a bottleneck… 6 arXiv — Machine Learning research 11d ago Enhancing Graph Neural Networks Using Proximity Graphs for Dust Source Emission Forecasting arXiv:2606.19825v1 Announce Type: new Abstract: Accurate prediction of dust source emissions is critical for mitigating the significant environmental and health hazards posed by dust storms. 
Traditional forecasting methods often struggle to capture the complex spatiotemporal… 14 arXiv — NLP / Computation & Language research 11d ago Ensembles of Large Language Models for Identifying EQ-5D Studies in PubMed Based on Their Abstracts arXiv:2606.19345v1 Announce Type: new Abstract: The rapid increase in scientific publications leads to the fact that manual study screening in systematic literature reviews (SLRs) is increasingly resource consuming, inefficient, and inconsistent. Classifying studies that clearly… 25 arXiv — NLP / Computation & Language research 11d ago HydraHead: From Head-Level Functional Heterogeneity to Specialized Attention Hybridization arXiv:2606.20097v1 Announce Type: new Abstract: The quadratic complexity of attention poses a critical bottleneck for long-context processing, spurring interest in hybrid attention designs. Most open-source hybrid models adopt a layer-wise strategy. Yet, prior work has noted the… 13 arXiv — NLP / Computation & Language research 11d ago A Layered Security Framework Against Prompt Injection in RAG-Based Chatbots arXiv:2606.19660v1 Announce Type: cross Abstract: Prompt injection is ranked as the most critical vulnerability in large language model (LLM) deployments by the OWASP Top 10 for LLM Applications, yet existing defenses operate at isolated pipeline stages and remain incomplete.… 25 arXiv — NLP / Computation & Language research 11d ago Benchmarking Agentic Review Systems arXiv:2606.19749v1 Announce Type: cross Abstract: A new class of agentic review systems are emerging as a remedy to the pressure placed on peer review systems by AI-assisted research, but it is unclear how they should be evaluated. 
We evaluate two open-source systems… 15 arXiv — NLP / Computation & Language research 11d ago ShoppingBench: A Real-World Intent-Grounded Shopping Benchmark for LLM-based Agents arXiv:2508.04266v4 Announce Type: replace Abstract: Existing benchmarks in e-commerce primarily focus on basic user intents, such as finding or purchasing products. However, real-world users often pursue more complex goals, such as applying vouchers, managing budgets, and… 22 llama.cpp releases dev-tools 11d ago b9712 cmake : fix ui build with read-only source ( #24752 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64… 4 ThursdAI news-outlet 11d ago Fable Got Banned, Open Source Delivered: GLM-5.2, Kimi K2.7 & SpaceX Buys Cursor - June 18 From CoreWeave (W&B): Fable is gone (for now). Here's everything else that happened this week: GLM-5.2 takes the open source crown, SpaceX buys Cursor for $60B, and 3 guests on the show today! 23 r/LocalLLaMA community 11d ago the power of intelligence is better in the hands of the people than in the board rooms of tycoons. Hey r/localllama. I wanted to share what's new with our open source PearlOS project since you all last saw (90 days ago). But first I want to give a massive thank you to this community, both your feedback and support were essential in getting us this far.… 22 Hugging Face Daily Papers research 11d ago LLM-Enabled NWDAF: A Step Toward AI-Native 6G Network Intelligence Abstract An open-source Network Data Analytics Function compatible with Free5GC integrates a Large Language Model interface for natural language interaction and intent-based network management.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct The Network Data Analytics Function… 17 Stratechery (Ben Thompson) community 11d ago An Interview with Michael Morton About E-Commerce in the Age of AI An interview with Michael Morton about e-commerce and AI, including the challenges of unfalsifiable bear cases, distribution versus referral models, grocery, and autonomous vehicles. 36 r/LocalLLaMA community 11d ago GLM-5.2 Flash when? (joke) I'm very happy with Z.ai's decision to open source GLM 5.2... With that being said, a successor to GLM-4.7-flash would be AMAZING. Literally anything in the 27-120B range (MoE or dense) 🤤 submitted by /u/ILoveToyota37 29 arXiv — NLP / Computation & Language research 12d ago Breaking the Solver Bottleneck: Training Task Generators at the Learnable Frontier arXiv:2606.18284v1 Announce Type: cross Abstract: The limiting resource for training agents via reinforcement learning (RL) is increasingly frontier task supply: valid, solvable tasks just difficult enough to train the current model. As reasoning and agentic models improve,… 21 arXiv — Machine Learning research 12d ago Strategic Feature Selection arXiv:2606.18867v1 Announce Type: new Abstract: When algorithmic predictors inform resource allocation in high-stakes domains such as healthcare, these predictors must account for strategic manipulation of input features. The typical solution is to redesign the predictor itself… 35 arXiv — Machine Learning research 12d ago Smoothness-Based Derandomization of PAC-Bayes Bounds arXiv:2606.19105v1 Announce Type: new Abstract: We study PAC-Bayes derandomization for smooth loss functions. Our goal is to obtain generalization bounds that hold with high probability for deterministic predictors by exploiting smoothness properties of both the loss and the… 27 arXiv — NLP / Computation & Language research 12d ago Redact or Keep?
A Fully Local AI Cascade for Educational Dialogue De-Identification arXiv:2606.18372v1 Announce Type: new Abstract: Educational dialogue is a valuable but sensitive resource for research: the same transcripts that capture authentic learning often capture personally identifiable information (PII) entangled with curricular content, where "Riemann"… 24 arXiv — NLP / Computation & Language research 12d ago Want Better Synthetic Data? Steer It: Activation Steering for Low-Resource Language Generation arXiv:2606.18389v1 Announce Type: new Abstract: Large language models (LLMs) have become an effective tool for synthetic data generation, including for low-resource languages, where generated data can improve downstream task performance. Current best-performing approaches… 15 arXiv — NLP / Computation & Language research 12d ago Low-resource Language Discrimination Towards Chinese Dialects with Transfer learning and Data Augmentation arXiv:2606.18597v1 Announce Type: new Abstract: Chinese dialects discrimination is a challenging natural language processing task due to scarce annotation resource. 
In this article, we develop a novel Chinese dialects discrimination framework with transfer learning and data… 29 arXiv — NLP / Computation & Language research 12d ago SAMA: Semantic Anchor-aligned Augmentation for Unified Low-Resource Multimodal Information Extraction arXiv:2606.18780v1 Announce Type: cross Abstract: Multimodal Information Extraction (MIE)-covering tasks such as Multimodal Named Entity Recognition (MNER), Relation Extraction (MRE), and Event Extraction (MEE)-is essential for understanding multimedia content but remains… 4 arXiv — NLP / Computation & Language research 12d ago Application of integrated gradients explainability to sociopsychological semantic markers arXiv:2503.04989v2 Announce Type: replace Abstract: Classification of textual data in terms of sentiment, or more nuanced sociopsychological markers (e.g., agency), is now a popular approach commonly applied at the sentence level. In this paper, we exploit the integrated… 20 arXiv — NLP / Computation & Language research 12d ago Rethinking Cross-lingual Gaps from a Statistical Viewpoint arXiv:2510.15551v2 Announce Type: replace Abstract: Any piece of knowledge is usually expressed in one or a handful of natural languages on the web or in any large corpus. Large Language Models (LLMs) act as a bridge by acquiring knowledge from a source language and making it… 38 r/LocalLLaMA community 12d ago Lin Junyang AI Lab Closes Round at $2B Valuation A new lab from Lin Junyang can only be good news for open source / weights, I think. Excited to see what the lead responsible for the Qwen line does next.   
submitted by /u/rmhubbert 38 llama.cpp releases dev-tools 12d ago b9677 common: update logging to enforce max_capacity and optimize queue resizing ( #24490 ) common: update logging to enforce max_capacity and optimize queue resizing logic common/log: remove queue expansion logic macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64,… 35 arXiv — Machine Learning research 13d ago Sum-of-Squares Degree Barriers for the Reweighted-Hinge Method in Robust Halfspace Learning: A Christoffel-Function Characterization arXiv:2606.17215v1 Announce Type: new Abstract: A certificate that removes outliers sees the data only through its low-degree moments, and an adversary exploits exactly this, hiding corruption where the clean data already looks typical, in the blind spot no bounded-degree test… 14 arXiv — Machine Learning research 13d ago Predictive Analytics in E-Commerce for Customer Behavior Forecasting using hybrid Ret-DNN with XGBoost Model arXiv:2606.17931v1 Announce Type: new Abstract: In recent years, electronic (E) commerce services have rapidly increased in the daily lives of people, which helps them to purchase products online. However, retail platforms have struggled to understand customer behavior and make… 35 arXiv — NLP / Computation & Language research 13d ago Improving low-resource ASR using bilingual fine-tuning with language identification: a cross-linguistic evaluation arXiv:2606.17820v1 Announce Type: new Abstract: This study explores how bilingual fine-tuning affects automatic speech recognition (ASR) in low-resource languages.
We evaluate this method across nine linguistically and geographically diverse language pairs, covering a range of… 28 arXiv — NLP / Computation & Language research 13d ago When English Isn't the Best Teacher: Source Language Effects in Cross-Lingual In-Context Learning arXiv:2606.18033v1 Announce Type: new Abstract: Cross-lingual transfer in multilingual NLP has been widely explored in supervised fine-tuning contexts, where factors like data availability and linguistic similarity largely determine transfer quality. As the field shifts toward… 13 arXiv — NLP / Computation & Language research 13d ago Analyzing and Encoding the Al-Mawrid Arabic-English Dictionary with the ISO Language Markup Framework and TEI Lex-0 arXiv:2606.18205v1 Announce Type: new Abstract: This paper presents a robust methodology for the systematic digitization and encoding of the Al-Mawrid Arabic-English dictionary, transforming it from a legacy print resource into a standardized computational lexicon. Addressing a… 20 arXiv — NLP / Computation & Language research 13d ago PARSE: Provenance-Aware Retrieval Sanitization for Professional Domain LLM Agents arXiv:2606.17467v1 Announce Type: cross Abstract: Prompt injection defenses evaluated on synthetic benchmarks do not generalize to real enterprise documents, which are longer, denser, and interleave legitimate authority language with factual content. We demonstrate this gap with… 17 arXiv — NLP / Computation & Language research 13d ago Vision-language models for chest radiography do not always need the image arXiv:2606.17710v1 Announce Type: cross Abstract: Medical vision-language models report strong chest radiograph accuracy, and this is increasingly read as evidence that they use the image. 
That inference is unsafe: a model exploiting finding-name priors scores like one that… 16 arXiv — NLP / Computation & Language research 13d ago A Framework for Evaluating Agentic Skills at Scale arXiv:2606.17819v1 Announce Type: cross Abstract: Agent skills -- structured, reusable knowledge artifacts that augment LLM agent capabilities -- have been rapidly adopted in industry, yet their cross-domain impact and use across commercial and open-source models remain… 10