Tag: Model releases
500 articles archived under #model-release
r/LocalLLaMA community 11d ago NVFP4 kv cache quantization on sm120 will make 32GB VRAM systems very capable The best i can get from Qwen3.6-27B on my 32GB VRAM (2 x 5060) is ~60 tok/sec gen speed at context size 196608. (sakamakismile text nvfp4). Fp8 kv quantization. NVFP4 kv cache quantization can’t get here fast enough. Reminds me of the time there was this game i couldn’t play on… 38
Hugging Face Daily Papers research 11d ago A Benchmark and Framework for Evaluating Next Action Predictions in Spreadsheets Abstract A benchmark for predicting spreadsheet user actions is introduced, addressing challenges in edit history availability and complex action spaces through manual curation and online evaluation methodology. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Predictive code… 17
Hugging Face Daily Papers research 11d ago LLM-Enabled NWDAF: A Step Toward AI-Native 6G Network Intelligence Abstract An open-source Network Data Analytics Function compatible with Free5GC integrates a Large Language Model interface for natural language interaction and intent-based network management. Generated by Qwen/Qwen2.5-Coder-32B-Instruct The Network Data Analytics Function… 17
Hugging Face Daily Papers research 11d ago Morpheus: A Morphology-Aware Neural Tokenizer and Word Embedder for Turkish Abstract A neural morpheme-boundary model for Turkish achieves lossless tokenization and morphology-aware embeddings with improved efficiency and performance over traditional subword methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Turkish is agglutinative: meaning is… 27
OpenAI official-blog 12d ago Improving health intelligence in ChatGPT Learn how GPT-5.5 Instant improves ChatGPT’s health and wellness responses with stronger reasoning, better context, clearer communication, and physician-informed evaluations.
7
Hugging Face Daily Papers research 12d ago EfficientRollout: System-Aware Self-Speculative Decoding for RL Rollouts Abstract EfficientRollout is a system-aware self-speculative decoding framework that accelerates reinforcement learning rollouts by adapting drafters to evolving policies and optimizing speculative decoding regimes. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Reinforcement… 36
Hugging Face Daily Papers research 12d ago Beyond Alignment: Value Diversity as a Collective Property in Multicultural Agent Systems Abstract Multicultural multi-agent systems exhibit limited value diversity despite cultural alignment, with social interaction reducing diversity and compromising collective decision-making breadth. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multicultural multi-agent systems… 28
Hugging Face Daily Papers research 12d ago PAIWorld: A 3D-Consistent World Foundation Model for Robotic Manipulation Abstract PAIWorld enhances diffusion-transformer world models with geometric awareness and cross-view attention to improve multi-view 3D consistency for robotic manipulation tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct World foundation models (WFMs) are powerful… 18
Hugging Face Daily Papers research 12d ago Externalizing Research Synthesis and Validation in AI Scientists through a Research Harness Abstract Xcientist enables transparent and accountable AI-driven scientific research by creating persistent artifacts that track the complete research process from problem formulation to mechanism validation and revision.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct AI systems… 11
llama.cpp releases dev-tools 12d ago b9697 ci : fix check-release message parsing (#24751) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64… 25
r/LocalLLaMA community 12d ago Does anyone have enough compute to make a distillation dataset out of GLM5.2? Same as title. Some lucky ppl among us have massive amounts of compute and can run even GLM 5.2. Can someone plss make a BIG distillation dataset (eg 700k-1M examples) so that we can train smaller models like Qwen3.5 properly on it and have better models? It would be amazing for… 28
llama.cpp releases dev-tools 12d ago b9694 ci : fix Windows x64 (OpenVINO) release link (#24731) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64… 28
Hacker News — AI on Front Page community 12d ago DeepSeek Introduces Vision Article URL: https://chat.deepseek.com/ Comments URL: https://news.ycombinator.com/item?id=48581458 Points: 229 # Comments: 94 29
Hugging Face Daily Papers research 12d ago RODS: Reward-Driven Online Data Synthesis for Multi-Turn Tool-Use Agents Abstract RODS addresses sample depletion in multi-turn tool-use reinforcement learning by dynamically synthesizing new data based on reward variance to maintain informative training samples. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multi-turn tool-use RL is bottlenecked by… 21
r/LocalLLaMA community 12d ago I have a M5 Max MacBook Pro with 128gb of ram, what models should I run on it?
Yes I know this is a simple question I could just ask Claude or something but I want to see what the community suggests For context it’s a 16in MacBook Pro and i use Hermes agent as a harness connected to LM studio as obviously it’s preferable to be running MLX models especially… 4
Smol AI News news-outlet 12d ago not much happened today **GLM-5.2** from **Zhipu** emerged as a leading open-weight model with innovative **IndexShare** sparse-attention enabling efficient **1M-token inference**, praised as comparable to **GPT-5.5** and **Opus 4.8** but lacking vision support. Other notable open models include… 18
r/LocalLLaMA community 12d ago LocalLLaMA crowdsourced coding dataset I feel like many people in this community (myself included) are constantly, eagerly awaiting new small model releases, or improvements to existing models, etc. Sometimes I wish there were more community-released models (similarly to how there are sometimes community-released… 20
r/LocalLLaMA community 12d ago Quick thoughts on GLM-5.2 (Bonus: Censorship question answers) I've been working with GLM-5.2 pretty much non-stop since it was released as an API. So yeah, take it with a grain of salt as API inference is not perfectly controllable. I'm calling it through Z.ai - so I'd like to think that it's a high quality iteration of the model, but I… 27
Hugging Face Daily Papers research 12d ago Reinforcing Dual-Path Reasoning in Spatial Vision Language Models Abstract A unified framework for spatial vision-language models that combines linguistic deduction and 3D geometric reasoning through reinforcement learning, enabling robust spatial reasoning across diverse tasks and domains. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Spatial… 9
arXiv — Machine Learning research 12d ago What Does the Weight Norm Control in Grokking?
Logit-Scale Mediation under Cross-Entropy arXiv:2606.18465v1 Announce Type: new Abstract: Grokking, the delayed jump from memorization to generalization, is usually tied to the weight norm: a smaller norm generalizes sooner. We ask what the norm actually controls. Holding the weight norm fixed by clamping and varying… 25
arXiv — NLP / Computation & Language research 12d ago Montreal Forced Aligner and the state of speech-to-text alignment in 2026 arXiv:2606.18466v1 Announce Type: new Abstract: The Montreal Forced Aligner (MFA) was released in 2016 and has since become the most widely used tool for forced alignment in research and industry. In the decade since, MFA has undergone substantial development, including expanded… 5
Hugging Face Daily Papers research 12d ago Trust the Right Teacher: Quality-Aware Self-Distillation for GUI Grounding Abstract Quality-aware self-distillation improves vision-language model performance for GUI grounding by enhancing coordinate-token teacher signals through correctness-aware gating and probability scaling. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Graphical user interface… 38
Hacker News — AI on Front Page community 12d ago Local Qwen isn't a worse Opus, it's a different tool Article URL: https://blog.alexellis.io/local-ai-is-not-opus/ Comments URL: https://news.ycombinator.com/item?id=48580209 Points: 214 # Comments: 101 34
Hugging Face Daily Papers research 12d ago Kairos: A Native World Model Stack for Physical AI Abstract Kairos is a native world model framework that learns from diverse experiences, maintains persistent states through hybrid temporal attention, and supports efficient deployment for physical AI applications.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct World models are… 33
Hugging Face Daily Papers research 12d ago Learning User Simulators with Turing Rewards Abstract A reinforcement learning approach using Turing test-based rewards trains language models to generate responses indistinguishable from human users in conversational and forum discussion settings. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Learning to simulate human… 26
Hugging Face Daily Papers research 12d ago Guava: An Effective and Universal Harness for Embodied Manipulation Abstract A harness framework for embodied tool use combines high-level reasoning with external modules, enabling compact models to perform complex manipulation tasks with minimal training data. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Language models trained on large-scale… 15
Hugging Face Daily Papers research 12d ago ActWorld: From Explorable to Interactive World Model via Action-Aware Memory Abstract ActWorld extends navigation-centric interactive world models to support object interaction through a chunk-autoregressive framework with hierarchical action-aware memory and persistent memory banks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Interactive world models… 9
Simon Willison community 12d ago GLM-5.2 is probably the most powerful text-only open weights LLM Chinese AI lab Z.ai released GLM-5.2 to their coding plan subscribers on June 13th, and then yesterday (June 16th) released the full open weights under an MIT license. Similar in size to their previous GLM-5 and GLM-5.1 releases, this is 753B parameter, 1.51TB monster - with 40… 22
r/LocalLLaMA community 12d ago I released Inflect-Nano, an ultra-extreme tiny 4.63m parameter TTS model. I’ve been experimenting with how small a usable neural TTS model can realistically get, and I just released Inflect-Nano-v1.
As far as I researched (though I could be wrong on this), Inflect-Nano-v1 is the #2 smallest TTS model publicly released (after TinyTTS), and it… 24
r/LocalLLaMA community 12d ago Lin Junyang AI Lab Closes Round at $2B Valuation A new lab from Lin Junyang can only be good news for open source / weights, I think. Excited to see what the lead responsible for the Qwen line does next. 38
Hacker News — AI on Front Page community 12d ago A robot is sprinting towards you. Do you want it running on Claude or Grok? Article URL: https://openrouter.ai/blog/insights/royale-last-agent-standing/ Comments URL: https://news.ycombinator.com/item?id=48576824 Points: 244 # Comments: 189 25
r/LocalLLaMA community 12d ago GLM 5.2 Release Video [Made with GLM 5.2] Everyone's probably seen the remotion thing that went viral a couple months back with CC. Its basically that with GLM 5.2 as the model provider. Close to Fable but still a step below on creativity, top is still Gemini 3.1 pro for vid creation but at least I can see why Design… 21
Ollama releases dev-tools 12d ago v0.30.10-rc1 ci: pin darwin release xcode (#16788) 12
Ollama releases dev-tools 12d ago v0.30.10 ci: pin darwin release xcode (#16788) 10
llama.cpp releases dev-tools 12d ago b9690 metal : implement rope_back operator (#24725) Reuse existing rope kernels with a function constant to toggle forward/backward rotation, avoiding duplicate kernel code. Assisted-by: pi:llama.cpp/Qwen3.6-27B macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64,… 27
Hugging Face Daily Papers research 12d ago Beyond Scalar Distances: Semantic Attribute Gradients from Frozen MLLMs for Visual Embeddings Abstract SAGA framework uses multimodal large language models to provide attribute-aware supervision for vision encoders through Group Relative Policy Optimization, improving zero-shot image retrieval performance.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct Vision encoders for… 21
r/LocalLLaMA community 12d ago Lemonade v10.8: auto memory management, cloud offload, Omni improvements, and call your local models as MCP tools v10.8 is out, so here's a project update on what landed. This was a 20-contributor release in just 7 days! Smarter memory and context management Dynamic VRAM management now auto-unloads idle models and downsizes their KV-cache to reclaim GPU memory on the fly, plus model pinning… 27
r/MachineLearning community 12d ago No CVPRW report [D] I participated in Denoising Challenge (gaussian noise level 50), managed to get a decent rank and was looking forward to cite the report in my CV etc, but it seems like the organiser is not planning to release the report, cant see any entry on open access NTIRE page, is the… 25
r/LocalLLaMA community 12d ago US holds off blacklisting China's DeepSeek, more than 100 firms deemed security risks, sources say 15
llama.cpp releases dev-tools 12d ago b9687 llama : skip main_gpu validation when no devices are available (#23405) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64… 11
Hugging Face Daily Papers research 12d ago Self-Evolving Visual Questioner Abstract A vision-language model autonomously improves its question-generation capabilities through self-evolution, enhancing both question quality and answerer performance without external supervision. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Vision-language models (VLMs)… 10
TechCrunch — AI news-outlet 12d ago Google bets on Gemini to reinvent the smart home speaker Google is betting generative AI can breathe new life into the smart speaker.
The company's new $99.99 Google Home Speaker replaces the rigid commands of the Google Assistant era with more conversational Gemini interactions. 8
Hugging Face Daily Papers research 12d ago Verified Detection and Prevention of Concurrency Anomalies in Multi-Agent Large Language Model Systems Abstract Multi-agent LLM systems with shared state are analyzed through formal methods identifying concurrency anomalies and establishing a verified consistency hierarchy with mechanized proofs of soundness and completeness. Generated by Qwen/Qwen2.5-Coder-32B-Instruct… 14
Ars Technica — AI news-outlet 12d ago The Gemini-powered Google Home Speaker arrives on June 25 for $100 Google's new smart speaker is more about Gemini than audio quality. 27
r/LocalLLaMA community 12d ago I released a local LLM-powered RPG where generated NPCs, locations, items, and quests persist as in-game objects In this game, NPCs, locations, items, quests, and other elements are generated not as one-off text, but as persistent in-game objects. The LLM handles dialogue, narration, situational interpretation, quest progression, and similar parts of the experience. Meanwhile, the game… 19
r/LocalLLaMA community 12d ago SIQ-1 Qwen3.6 for autoresearch and autonomous agency Took Qwen-35B-A3 and trained it with PPO — and honestly this is the first time I've ever seen PPO actually pull its weight (with verifiable reward). SO: On karpathy/autoresearch for parameter-golf → beats GLM-5.2 and Qwen-350B, and the ideas it spits out feel Opus4.8-like On… 26
Hugging Face Daily Papers research 12d ago Dr-DCI: Scaling Direct Corpus Interaction via Dynamic Workspace Expansion Abstract DR-DCI framework combines retrieval with direct corpus interaction by dynamically pulling relevant documents into a local workspace, enabling scalable and efficient agentic search across large corpora.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct Agentic search over… 27
Hugging Face Daily Papers research 12d ago Visual-Seeker: Towards Visual-Native Multimodal Agentic Search via Active Visual Reasoning Abstract Visual-Seeker enables visual-native multimodal deep search through active visual reasoning, outperforming proprietary models on real-world web environments. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multimodal large language models (MLLMs) have demonstrated… 25
Hugging Face Daily Papers research 13d ago RepSelect: Robust LLM Unlearning via Representation Selectivity Abstract RepSelect isolates forget-set-specific representations in LLMs by collapsing top principal components of weight gradients, achieving deeper and more robust unlearning compared to existing methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Making large language models… 32
TechCrunch — AI news-outlet 13d ago Pinterest launches an experimental AI shopping app called ‘Ask Pinterest’ Pinterest has launched 'Ask Pinterest,' an experimental AI-powered shopping app that lets users seek recommendations and inspiration through a conversational interface. 5