Model releases

500 articles archived under #model-release

r/LocalLLaMA community 11d ago

NVFP4 kv cache quantization on sm120 will make 32GB VRAM systems very capable

The best I can get from Qwen3.6-27B on my 32GB VRAM (2 x 5060) is ~60 tok/sec gen speed at context size 196608. (sakamakismile text nvfp4). FP8 KV quantization. NVFP4 KV cache quantization can’t get here fast enough. Reminds me of the time there was this game I couldn’t play on…

38
Hugging Face Daily Papers research 11d ago

A Benchmark and Framework for Evaluating Next Action Predictions in Spreadsheets

Abstract A benchmark for predicting spreadsheet user actions is introduced, addressing challenges in edit history availability and complex action spaces through manual curation and online evaluation methodology. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Predictive code…

17
Hugging Face Daily Papers research 11d ago

LLM-Enabled NWDAF: A Step Toward AI-Native 6G Network Intelligence

Abstract An open-source Network Data Analytics Function compatible with Free5GC integrates a Large Language Model interface for natural language interaction and intent-based network management. Generated by Qwen/Qwen2.5-Coder-32B-Instruct The Network Data Analytics Function…

17
Hugging Face Daily Papers research 11d ago

Morpheus: A Morphology-Aware Neural Tokenizer and Word Embedder for Turkish

Abstract A neural morpheme-boundary model for Turkish achieves lossless tokenization and morphology-aware embeddings with improved efficiency and performance over traditional subword methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Turkish is agglutinative: meaning is…

27
OpenAI official-blog 12d ago

Improving health intelligence in ChatGPT

Learn how GPT-5.5 Instant improves ChatGPT’s health and wellness responses with stronger reasoning, better context, clearer communication, and physician-informed evaluations.

7
Hugging Face Daily Papers research 12d ago

EfficientRollout: System-Aware Self-Speculative Decoding for RL Rollouts

Abstract EfficientRollout is a system-aware self-speculative decoding framework that accelerates reinforcement learning rollouts by adapting drafters to evolving policies and optimizing speculative decoding regimes. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Reinforcement…

36
Hugging Face Daily Papers research 12d ago

Beyond Alignment: Value Diversity as a Collective Property in Multicultural Agent Systems

Abstract Multicultural multi-agent systems exhibit limited value diversity despite cultural alignment, with social interaction reducing diversity and compromising collective decision-making breadth. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multicultural multi-agent systems…

28
Hugging Face Daily Papers research 12d ago

PAIWorld: A 3D-Consistent World Foundation Model for Robotic Manipulation

Abstract PAIWorld enhances diffusion-transformer world models with geometric awareness and cross-view attention to improve multi-view 3D consistency for robotic manipulation tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct World foundation models (WFMs) are powerful…

18
Hugging Face Daily Papers research 12d ago

Externalizing Research Synthesis and Validation in AI Scientists through a Research Harness

Abstract Xcientist enables transparent and accountable AI-driven scientific research by creating persistent artifacts that track the complete research process from problem formulation to mechanism validation and revision. Generated by Qwen/Qwen2.5-Coder-32B-Instruct AI systems…

11
llama.cpp releases dev-tools 12d ago

b9697

ci : fix check-release message parsing ( #24751 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

25
r/LocalLLaMA community 12d ago

Does anyone have enough compute to make a distillation dataset out of GLM5.2?

Same as title. Some lucky ppl among us have massive amounts of compute and can run even GLM 5.2. Can someone plss make a BIG distillation dataset (eg 700k-1M examples) so that we can train smaller models like Qwen3.5 properly on it and have better models? It would be amazing for…

28
llama.cpp releases dev-tools 12d ago

b9694

ci : fix Windows x64 (OpenVINO) release link ( #24731 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

28
Hacker News — AI on Front Page community 12d ago

DeepSeek Introduces Vision

Article URL: https://chat.deepseek.com/ Comments URL: https://news.ycombinator.com/item?id=48581458 Points: 229 # Comments: 94

29
Hugging Face Daily Papers research 12d ago

RODS: Reward-Driven Online Data Synthesis for Multi-Turn Tool-Use Agents

Abstract RODS addresses sample depletion in multi-turn tool-use reinforcement learning by dynamically synthesizing new data based on reward variance to maintain informative training samples. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multi-turn tool-use RL is bottlenecked by…

21
r/LocalLLaMA community 12d ago

I have a M5 Max MacBook Pro with 128gb of ram, what models should I run on it?

Yes, I know this is a simple question I could just ask Claude or something, but I want to see what the community suggests. For context, it’s a 16in MacBook Pro and I use Hermes agent as a harness connected to LM studio as obviously it’s preferable to be running MLX models especially…

4
Smol AI News news-outlet 12d ago

not much happened today

**GLM-5.2** from **Zhipu** emerged as a leading open-weight model with innovative **IndexShare** sparse-attention enabling efficient **1M-token inference**, praised as comparable to **GPT-5.5** and **Opus 4.8** but lacking vision support. Other notable open models include…

18
r/LocalLLaMA community 12d ago

LocalLLaMA crowdsourced coding dataset

I feel like many people in this community (myself included) are constantly, eagerly awaiting new small model releases, or improvements to existing models, etc. Sometimes I wish there were more community-released models (similarly to how there are sometimes community-released…

20
r/LocalLLaMA community 12d ago

Quick thoughts on GLM-5.2 (Bonus: Censorship question answers)

I've been working with GLM-5.2 pretty much non-stop since it was released as an API. So yeah, take it with a grain of salt as API inference is not perfectly controllable. I'm calling it through Z.ai - so I'd like to think that it's a high quality iteration of the model, but I…

27
Hugging Face Daily Papers research 12d ago

Reinforcing Dual-Path Reasoning in Spatial Vision Language Models

Abstract A unified framework for spatial vision-language models that combines linguistic deduction and 3D geometric reasoning through reinforcement learning, enabling robust spatial reasoning across diverse tasks and domains. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Spatial…

9
arXiv — Machine Learning research 12d ago

What Does the Weight Norm Control in Grokking? Logit-Scale Mediation under Cross-Entropy

arXiv:2606.18465v1 Announce Type: new Abstract: Grokking, the delayed jump from memorization to generalization, is usually tied to the weight norm: a smaller norm generalizes sooner. We ask what the norm actually controls. Holding the weight norm fixed by clamping and varying…

25
arXiv — NLP / Computation & Language research 12d ago

Montreal Forced Aligner and the state of speech-to-text alignment in 2026

arXiv:2606.18466v1 Announce Type: new Abstract: The Montreal Forced Aligner (MFA) was released in 2016 and has since become the most widely used tool for forced alignment in research and industry. In the decade since, MFA has undergone substantial development, including expanded…

5
Hugging Face Daily Papers research 12d ago

Trust the Right Teacher: Quality-Aware Self-Distillation for GUI Grounding

Abstract Quality-aware self-distillation improves vision-language model performance for GUI grounding by enhancing coordinate-token teacher signals through correctness-aware gating and probability scaling. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Graphical user interface…

38
Hacker News — AI on Front Page community 12d ago

Local Qwen isn't a worse Opus, it's a different tool

Article URL: https://blog.alexellis.io/local-ai-is-not-opus/ Comments URL: https://news.ycombinator.com/item?id=48580209 Points: 214 # Comments: 101

34
Hugging Face Daily Papers research 12d ago

Kairos: A Native World Model Stack for Physical AI

Abstract Kairos is a native world model framework that learns from diverse experiences, maintains persistent states through hybrid temporal attention, and supports efficient deployment for physical AI applications. Generated by Qwen/Qwen2.5-Coder-32B-Instruct World models are…

33
Hugging Face Daily Papers research 12d ago

Learning User Simulators with Turing Rewards

Abstract A reinforcement learning approach using Turing test-based rewards trains language models to generate responses indistinguishable from human users in conversational and forum discussion settings. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Learning to simulate human…

26
Hugging Face Daily Papers research 12d ago

Guava: An Effective and Universal Harness for Embodied Manipulation

Abstract A harness framework for embodied tool use combines high-level reasoning with external modules, enabling compact models to perform complex manipulation tasks with minimal training data. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Language models trained on large-scale…

15
Hugging Face Daily Papers research 12d ago

ActWorld: From Explorable to Interactive World Model via Action-Aware Memory

Abstract ActWorld extends navigation-centric interactive world models to support object interaction through a chunk-autoregressive framework with hierarchical action-aware memory and persistent memory banks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Interactive world models…

9
Simon Willison community 12d ago

GLM-5.2 is probably the most powerful text-only open weights LLM

Chinese AI lab Z.ai released GLM-5.2 to their coding plan subscribers on June 13th, and then yesterday (June 16th) released the full open weights under an MIT license. Similar in size to their previous GLM-5 and GLM-5.1 releases, this is 753B parameter, 1.51TB monster - with 40…

22
r/LocalLLaMA community 12d ago

I released Inflect-Nano, an ultra-extreme tiny 4.63m parameter TTS model.

I’ve been experimenting with how small a usable neural TTS model can realistically get, and I just released Inflect-Nano-v1 . As far as I researched (though I could be wrong on this), Inflect-Nano-v1 is the #2 smallest TTS model publicly released (after TinyTTS) , and it…

24
r/LocalLLaMA community 12d ago

Lin Junyang AI Lab Closes Round at $2B Valuation

A new lab from Lin Junyang can only be good news for open source / weights, I think. Excited to see what the lead responsible for the Qwen line does next.

38
Hacker News — AI on Front Page community 12d ago

A robot is sprinting towards you. Do you want it running on Claude or Grok?

Article URL: https://openrouter.ai/blog/insights/royale-last-agent-standing/ Comments URL: https://news.ycombinator.com/item?id=48576824 Points: 244 # Comments: 189

25
r/LocalLLaMA community 12d ago

GLM 5.2 Release Video [Made with GLM 5.2]

Everyone's probably seen the remotion thing that went viral a couple months back with CC. It's basically that with GLM 5.2 as the model provider. Close to Fable but still a step below on creativity, top is still Gemini 3.1 pro for vid creation but at least I can see why Design…

21
Ollama releases dev-tools 12d ago

v0.30.10-rc1

ci: pin darwin release xcode ( #16788 )

12
Ollama releases dev-tools 12d ago

v0.30.10

ci: pin darwin release xcode ( #16788 )

10
llama.cpp releases dev-tools 12d ago

b9690

metal : implement rope_back operator ( #24725 ) Reuse existing rope kernels with a function constant to toggle forward/backward rotation, avoiding duplicate kernel code. Assisted-by: pi:llama.cpp/Qwen3.6-27B macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64,…

27
Hugging Face Daily Papers research 12d ago

Beyond Scalar Distances: Semantic Attribute Gradients from Frozen MLLMs for Visual Embeddings

Abstract SAGA framework uses multimodal large language models to provide attribute-aware supervision for vision encoders through Group Relative Policy Optimization, improving zero-shot image retrieval performance. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Vision encoders for…

21
r/LocalLLaMA community 12d ago

Lemonade v10.8: auto memory management, cloud offload, Omni improvements, and call your local models as MCP tools

v10.8 is out, so here's a project update on what landed. This was a 20-contributor release in just 7 days! Smarter memory and context management Dynamic VRAM management now auto-unloads idle models and downsizes their KV-cache to reclaim GPU memory on the fly, plus model pinning…

27
r/MachineLearning community 12d ago

No CVPRW report [D]

I participated in the Denoising Challenge (gaussian noise level 50), managed to get a decent rank and was looking forward to citing the report in my CV etc, but it seems like the organiser is not planning to release the report, can't see any entry on the open access NTIRE page, is the…

25
r/LocalLLaMA community 12d ago

US holds off blacklisting China's DeepSeek, more than 100 firms deemed security risks, sources say

15
llama.cpp releases dev-tools 12d ago

b9687

llama : skip main_gpu validation when no devices are available ( #23405 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64…

11
Hugging Face Daily Papers research 12d ago

Self-Evolving Visual Questioner

Abstract A vision-language model autonomously improves its question-generation capabilities through self-evolution, enhancing both question quality and answerer performance without external supervision. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Vision-language models (VLMs)…

10
TechCrunch — AI news-outlet 12d ago

Google bets on Gemini to reinvent the smart home speaker

Google is betting generative AI can breathe new life into the smart speaker. The company's new $99.99 Google Home Speaker replaces the rigid commands of the Google Assistant era with more conversational Gemini interactions.

8
Hugging Face Daily Papers research 12d ago

Verified Detection and Prevention of Concurrency Anomalies in Multi-Agent Large Language Model Systems

Abstract Multi-agent LLM systems with shared state are analyzed through formal methods identifying concurrency anomalies and establishing a verified consistency hierarchy with mechanized proofs of soundness and completeness. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

14
Ars Technica — AI news-outlet 12d ago

The Gemini-powered Google Home Speaker arrives on June 25 for $100

Google's new smart speaker is more about Gemini than audio quality.

27
r/LocalLLaMA community 12d ago

I released a local LLM-powered RPG where generated NPCs, locations, items, and quests persist as in-game objects

In this game, NPCs, locations, items, quests, and other elements are generated not as one-off text, but as persistent in-game objects. The LLM handles dialogue, narration, situational interpretation, quest progression, and similar parts of the experience. Meanwhile, the game…

19
r/LocalLLaMA community 12d ago

SIQ-1 Qwen3.6 for autoresearch and autonomous agency

Took Qwen-35B-A3 and trained it with PPO — and honestly this is the first time I've ever seen PPO actually pull its weight (with verifiable reward). SO: On karpathy/autoresearch for parameter-golf → beats GLM-5.2 and Qwen-350B, and the ideas it spits out feel Opus4.8-like On…

26
Hugging Face Daily Papers research 12d ago

Dr-DCI: Scaling Direct Corpus Interaction via Dynamic Workspace Expansion

Abstract DR-DCI framework combines retrieval with direct corpus interaction by dynamically pulling relevant documents into a local workspace, enabling scalable and efficient agentic search across large corpora. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Agentic search over…

27
Hugging Face Daily Papers research 12d ago

Visual-Seeker: Towards Visual-Native Multimodal Agentic Search via Active Visual Reasoning

Abstract Visual-Seeker enables visual-native multimodal deep search through active visual reasoning, outperforming proprietary models on real-world web environments. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multimodal large language models (MLLMs) have demonstrated…

25
Hugging Face Daily Papers research 13d ago

RepSelect: Robust LLM Unlearning via Representation Selectivity

Abstract RepSelect isolates forget-set-specific representations in LLMs by collapsing top principal components of weight gradients, achieving deeper and more robust unlearning compared to existing methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Making large language models…

32
TechCrunch — AI news-outlet 13d ago

Pinterest launches an experimental AI shopping app called ‘Ask Pinterest’

Pinterest has launched 'Ask Pinterest,' an experimental AI-powered shopping app that lets users seek recommendations and inspiration through a conversational interface.

5
