Tag

Model releases

500 articles archived under #model-release · RSS

Don't Worry About the Vase community 10d ago

Claude Fable 5 and Mythos 5: Capabilities

Only three days after the release of Claude Fable 5, Anthropic was forced by the United States Government to make it unavailable, when a jailbreak was brought to its attention, rather than the previous situation of ‘yes obviously experts can jailbreak anything if they care…

32
r/LocalLLaMA community 10d ago

What's more impressive, GLM 5.1 -> 5.2 or Qwen 3.5 -> 3.6?

Write a single HTML file with a full-page canvas and no libraries. Simulate a realistic Döner Style kebab skewer rotating (vertically) in front of a gas powered heating element. Mentioning Döner activates GLM 5.2's German weights or something (Spiess = Skewer, Brenner = Burner).…

34
r/LocalLLaMA community 10d ago

Watching a local AI voice assistant get dumber (A 9B to 0.8B agent experiment on my RTX 5060 Ti)

I wanted to find the exact floor for running an intelligent, local voice assistant agent on consumer hardware. Keeping the environment, tools, and prompts identical, I stepped the model sizes down through Qwen 3.5 9B, 4B, 2B, and 0.8B to see how agentic reasoning degrades. The…

12
r/LocalLLaMA community 10d ago

The Eagle(3) has landed (for Qwen)

https://github.com/ggml-org/llama.cpp/releases/tag/b9723 Available in the latest release. Enabled via: --spec-type draft-eagle3. You'll need to feed it a draft model. There are issues with unsloth + eagle at the moment, so I've personally tested against: Model:…

16
llama.cpp releases dev-tools 10d ago

b9723

spec: support eagle3 for qwen3.5 & 3.6 (#24593); spec: support qwen3.5 & 3.6 eagle3 draft; eagle3: Add deferred boundary checkpoints restore support for hybrid models; apply suggestions; Co-authored-by: Georgi Gerganov; spec: adapt to API change; spec: fix naming…

21
r/LocalLLaMA community 10d ago

New Agentic Benchmark Out: Claude Fable and GLM 5.2 Top Their Cohorts

You can read about it here: https://artificialanalysis.ai/articles/aa-briefcase This is a solid benchmark from Artificial Analysis. It basically tests an LLM's ability to plan and execute tasks. And more importantly, it is a new benchmark that is not saturated, so no one can…

32
r/LocalLLaMA community 10d ago

spec: support eagle3 for qwen3.5 & 3.6 by ruixiang63 · Pull Request #24593 · ggml-org/llama.cpp

let's see whether it is better than MTP · submitted by /u/jacek2023

5
Hugging Face Daily Papers research 11d ago

Rethinking Shrinkage Bias in LLM FP4 Pretraining: Geometric Origin, Systemic Impact, and UFP4 Recipe

Abstract: Uniform 4-bit training with RHT-based quantization outperforms E2M1-based methods by eliminating shrinkage bias and improving training stability across large language model architectures. · Generated by Qwen/Qwen2.5-Coder-32B-Instruct · FP4 training promises substantial…

31
Hugging Face Daily Papers research 11d ago

Multi-LCB: Extending LiveCodeBench to Multiple Programming Languages

Abstract: Multi-LCB addresses the limitation of LiveCodeBench by providing a multi-language benchmark for evaluating LLMs across twelve programming languages while maintaining contamination controls and evaluation protocols. · Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

33
r/LocalLLaMA community 11d ago

Researchers trained a Deep Research agent with 32 H100s and open-sourced everything

Ohio State University's NLP team released QUEST-35B, an open-source Deep Research agent trained using ~32 H100s and ~8K synthetic samples. The team open-sourced the training recipe, code, weights and datasets. Benchmark results show competitive performance against several…

13
Hugging Face Daily Papers research 11d ago

JAMER: Project-Level Code Framework Dataset and Benchmark on Professional Game Engines

Abstract: Game development frameworks and benchmarks were created using data from game jam competitions to evaluate code generation and project-level programming capabilities. · Generated by Qwen/Qwen2.5-Coder-32B-Instruct · Current AI-driven game development has made substantial…

25
Hugging Face Daily Papers research 11d ago

ImageWAM: Do World Action Models Really Need Video Generation, or Just Image Editing?

Abstract: ImageWAM demonstrates that pretrained image editing models can effectively replace video generation in world action models for robot control, achieving better performance with reduced computational costs. · Generated by Qwen/Qwen2.5-Coder-32B-Instruct · World Action Models…

25
Hugging Face Daily Papers research 11d ago

ENPIRE: Agentic Robot Policy Self-Improvement in the Real World

Abstract: ENPIRE framework enables autonomous robotics research through a closed-loop system that automates policy improvement via environment feedback, policy refinement, and evolutionary code optimization. · Generated by Qwen/Qwen2.5-Coder-32B-Instruct · Achieving dexterous robotic…

27
Hugging Face Daily Papers research 11d ago

Holo-World: Unified Camera, Object and Weather Control for Video World Model

Abstract: A unified controllable video world model generates videos from a single image while preserving scene structure and transferring to target weather states through specialized parameterization and conditioning techniques. · Generated by Qwen/Qwen2.5-Coder-32B-Instruct · Video…

22
Hugging Face Daily Papers research 11d ago

Current World Models Lack a Persistent State Core

Abstract: Current world models fail to maintain consistent world states when unobserved, indicating a need for design changes that prioritize physical state stability over appearance fidelity. · Generated by Qwen/Qwen2.5-Coder-32B-Instruct · World models are increasingly regarded as…

18
Hugging Face Daily Papers research 11d ago

Taylor-Calibrate: Principled Initialization for Hybrid Linear Attention Distillation

Abstract: Hybrid linear attention models can be improved through a novel initialization technique that enhances conversion from pretrained Transformers by leveraging teacher attention statistics and alignment steps. · Generated by Qwen/Qwen2.5-Coder-32B-Instruct · Hybrid linear…

6
Hugging Face Daily Papers research 11d ago

DragMesh-2: Physically Plausible Dexterous Hand-Object Interaction with Articulated Objects

Abstract: DragMesh-2 enables dexterous hand-object interaction through contact-driven manipulation, with PICA enhancing robustness under varying contact loads without tactile feedback. · Generated by Qwen/Qwen2.5-Coder-32B-Instruct · Dexterous interaction with articulated objects is…

19
Hugging Face Daily Papers research 11d ago

HumanScale: Egocentric Human Video Can Outperform Real-Robot Data for Embodied Pretraining

Abstract: Egocentric human video can effectively replace teleoperated robot trajectories for embodied model pretraining, achieving better performance with reduced data collection costs. · Generated by Qwen/Qwen2.5-Coder-32B-Instruct · Embodied foundation models are expected to…

22
r/LocalLLaMA community 11d ago

Little late thank you to the DeepSeek team!

7 months ago I posted https://www.reddit.com/r/LocalLLaMA/s/Z32skdSKzY Just wanted to thank you for DeepSeek V4 Pro and extra big Thank You for the Flash version that fits on my local hardware! Thank You!!!! · submitted by /u/Sorry_Ad191

37
Smol AI News news-outlet 11d ago

not much happened today

**GLM-5.2** emerges as a leading open-weight coding model rivaling **Opus 4.8** and **GPT-5.5** in software engineering tasks, emphasizing the strategic importance of open models for provider competition, on-prem deployment, and fine-tuning rights. Experts like **Patrick…

17
Hugging Face Daily Papers research 11d ago

Playful Agentic Robot Learning

Abstract: Embodied robots learn reusable skills through self-directed play and exploration, then apply these skills to improve performance on downstream tasks without additional training. · Generated by Qwen/Qwen2.5-Coder-32B-Instruct · Current agentic robot systems can write…

4
arXiv — NLP / Computation & Language research 11d ago

Displacement Is Not Direction: Evaluating Fidelity Metrics for Quantized LLM Deployment

arXiv:2606.19558v1 · Announce Type: cross · Abstract: Fidelity metrics, such as per-token KL divergence (KLD) against a high-precision reference, are often used in practice as low-cost proxies for benchmark quality. We test this practice on a 28-quant cohort of Qwen3.6-35B-A3B and a…

32
arXiv — Machine Learning research 11d ago

Convex training of Lipschitz-regularized shallow neural networks

arXiv:2606.19652v1 · Announce Type: new · Abstract: In this work, we introduce a training procedure for shallow neural networks that promotes robustness against adversarial attacks. We solve a non-convex Lipschitz-regularized training program by introducing a convex restriction that…

24
arXiv — NLP / Computation & Language research 11d ago

DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence

arXiv:2606.19348v1 · Announce Type: new · Abstract: We present a preview version of DeepSeek-V4 series, including two strong Mixture-of-Experts (MoE) language models -- DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated) --…

11
arXiv — NLP / Computation & Language research 11d ago

Generative Engine Optimization at Scale: Measuring Brand Visibility Across AI Search Engines

arXiv:2606.20065v1 · Announce Type: cross · Abstract: People increasingly get answers straight from AI search engines like ChatGPT, Claude, Perplexity, and Gemini rather than scrolling search results. Brands that once focused on search engine optimization (SEO) must now optimize for…

7
Hugging Face Daily Papers research 11d ago

S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence

Abstract: S-Agent is a spatial reasoning framework that enhances visual language models with temporal memory and hierarchical spatial tools to enable continuous 3D world understanding from multi-view imagery. · Generated by Qwen/Qwen2.5-Coder-32B-Instruct · Real-world spatial…

28
Hugging Face Daily Papers research 11d ago

FAPO: Fully Autonomous Prompt Optimization of Multi-Step LLM Pipelines

Abstract: FAPO optimizes LLM pipelines by combining prompt editing with structural changes, demonstrating superior performance across multiple benchmarks and security tasks. · Generated by Qwen/Qwen2.5-Coder-32B-Instruct · Multi-step LLM pipelines fail through interactions among…

38
r/LocalLLaMA community 11d ago

[NEW MODEL] SupraLabs just released SupraVL-Nano-900k, a Vision-Language Model built entirely from scratch!

Hey r/LocalLLaMA! We just released SupraVL-Nano-900k, our first VLM. It has ~900k parameters, was trained from scratch on Flickr8k, and the entire architecture fits in a single Jupyter notebook. This is not a production model; it's a fully transparent, readable blueprint for…

27
Hugging Face Daily Papers research 11d ago

Beyond Static Leaderboards: Predictive Validity for the Evaluation of LLM Agents

Abstract: Aggregate-score leaderboards in agent benchmarks fail to capture deployment-relevant dimensions and show rank instability, necessitating new evaluation frameworks based on predictive validity and out-of-distribution criteria. · Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

27
Hugging Face Daily Papers research 11d ago

Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance

Abstract: A lightweight image inpainting framework achieves high-fidelity results with significantly reduced parameters and inference time through novel local-global interaction blocks and adaptive distillation strategies. · Generated by Qwen/Qwen2.5-Coder-32B-Instruct · While…

35
Hugging Face Daily Papers research 11d ago

LooseControlVideo: Directorial Video Control using Spatial Blocking

Abstract: LooseControlVideo enables intuitive 3D spatial control in text-to-video generation using sparse oriented 3D boxes as proxies, achieving superior trajectory accuracy and occlusion handling compared to existing methods. · Generated by Qwen/Qwen2.5-Coder-32B-Instruct · Precise…

10
r/LocalLLaMA community 11d ago

2 weeks since the release of Gemma 4 12b Unified, how are we feeling about it?

I'm looking for a good model to run on a 5090 with ample context (~128k). This model looks good for me: it seems to have good performance in the 12B range, almost comparable to Gemma 4 26B A4B. I'm building a custom harness for it and have ~300M tokens to fine-tune on. Do you…

6
r/LocalLLaMA community 11d ago

GLM-5.2 is above GPT-5.5 in AA-Briefcase, Artificial Analysis' new agentic knowledge work eval

submitted by /u/analysis_scaled

7
Simon Willison community 11d ago

Datasette Apps: Host custom HTML applications inside Datasette

Today we launched a new plugin for Datasette, datasette-apps, with this launch announcement post on the Datasette project blog. That post has the what, but I'm going to expand on that a little bit here to provide the why. The TL;DR: Datasette Apps are self-contained…

14
Hugging Face Daily Papers research 11d ago

Reinforcement Learning-Guided Retrieval with Soft Fusion for Robust Multimodal Imitation Learning under Missing Modalities

Abstract: RL4IL enables robust robotic manipulation under sensor dropout by using reinforcement learning to retrieve relevant demonstrations and cross-attention fusion to impute missing modalities without retraining. · Generated by Qwen/Qwen2.5-Coder-32B-Instruct · Robotic systems…

23
LangChain releases dev-tools 11d ago

langchain==1.3.10

Changes since langchain==1.3.9: release(langchain): 1.3.10 (#38255); chore: bump cryptography from 46.0.7 to 48.0.1 in /libs/langchain_v1 (#38176); chore: bump aiohttp from 3.14.0 to 3.14.1 in /libs/langchain_v1 (#38179); fix(langchain): switch summary format (#38171)…

38
r/MachineLearning community 11d ago

Neuron Populations Exhibit Divergent Selectivity with Scale [R]

Hi! We just released a paper where we study “Rosetta Neurons”: universal neurons across different neural networks, and their relationship to scaling laws, specialization, and monosemanticity. Would love to kick off a discussion and get the community's thoughts. Main Findings: We…

11
LangChain releases dev-tools 11d ago

langchain-core==1.4.8

Changes since langchain-core==1.4.7: chore: bump jupyter-server from 2.18.0 to 2.20.0 in /libs/core (#38252); chore: bump tornado from 6.5.6 to 6.5.7 in /libs/core (#38184); chore: bump bleach from 6.3.0 to 6.4.0 in /libs/core (#38198); release(core): 1.4.8 (#38254)…

36
r/LocalLLaMA community 11d ago

Local Qwen isn't a worse Opus, it's a different tool

submitted by /u/cafedude

37
Simon Willison community 11d ago

datasette-acl 0.6a0

Release: datasette-acl 0.6a0. This release expands datasette-acl from table-only permissions toward a general resource-sharing system. Alex Garcia did most of the work for this release - we're fleshing out the plugin that will allow multi-user Datasette instances finely grained…

36
r/LocalLLaMA community 11d ago

Updates on North Mini Code: 4 bit quant + Ollama + OpenRouter

Hey! We heard the feedback on making the model more portable and accessible. So in light of that we have 2 updates to share. First, you can pull a new 4-bit quant straight from Hugging Face, so it’s now small enough to run on a Mac or whatever local hardware you’ve got. It…

21
TechCrunch — AI news-outlet 11d ago

‘Queer Eye’s’ life coach Karamo Brown launches Kē, a wellness app featuring his AI digital clone

Karamo Brown, famous for his pep talks on Netflix’s “Queer Eye,” has jumped into the wellness and AI space with his new app, Kē. After spending a year and a half focusing on his own journey—from fitness and nutrition to meditation, sobriety, relationships, and personal…

28
r/LocalLLaMA community 11d ago

Cutting LLM Token Costs with rtk, headroom, and caveman - savings measured on real workloads

rtk, headroom, and caveman keep showing up whenever someone posts about cutting their token bill 60-90%. I wanted to know what they save on an actual bill instead of a benchmark, so I replayed all three over my own Claude Code history. My corpus was 500 of my own Claude Code…

11
Hugging Face Daily Papers research 11d ago

MaineCoon: Pursuing A Real-Time Audio-Visual Social World Model

Abstract: MaineCoon represents the first real-time audio-visual autoregressive model for social worlds, achieving high frame rates and long-horizon generation through novel training techniques and inference frameworks. · Generated by Qwen/Qwen2.5-Coder-32B-Instruct · As an increasing…

21
Hacker News — AI on Front Page community 11d ago

Ubiquiti: Enterprise NAS, Built on ZFS

Article URL: https://blog.ui.com/article/introducing-enterprise-nas · Comments URL: https://news.ycombinator.com/item?id=48585866 · Points: 281 · Comments: 254

34
r/LocalLLaMA community 11d ago

the power of intelligence is better in the hands of the people than in the board rooms of tycoons.

Hey r/localllama. I wanted to share what's new with our open source PearlOS project since you all last saw it (90 days ago). But first I want to give a massive thank you to this community: both your feedback and support were essential in getting us this far.…

22
Hugging Face Daily Papers research 11d ago

ViT-Up: Faithful Feature Upsampling for Vision Transformers

Abstract: ViT-Up is a feature upsampling framework for Vision Transformers that uses layer-wise query construction from hidden states to improve dense prediction tasks, outperforming existing image-guided methods. · Generated by Qwen/Qwen2.5-Coder-32B-Instruct · Vision Transformers…

27
Hugging Face Daily Papers research 11d ago

iOSWorld: A Benchmark for Personally Intelligent Phone Agents

Abstract: iOSWorld is introduced as the first interactive native iOS simulator benchmark featuring persistent user identity across multiple apps to evaluate personalized mobile agent capabilities. · Generated by Qwen/Qwen2.5-Coder-32B-Instruct · A useful phone agent needs to be…

6
Hugging Face Daily Papers research 11d ago

MyPCBench: A Benchmark for Personally Intelligent Computer-Use Agents

Abstract: MyPCBench evaluates computer-use agents as personal assistants in a simulated Linux desktop environment with real-world web applications, revealing that Claude Opus 4.6 achieves the highest task completion rate of 55.4% while struggling with multi-application tasks and…

29
r/LocalLLaMA community 11d ago

NVFP4 kv cache quantization on sm120 will make 32GB VRAM systems very capable

The best I can get from Qwen3.6-27B on my 32GB VRAM (2 x 5060) is ~60 tok/sec gen speed at context size 196608. (sakamakismile text nvfp4). FP8 KV quantization. NVFP4 KV cache quantization can’t get here fast enough. Reminds me of the time there was this game I couldn’t play on…

38
