News / #model-release Tag Model releases 500 articles archived under #model-release · RSS Sign in to follow Don't Worry About the Vase community 10d ago Claude Fable 5 and Mythos 5: Capabilities Only three days after the release of Claude Fable 5, Anthropic was forced by the United States Government to make it unavailable, when a jailbreak was brought to its attention, rather than the previous situation of ‘yes obviously experts can jailbreak anything if they care… 32 r/LocalLLaMA community 10d ago What's more impressive, GLM 5.1 -> 5.2 or Qwen 3.5 -> 3.6? Write a single HTML file with a full-page canvas and no libraries. Simulate a realistic Döner Style kebab skewer rotating (vertically) in front of a gas powered heating element. Mentioning Döner activates GLM 5.2s german weights or something (Spiess = Skewer, Brenner = Burner).… 34 r/LocalLLaMA community 10d ago Watching a local AI voice assistant get dumber (A 9B to 0.8B agent experiment on my RTX 5060 Ti) I wanted to find the exact floor for running an intelligent, local voice assistant agent on consumer hardware. I kept the environment, tools, and prompts identical, I stepped the model sizes down through Qwen 3.5 9B, 4B, 2B, and 0.8B to see how agentic reasoning degrades. The… 12 r/LocalLLaMA community 10d ago The Eagle(3) has landed (for Qwen) https://github.com/ggml-org/llama.cpp/releases/tag/b9723 Available in the latest release. Enabled via: --spec-type draft-eagle3 You'll need to feed it a draft model. There's issues with unsloth + eagle at the moment so I've personally tested against: Model:… 16 llama.cpp releases dev-tools 10d ago b9723 spec: support eagle3 for qwen3.5 & 3.6 ( #24593 ) spec: support qwen3.5 & 3.6 eagle3 draft eagle3: Add deferred boundary checkpoints restore support for hybrid models apply suggestions Co-authored-by: Georgi Gerganov [email protected] spec: adapt to API change spec: fix naming… 21 r/LocalLLaMA community 10d ago New Agentic Benchmark Out: Claude Fable and GLM 5.2 Top Their Cohorts You can read about it here: https://artificialanalysis.ai/articles/aa-briefcase This is a solid benchmark from Artificial Analysis. It basically tests an LLMs ability to plan and execute tasks. And more importantly, it is a new benchmark that is not saturated, so no one can… 32 r/LocalLLaMA community 10d ago spec: support eagle3 for qwen3.5 & 3.6 by ruixiang63 · Pull Request #24593 · ggml-org/llama.cpp let's try is it better than MTP   submitted by   /u/jacek2023 [link]   [comments] 5 Hugging Face Daily Papers research 11d ago Rethinking Shrinkage Bias in LLM FP4 Pretraining: Geometric Origin, Systemic Impact, and UFP4 Recipe Abstract Uniform 4-bit training with RHT-based quantization outperforms E2M1-based methods by eliminating shrinkage bias and improving training stability across large language model architectures. Generated by Qwen/Qwen2.5-Coder-32B-Instruct FP4 training promises substantial… 31 Hugging Face Daily Papers research 11d ago Multi-LCB: Extending LiveCodeBench to Multiple Programming Languages Abstract Multi-LCB addresses the limitation of LiveCodeBench by providing a multi-language benchmark for evaluating LLMs across twelve programming languages while maintaining contamination controls and evaluation protocols. Generated by Qwen/Qwen2.5-Coder-32B-Instruct… 33 r/LocalLLaMA community 11d ago Researchers trained a Deep Research agent with 32 H100s and open-sourced everything Ohio State University's NLP team released QUEST-35B, an open-source Deep Research agent trained using ~32 H100s and ~8K synthetic samples. The team open-sourced the training recipe, code, weights and datasets. Benchmark results show competitive performance against several… 13 Hugging Face Daily Papers research 11d ago JAMER: Project-Level Code Framework Dataset and Benchmark on Professional Game Engines Abstract Game development frameworks and benchmarks were created using data from game jam competitions to evaluate code generation and project-level programming capabilities. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Current AI-driven game development has made substantial… 25 Hugging Face Daily Papers research 11d ago ImageWAM: Do World Action Models Really Need Video Generation, or Just Image Editing? Abstract ImageWAM demonstrates that pretrained image editing models can effectively replace video generation in world action models for robot control, achieving better performance with reduced computational costs. Generated by Qwen/Qwen2.5-Coder-32B-Instruct World Action Models… 25 Hugging Face Daily Papers research 11d ago ENPIRE: Agentic Robot Policy Self-Improvement in the Real World Abstract ENPIRE framework enables autonomous robotics research through a closed-loop system that automates policy improvement via environment feedback, policy refinement, and evolutionary code optimization. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Achieving dexterous robotic… 27 Hugging Face Daily Papers research 11d ago Holo-World: Unified Camera, Object and Weather Control for Video World Model Abstract A unified controllable video world model generates videos from a single image while preserving scene structure and transferring to target weather states through specialized parameterization and conditioning techniques. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Video… 22 Hugging Face Daily Papers research 11d ago Current World Models Lack a Persistent State Core Abstract Current world models fail to maintain consistent world states when unobserved, indicating a need for design changes that prioritize physical state stability over appearance fidelity. Generated by Qwen/Qwen2.5-Coder-32B-Instruct World models are increasingly regarded as… 18 Hugging Face Daily Papers research 11d ago Taylor-Calibrate: Principled Initialization for Hybrid Linear Attention Distillation Abstract Hybrid linear attention models can be improved through a novel initialization technique that enhances conversion from pretrained Transformers by leveraging teacher attention statistics and alignment steps. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Hybrid linear… 6 Hugging Face Daily Papers research 11d ago DragMesh-2: Physically Plausible Dexterous Hand-Object Interaction with Articulated Objects Abstract DragMesh-2 enables dexterous hand-object interaction through contact-driven manipulation, with PICA enhancing robustness under varying contact loads without tactile feedback. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Dexterous interaction with articulated objects is… 19 Hugging Face Daily Papers research 11d ago HumanScale: Egocentric Human Video Can Outperform Real-Robot Data for Embodied Pretraining Abstract Egocentric human video can effectively replace teleoperated robot trajectories for embodied model pretraining, achieving better performance with reduced data collection costs. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Embodied foundation models are expected to… 22 r/LocalLLaMA community 11d ago Little late thank you to the DeepSeek team! 7 moths ago I posted https://www.reddit.com/r/LocalLLaMA/s/Z32skdSKzY Just wanted to thank you for DeepSeek V4 Pro and extra big Thank You for the Flash version that fits on my local hardware! Thank You!!!!   submitted by   /u/Sorry_Ad191 [link]   [comments] 37 Smol AI News news-outlet 11d ago not much happened today **GLM-5.2** emerges as a leading open-weight coding model rivaling **Opus 4.8** and **GPT-5.5** in software engineering tasks, emphasizing the strategic importance of open models for provider competition, on-prem deployment, and fine-tuning rights. Experts like **Patrick… 17 Hugging Face Daily Papers research 11d ago Playful Agentic Robot Learning Abstract Embodied robots learn reusable skills through self-directed play and exploration, then apply these skills to improve performance on downstream tasks without additional training. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Current agentic robot systems can write… 4 arXiv — NLP / Computation & Language research 11d ago Displacement Is Not Direction: Evaluating Fidelity Metrics for Quantized LLM Deployment arXiv:2606.19558v1 Announce Type: cross Abstract: Fidelity metrics, such as per-token KL divergence (KLD) against a high-precision reference, are often used in practice as low-cost proxies for benchmark quality. We test this practice on a 28-quant cohort of Qwen3.6-35B-A3B and a… 32 arXiv — Machine Learning research 11d ago Convex training of Lipschitz-regularized shallow neural networks arXiv:2606.19652v1 Announce Type: new Abstract: In this work, we introduce a training procedure for shallow neural networks that promotes robustness against adversarial attacks. We solve a non-convex Lipschitz-regularized training program by introducing a convex restriction that… 24 arXiv — NLP / Computation & Language research 11d ago DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence arXiv:2606.19348v1 Announce Type: new Abstract: We present a preview version of DeepSeek-V4 series, including two strong Mixture-of-Experts (MoE) language models -- DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated) --… 11 arXiv — NLP / Computation & Language research 11d ago Generative Engine Optimization at Scale: Measuring Brand Visibility Across AI Search Engines arXiv:2606.20065v1 Announce Type: cross Abstract: People increasingly get answers straight from AI search engines like ChatGPT, Claude, Perplexity, and Gemini rather than scrolling search results. Brands that once focused on search engine optimization (SEO) must now optimize for… 7 Hugging Face Daily Papers research 11d ago S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence Abstract S-Agent is a spatial reasoning framework that enhances visual language models with temporal memory and hierarchical spatial tools to enable continuous 3D world understanding from multi-view imagery. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Real-world spatial… 28 Hugging Face Daily Papers research 11d ago FAPO: Fully Autonomous Prompt Optimization of Multi-Step LLM Pipelines Abstract FAPO optimizes LLM pipelines by combining prompt editing with structural changes, demonstrating superior performance across multiple benchmarks and security tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multi-step LLM pipelines fail through interactions among… 38 r/LocalLLaMA community 11d ago [NEW MODEL] SupraLabs just released SupraVL-Nano-900k, a Vision-Language Model built entirely from scratch! Hey r/LocalLLaMA ! We just released SupraVL-Nano-900k , our first VLM. It has ~900k parameters, was trained from scratch on Flickr8k, and the entire architecture fits in a single Jupyter notebook. This is not a production model, it's a fully transparent, readable blueprint for… 27 Hugging Face Daily Papers research 11d ago Beyond Static Leaderboards: Predictive Validity for the Evaluation of LLM Agents Abstract Aggregate-score leaderboards in agent benchmarks fail to capture deployment-relevant dimensions and show rank instability, necessitating new evaluation frameworks based on predictive validity and out-of-distribution criteria. Generated by Qwen/Qwen2.5-Coder-32B-Instruct… 27 Hugging Face Daily Papers research 11d ago Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance Abstract A lightweight image inpainting framework achieves high-fidelity results with significantly reduced parameters and inference time through novel local-global interaction blocks and adaptive distillation strategies. Generated by Qwen/Qwen2.5-Coder-32B-Instruct While… 35 Hugging Face Daily Papers research 11d ago LooseControlVideo: Directorial Video Control using Spatial Blocking Abstract LooseControlVideo enables intuitive 3D spatial control in text-to-video generation using sparse oriented 3D boxes as proxies, achieving superior trajectory accuracy and occlusion handling compared to existing methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Precise… 10 r/LocalLLaMA community 11d ago 2 weeks since the release of Gemma 4 12b Unified, how are we feeling about it? I'm looking for a good model to run on a 5090 and have ample context ~128k. This model looks good for me, it seems to have good performance in the 12b range, almost comparable to Gemma 4 26B A4B. Building a custom harness for it and have ~300m of tokens to fine tune on. Do you… 6 r/LocalLLaMA community 11d ago GLM-5.2 is above GPT-5.5 in AA-Briefcase, Artificial Analysis' new agentic knowledge work eval   submitted by   /u/analysis_scaled [link]   [comments] 7 Simon Willison community 11d ago Datasette Apps: Host custom HTML applications inside Datasette Today we launched a new plugin for Datasette, datasette-apps , with this launch announcement post on the Datasette project blog. That post has the what , but I'm going to expand on that a little bit here to provide the why . The TL;DR Datasette Apps are self-contained… 14 Hugging Face Daily Papers research 11d ago Reinforcement Learning-Guided Retrieval with Soft Fusion for Robust Multimodal Imitation Learning under Missing Modalities Abstract RL4IL enables robust robotic manipulation under sensor dropout by using reinforcement learning to retrieve relevant demonstrations and cross-attention fusion to impute missing modalities without retraining. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Robotic systems… 23 LangChain releases dev-tools 11d ago langchain==1.3.10 Changes since langchain==1.3.9 release(langchain): 1.3.10 ( #38255 ) chore: bump cryptography from 46.0.7 to 48.0.1 in /libs/langchain_v1 ( #38176 ) chore: bump aiohttp from 3.14.0 to 3.14.1 in /libs/langchain_v1 ( #38179 ) fix(langchain): switch summary format ( #38171 )… 38 r/MachineLearning community 11d ago Neuron Populations Exhibit Divergent Selectivity with Scale [R] Hi! We just released a paper where we study “Rosetta Neurons”: universal neurons across different neural networks, and their relationship to scaling laws, specialization, and monosemanticity. Would love to kick off a discussion and get the community's thoughts. Main Findings: We… 11 LangChain releases dev-tools 11d ago langchain-core==1.4.8 Changes since langchain-core==1.4.7 chore: bump jupyter-server from 2.18.0 to 2.20.0 in /libs/core ( #38252 ) chore: bump tornado from 6.5.6 to 6.5.7 in /libs/core ( #38184 ) chore: bump bleach from 6.3.0 to 6.4.0 in /libs/core ( #38198 ) release(core): 1.4.8 ( #38254 )… 36 r/LocalLLaMA community 11d ago Local Qwen isn't a worse Opus, it's a different tool   submitted by   /u/cafedude [link]   [comments] 37 Simon Willison community 11d ago datasette-acl 0.6a0 Release: datasette-acl 0.6a0 This release expands datasette-acl from table-only permissions toward a general resource-sharing system. Alex Garcia did most of the work for this release - we're fleshing out the plugin that will allow multi-user Datasette instances finely grained… 36 r/LocalLLaMA community 11d ago Updates on North Mini Code: 4 bit quant + Ollama + OpenRouter Hey! We heard the feedback on making the model more portable and accessible. So in light of that we have 2 updates to share. First, you can pull a new 4-bit quant straight from Hugging Face , so it’s now small enough to run on a Mac or whatever local hardware you’ve got. It… 21 TechCrunch — AI news-outlet 11d ago ‘Queer Eye’s’ life coach Karamo Brown launches Kē, a wellness app featuring his AI digital clone Karamo Brown, famous for his pep talks on Netflix’s “Queer Eye,” has jumped into the wellness and AI space with his new app, Kē. After spending a year and a half focusing on his own journey—from fitness and nutrition to meditation, sobriety, relationships, and personal… 28 r/LocalLLaMA community 11d ago Cutting LLM Token Costs with rtk, headroom, and caveman - savings measured on real workloads rtk , headroom , and caveman keep showing up whenever someone posts about cutting their token bill 60-90%. I wanted to know what they save on an actual bill instead of a benchmark, so I replayed all three over my own Claude Code history. My corpus was 500 of my own Claude Code… 11 Hugging Face Daily Papers research 11d ago MaineCoon: Pursuing A Real-Time Audio-Visual Social World Model Abstract MaineCoon represents the first real-time audio-visual autoregressive model for social worlds, achieving high frame rates and long-horizon generation through novel training techniques and inference frameworks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct As an increasing… 21 Hacker News — AI on Front Page community 11d ago Ubiquiti: Enterprise NAS, Built on ZFS Article URL: https://blog.ui.com/article/introducing-enterprise-nas Comments URL: https://news.ycombinator.com/item?id=48585866 Points: 281 # Comments: 254 34 r/LocalLLaMA community 11d ago the power of intelligence is better in the hands of the people than in the board rooms of tycoons. Hey [ r/localllama ]( r/localllama ). I wanted to share what's new with our open source PearlOS project since you all last saw (90 days ago). But first I want to give a massive thank you to this community, both your feedback and support were essential in getting us this far.… 22 Hugging Face Daily Papers research 11d ago ViT-Up: Faithful Feature Upsampling for Vision Transformers Abstract ViT-Up is a feature upsampling framework for Vision Transformers that uses layer-wise query construction from hidden states to improve dense prediction tasks, outperforming existing image-guided methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Vision Transformers… 27 Hugging Face Daily Papers research 11d ago iOSWorld: A Benchmark for Personally Intelligent Phone Agents Abstract IOSWorld is introduced as the first interactive native iOS simulator benchmark featuring persistent user identity across multiple apps to evaluate personalized mobile agent capabilities. Generated by Qwen/Qwen2.5-Coder-32B-Instruct A useful phone agent needs to be… 6 Hugging Face Daily Papers research 11d ago MyPCBench: A Benchmark for Personally Intelligent Computer-Use Agents Abstract MyPCBench evaluates computer-use agents as personal assistants in a simulated Linux desktop environment with real-world web applications, revealing that Claude Opus 4.6 achieves the highest task completion rate of 55.4% while struggles with multi-application tasks and… 29 r/LocalLLaMA community 11d ago NVFP4 kv cache quantization on sm120 will make 32GB VRAM systems very capable The best i can get from Qwen3.6-27B on my 32GB VRAM (2 x 5060) is ~60 tok/sec gen speed at context size 196608. (sakamakismile text nvfp4). Fp8 kv quantization. NVFP4 kv cache quantization can’t get here fast enough. Reminds me of the time there was this game i couldn’t play on… 38 Page 8 of 10 · 500 articles ← Newer Older →