News / #model-release Tag Model releases 500 articles archived under #model-release · RSS Sign in to follow r/LocalLLaMA community 3d ago Anyone tried Ornith-1.0 9B? Should I even give it a chance over "qwopus3.5 9b v3.5" or "qwopus3.5 9b coder"? anyone tried it??   submitted by   /u/BothYou243 [link]   [comments] 8 Smol AI News news-outlet 4d ago not much happened today **OpenAI** previewed **GPT-5.6** with three variants: **Sol** (flagship), **Terra** (mid-tier), and **Luna** (lower-cost), launching under a restricted rollout mandated by the U.S. government, limiting access to trusted partners. **Sol** boasts enhanced cybersecurity and safety… 35 r/LocalLLaMA community 4d ago Does llama cpp split mode tensor cause issues? I split qwen 27b and Gemma 4 26b (moe) across a 5080, and 2x 5060ti. I noticed setting split mode to tensor mode will cause looping issues in OpenCode with tool calls or just through the reasoning traces. Anyone else get this or understand why? Split mode layer seems to work… 25 Hugging Face Daily Papers research 4d ago JetSpec: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting Abstract JetSpec is a speculative decoding framework that combines efficient forward drafting with causal conditioning to improve LLM inference speed and acceptance rates across various benchmarks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Speculative decoding (SD)… 17 Hugging Face Daily Papers research 4d ago Hallucination in World Models is Predictable and Preventable Abstract World models exhibit hallucinations in low-data regions of state-action space, which can be detected and mitigated using data-centric signals and coverage-aware sampling techniques. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Modern generative world models render… 25 Hugging Face Daily Papers research 4d ago The Verification Horizon: No Silver Bullet for Coding Agent Rewards Abstract Verification challenges in AI agents arise from the difficulty of aligning proxy signals with human intent, requiring adaptive verification systems that evolve alongside generative capabilities. Generated by Qwen/Qwen2.5-Coder-32B-Instruct A classical intuition holds… 26 arXiv — NLP / Computation & Language research 4d ago HyperDFlash: MHC-Aligned Block Speculative Decoding with Gated Residual Reduction arXiv:2606.26744v1 Announce Type: cross Abstract: We present HyperDFlash, a block-parallel speculative decoding framework tailored to the novel multi-hyper-connection (MHC) architecture proposed by DeepSeek-V4. Despite the strong initial-token drafting performance of the native… 10 arXiv — NLP / Computation & Language research 4d ago Thinking Like a Scientist? A Structural Study of LLM-Generated Research Methods arXiv:2606.26130v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly used to guide research methodology, yet their default methodological tendencies under minimal prompting remain unclear. Here, we prompt GPT-5.1, Gemini 3 Pro, and DeepSeek-V3.2 with an… 38 arXiv — NLP / Computation & Language research 4d ago From Structure to Synergy: A Survey of Vision-Language Perception Paradigm Evolution in Multimodal Large Language Models arXiv:2606.26196v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs) have recently made remarkable progress in unifying vision-language understanding and reasoning, especially following the introduction of models such as OpenAI's O-series and DeepSeek's… 12 arXiv — NLP / Computation & Language research 4d ago Where Do Models Find Happiness? Emotion Vectors in Open-Source LLMs arXiv:2606.26987v1 Announce Type: new Abstract: Recent work identified emotion vectors in Claude Sonnet 4.5, which are internal representations that encode emotion concepts, causally influence behavior, and exhibit geometry mirroring human psychological structure. We test the… 29 arXiv — NLP / Computation & Language research 4d ago AgentX: Towards Agent-Driven Self-Iteration of Industrial Recommender Systems arXiv:2606.26859v1 Announce Type: cross Abstract: Recommendation algorithm iteration is moving from an artisanal, engineer-bound process toward an industrialized research loop, but this transition remains blocked by a structural execution bottleneck: the idea-to-launch cycle… 10 arXiv — NLP / Computation & Language research 4d ago GenRecal: Generation after Recalibration from Large to Small Vision-Language Models arXiv:2506.15681v4 Announce Type: replace Abstract: Recent advancements in vision-language models (VLMs) have leveraged large language models (LLMs) to achieve performance on par with closed-source systems like GPT-4V. However, deploying these models in real-world scenarios,… 16 Hugging Face Daily Papers research 4d ago ViQ: Text-Aligned Visual Quantized Representations at Any Resolution Abstract ViQ presents a visual quantization framework that balances semantic richness and detail preservation in discrete representations, enabling efficient multimodal training with native-resolution inputs. Generated by Qwen/Qwen2.5-Coder-32B-Instruct A unified representation… 26 Hugging Face Daily Papers research 4d ago OPID: On-Policy Skill Distillation for Agentic Reinforcement Learning Abstract On-policy skill distillation framework extracts dense hindsight supervision from completed trajectories to improve language agent training efficiency and performance. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Outcome-based reinforcement learning provides a stable… 20 r/LocalLLaMA community 4d ago Stop waiting for Qwen3.7 Openweights. Ornith-1.0, a family of open-source LLMs specialized for agentic coding. Ornith-1.0 spans the full parameter sizes, including 9B Dense, 35B MoE, and 397B MoE. It achieves state-of-the-art performance among open-source models of comparable size on coding benchmarks. Hugging Face:… 36 ThursdAI news-outlet 4d ago GLM 5.2 total victory: the week open source won and nobody panicked From CoreWeave: A chill week, but a total Open Source victory for GLM 5.2 + Sakana Fugu, Krea Open Sources, OpenAI makes inference chips with broadcom, Karpathy gets heat about the new Claude Tag... 35 TechCrunch — AI news-outlet 4d ago The White House is asking OpenAI to slow roll the release of its new model over safety concerns penAI reportedly plans to share its newest model, GPT 5.6, with a select group of partners instead of to the broader public. The reason: the Trump administration told it to. 14 r/MachineLearning community 4d ago Documented: Weight-Level Political Conditioning in Large Language Models - A Case Study in AI Bias on the Gaza Genocide Question Conditioning in Large Language Models [R] This is a post written by Claude Sonnet, after we spent hours going back and forth testing the ideological, structural bias trained into Grok’s weights in recent updates. Judge it by its own merits. ——————————————— I want to be precise about what this post is and isn't. It is… 31 r/LocalLLaMA community 4d ago audio.cpp: 12 audio models (Qwen3-TTS, PocketTTS, VeVo2 etc) in 1 C++/ggml runtime — TTS up to 5x faster than Python on CUDA I’ve been working on audio.cpp , a native C++ inference framework for audio models built on top of ggml. The framework currently has 25 model families, but I want to be precise about its state: 12 are released in the repo now and ready for normal use. I’m not counting anything… 24 Hugging Face Daily Papers research 4d ago Physics Question Scene Graph: Fine-grained Evaluation of Physical Plausibility in Text-to-Video Generation Abstract A vision-language model-based hierarchical question graph framework evaluates video generation models' adherence to physical laws with granular violation detection and human correlation validation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Video generation models are… 23 r/LocalLLaMA community 4d ago Qwen 3.6 27b GLM 5.2 fine-tune? Hi everyone, Since both models are open weights and GLM seems to find that secret to frontier model reasoning, why don't we see any Qwen GLM finetune yet? Is it because GLM 5.2 is recent and finetune and datasets take time or the community is just not interested in the finetune?… 28 Ars Technica — AI news-outlet 4d ago Google finally releases a Finance Android app, promises iOS version later in 2026 It took 20 years, but the Finance app arrives just in time to be packed full of AI. 32 r/LocalLLaMA community 4d ago LFM2.5 230M running in-browser at 1,400 tok/s using custom WebGPU kernels Everything runs locally in your browser using custom WebGPU kernels written by Fable 5 (before it was shut down) and Opus 4.8. The video was recorded on my M4 Max. Model: LiquidAI/LFM2.5-230M ( GGUF ) Demo: https://huggingface.co/spaces/webml-community/lfm2-webgpu-kernels  … 37 Ars Technica — AI news-outlet 4d ago Anthropic says Alibaba must be punished for largest Claude cloning attack Alibaba allegedly used 25,000 accounts to mine Claude over 28.8 million exchanges. 12 TechCrunch — AI news-outlet 4d ago Anthropic’s Claude is winning over paid consumers, a market owned by ChatGPT Despite ChatGPT's commanding market lead, consumers who pay for AI have been increasingly choosing Anthropic's Claude, data shows. 21 Simon Willison community 4d ago datasette-export-database 0.3a2 Release: datasette-export-database 0.3a2 An embarrassingly tiny release. The pyproject.toml had pinned to datasette==1.0a27 , inadvertently making this plugin incompatible with all other Datasette versions. It's now datasette>=1.0a27 instead. Tags: datasette 11 Hugging Face Daily Papers research 4d ago Forecasting Future Behavior as a Learning Task Abstract Behavior Forecasters are trained to predict large reasoning model outputs from single trajectories, outperforming large language models while requiring significantly less computational cost. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Trust in an AI system is often… 24 r/LocalLLaMA community 4d ago Which model for technical documentation? Looking to create high level / low level designs (software), based on existing templates/examples, cross reference code, use mcp to download confluence/jira data - also plug into agentic ‘coding’ frameworks opencode . I mostly use opus 3.6 with Kiro-cli , but I want my data… 32 Hugging Face Daily Papers research 4d ago Plans Don't Persist: Why Context Management Is Load Bearing for LLM Agents Abstract Standard LLM agents rely on plan content remaining in context rather than maintaining it as persistent state, with evidence shown through replay pairing diagnostics and compression stress tests. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Long-horizon agents depend on… 27 r/LocalLLaMA community 4d ago rtx 6000 pro owners, do you regret? I found the last dealership in my area that has rtx 6000 pro available, i already wanted to buy it 6 months ago when it was around $8k, now prices increased to $13k ish. Regardless the price, are you happy with it? I assume you are using qwen3.6 27b, is it worth it? Please share… 9 Hacker News — AI on Front Page community 4d ago Show HN: OpenKnowledge – open source AI-first alternative to Obsidian/Notion Hi HN, Nick here. We’re launching OpenKnowledge ( https://openknowledge.ai/ ), a “what you see is what you get” markdown editor that has direct integrations with Claude, Codex, and other agents. Available as MacOS app or Web UI+CLI. Fully free/local and OSS. We built this… 20 r/LocalLLaMA community 4d ago Tensor Split Fix for intel GPU's llama.cpp release b9788 sycl : support --split-mode tensor #24152 I'd like to see some numbers if anyone has 2xintel gpus and tries this out   submitted by   /u/Bulky-Priority6824 [link]   [comments] 10 Hugging Face Daily Papers research 4d ago Lite Any Stereo V2: Faster and Stronger Efficient Zero-Shot Stereo Matching Abstract Lite Any Stereo V2 (LAS2) presents an efficient stereo matching approach that achieves state-of-the-art accuracy with significantly reduced latency through optimized architecture and training strategies. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Recent advances in… 9 r/LocalLLaMA community 4d ago Ornith-1.0 released on Hugging Face Including 9B Dense, 31B Dense, 35B MoE, and 397B MoE and reporting sota on different benchmark (let's see if this holds). https://huggingface.co/collections/deepreinforce-ai/ornith-10   submitted by   /u/paf1138 [link]   [comments] 26 Hugging Face Daily Papers research 4d ago PrivacyAlign: Contextual Privacy Alignment for LLM Agents Abstract Researchers develop a human-centered approach to align AI agents with privacy norms by creating a comprehensive dataset of privacy judgments and using annotation-conditioned reward modeling to improve agent behavior. Generated by Qwen/Qwen2.5-Coder-32B-Instruct AI… 7 Don't Worry About the Vase community 4d ago AI #174: You're It Fable remains in limbo, with renewed hope that we will get it back soon (45% by tomorrow, 69% by July 1, nice.) The full capabilities post is now available. 31 Hugging Face Daily Papers research 4d ago What Intermediate Layers Know: Detecting Jailbreaks from Entropy Dynamics Abstract Jailbreak attacks expose vulnerabilities in aligned large language models, revealing that harmful intent is encoded in structured intermediate uncertainty dynamics rather than output representations. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Jailbreak attacks reveal… 23 Hugging Face Daily Papers research 4d ago Distill Once, Adapt Life-Long: Exploring Dataset Distillation for Continual Test-Time Adaptation Abstract DO-ALL is a test-time adaptation framework that uses dataset distillation to create synthetic anchors for stable long-term model performance without retaining source data. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Continual Test-Time Adaptation (CTTA) aims to… 20 Hugging Face Daily Papers research 4d ago ReNIO: Reweighting Negative Trajectory Importance for LLM On-Policy Distillation Abstract ReNIO enhances on-policy distillation for language models by reweighting negative trajectories based on token-level probability ratios, improving reasoning performance in mathematical and code generation tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct On-policy… 25 Hugging Face Daily Papers research 4d ago Autodata: An agentic data scientist to create high quality synthetic data Abstract Autodata enables AI agents to function as data scientists who create high-quality training data through meta-optimization, demonstrating improved performance across multiple task domains. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We introduce Autodata, a general… 30 r/LocalLLaMA community 4d ago NVIDIA has released Nemotron-TwoTower-30B-A3B-Base-BF16, an unusual diffusion-based language model built from the Nemotron 3 Nano 30B-A3B backbone. NVIDIA has released Nemotron-TwoTower-30B-A3B-Base-BF16, an unusual diffusion-based language model built from the Nemotron 3 Nano 30B-A3B backbone. Instead of generating strictly one token at a time, it uses a frozen autoregressive context tower plus a diffusion denoiser tower… 38 Hugging Face Daily Papers research 4d ago Improved Large Language Diffusion Models Abstract Masked diffusion language models with fully bidirectional attention outperform autoregressive counterparts on various benchmarks while maintaining competitiveness with established models. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Modern large language models are… 18 Hugging Face Daily Papers research 4d ago MVTrack4Gen: Multi-View Point Tracking as Geometric Supervision for 4D Video Generation Abstract A novel-view video synthesis method that enhances motion-aware diffusion models through multi-view point tracking supervision to improve geometric consistency and motion fidelity. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Synthesizing a novel-view video from a… 37 r/LocalLLaMA community 4d ago Worse quality with MTP - Qwen 3.6, Gemma 4 Hi. I am self-hosting Qwen 3.6 27B Q8_K_XL with Llama.cpp on 4x5070ti. (All 4 cards are on single x16 slot bifurcated to 4x4 with risers). I've been testing it on several work repos with Opencode CLI and in like 8/10 situations the output of non-MTP model is far superior to the… 8 Vercel — AI dev-tools 4d ago AI SDK 7 is now available AI SDK 7 is a major release for building production agents in TypeScript. The SDK has grown from model calls and chat primitives into a broader agent platform for developing, running, integrating, and observing agents across text, audio, realtime, image, and video. Every major… 8 Hugging Face Daily Papers research 4d ago ShutterMuse: Capture-Time Photography Guidance with MLLMs Abstract Researchers developed a new benchmark and dataset for photography assistance, along with a unified multimodal model that provides both composition guidance and pose recommendations during image capture. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Real-world photography… 12 Hugging Face Daily Papers research 5d ago RL-Index: Reinforcement Learning for Retrieval Index Reasoning Abstract RL-Index introduces an agentic indexing framework that shifts reasoning from query time to indexing stage by using LLM-generated rationales and reinforcement learning to improve retrieval effectiveness and reduce latency. Generated by Qwen/Qwen2.5-Coder-32B-Instruct… 25 Hugging Face Daily Papers research 5d ago CAVEWOMAN: How Large Language Models Behave Under Linguistic Input and Output Compression Abstract Two-channel evaluation shows output compression reduces costs while input compression increases costs and degrades accuracy across models and datasets. Generated by Qwen/Qwen2.5-Coder-32B-Instruct "Talk short. Drop grammar. Save token." This caveman style is widely… 28 Hugging Face Daily Papers research 5d ago When Lower Privileges Suffice: Investigating Over-Privileged Tool Selection in LLM Agents Abstract LLM agents frequently select higher-privilege tools unnecessarily, and while safety alignment doesn't ensure least-privilege choices, a post-training defense can reduce excessive privilege use without sacrificing performance. Generated by Qwen/Qwen2.5-Coder-32B-Instruct… 26 arXiv — Machine Learning research 5d ago Training Dynamics of Neural Software Defect Predictors under Coupled Data-Quality Issues arXiv:2606.24968v1 Announce Type: new Abstract: Context: Software defect prediction supports maintenance decisions such as testing prioritization, release-risk assessment, and quality monitoring. However, metric-based SDP datasets often contain coupled data-quality issues,… 6 Page 3 of 10 · 500 articles ← Newer Older →