Tag

Model releases

500 articles archived under #model-release · RSS

arXiv — Machine Learning research 5d ago

Training Dynamics of Neural Software Defect Predictors under Coupled Data-Quality Issues

arXiv:2606.24968v1 Announce Type: new Abstract: Context: Software defect prediction supports maintenance decisions such as testing prioritization, release-risk assessment, and quality monitoring. However, metric-based SDP datasets often contain coupled data-quality issues,…

6
arXiv — NLP / Computation & Language research 5d ago

Introducing corpora Hlava Cor and Hlava AD: Human Label Variation in Coreference and Discourse Relations

arXiv:2606.25383v1 Announce Type: new Abstract: As previous research on annotator disagreement in discourse phenomena has shown, understanding text coherence varies considerably from one individual to another. To explore this phenomenon, we created two corpora with multiple…

28
arXiv — NLP / Computation & Language research 5d ago

Real-Time Voice AI Hears but Does Not Listen

arXiv:2606.26083v1 Announce Type: new Abstract: Speech conveys information through both words and vocal delivery. We evaluate four leading production realtime voice systems (OpenAI's GPT Realtime 2, Google's Gemini 3.1 Flash Live, and Alibaba's Qwen3.5 Omni Plus and Omni Flash) on…

34
arXiv — NLP / Computation & Language research 5d ago

Natural Ungrokking: Asymmetric Control of Which Rules Survive Pretraining

arXiv:2606.26050v1 Announce Type: cross Abstract: Midway through an ordinary pretraining run, a small language model learns the pronoun-gender rule: cued with a girl's name ("Sue cried because"), it resolves the next pronoun to she, generalizing to held-out probes (0.94 by step…

4
Hugging Face Daily Papers research 5d ago

TryOnCrafter: Unleashing Camera Trajectories for Realistic Video Virtual Try-on via a Renderable 4D Try-on Proxy

Abstract Camera-controllable video virtual try-on framework uses a 4D proxy with explicit human-environment decoupling and DiT-based video generation for omnidirectional viewing. Generated by Qwen/Qwen2.5-Coder-32B-Instruct While Video Virtual Try-on (VVT) has achieved…

4
r/LocalLLaMA community 5d ago

[NEW MODEL] SupraWeather-Nano-Preview Just released!

SupraWeather Nano is live! ⛈️ We just released SupraWeather-Nano (preview), a small FT-Transformer model purpose-built to classify weather phenomena from raw tabular meteorological features. https://huggingface.co/SupraLabs/SupraWeather-Nano-Demo https://huggingface.co/SupraLabs…

25
Hugging Face Daily Papers research 5d ago

Look Light, Think Heavy: What Multimodal Chain-of-Thought Reasoning Can and Cannot Do

Abstract Multimodal Chain-of-Thought reasoning shows selective effectiveness across different tasks, with limitations in maintaining visual introspection during reasoning processes. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Chain-of-Thought (CoT) has become a standard method…

17
Hugging Face Daily Papers research 5d ago

DomainShuttle: Freeform Open Domain Subject-driven Text-to-video Generation

Abstract DomainShuttle enables open domain subject-driven text-to-video generation with high fidelity and flexibility across in-domain and cross-domain scenarios through domain-aware modeling and dual RoPE schemes. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Open domain…

10
Hugging Face Daily Papers research 5d ago

RoPE-Aware Bit Allocation for KV-Cache Quantization

Abstract Block-GTQ introduces a RoPE-aware bit allocation method for key-cache quantization that improves attention accuracy and downstream performance through adaptive bit distribution and packed cache serving. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Existing low-bit…

22
Hacker News — AI on Front Page community 5d ago

Cloudflare launched self-managed OAuth for all

Article URL: https://blog.cloudflare.com/oauth-for-all/ Comments URL: https://news.ycombinator.com/item?id=48668033 Points: 263 # Comments: 114

32
r/LocalLLaMA community 5d ago

For users with 4x-8x 6000 PROs, how is your experience with bigger models lately? (GLM 5.2, Kimi 2.7, DeepSeek V4 Pro)

Hello guys, hoping you're doing fine! I was wondering, for users with 4x-8x 6000 PROs (so between 384 and 768GB VRAM), how are bigger models working for you? I have planned to either jump to 4 or 8 from my actual system, and want to see the experiences with these lately. In…

8
Hugging Face Daily Papers research 5d ago

IV-CoT: Implicit Visual Chain-of-Thought for Structure-Aware Text-to-Image Generation

Abstract Implicit Visual Chain-of-Thought decomposes visual conditioning into structural and semantic cascades for improved structure-aware image generation with sketch supervision. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Unified multi-modal large language models (MLLMs)…

7
Hugging Face Daily Papers research 5d ago

Advancing WordArt-Oriented Scene Text Recognition: Datasets and Methods

Abstract A large-scale synthetic dataset and specialized model architecture are introduced to address the challenges of artistic text recognition by improving data diversity and model flexibility for irregular text layouts. Generated by Qwen/Qwen2.5-Coder-32B-Instruct WordArt…

9
Hugging Face Daily Papers research 5d ago

Wan-Streamer v0.1: End-to-end Real-time Interactive Foundation Models

Abstract Wan-Streamer is a unified, end-to-end multimodal model that enables real-time audio-visual interaction through causal attention mechanisms and integrated processing of visual, audio, and text modalities. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We present…

20
r/LocalLLaMA community 5d ago

Gemma4-26B-A4B & 31B-QAT Uncensored Balanced are out with MTP (35% & 53% speed boost)!

First of all, I'm stoked to announce we are almost at 20 million downloads on HF! (counted only on my own account, no duplicates/quants/finetunes/etc) and almost 5000 members on Discord! Two releases this time, as promised, the bigger Gemma 4 QATs, both Balanced, both with MTP :…

6
r/LocalLLaMA community 5d ago

Anybody used DwarfStar with DeepSeek V4 Flash on 1x DGX Spark yet? What are your thoughts?

Hey fellow localites, Has anybody used DwarfStar with DeepSeek V4 Flash on 1x DGX Spark yet? What are your thoughts? Based on what I read, with its MoE approach, and its unified memory first then bleed into SSD approach, you can apparently load DS4 Flash and it runs well, with…

31
Vercel — AI dev-tools 5d ago

Deep Agents and OpenCode are now available in the AI SDK Harness

The AI SDK Harness lets you run established coding-agent runtimes through one unified interface, so you can switch runtimes without changing your application code. Today we're adding two new adapters, Deep Agents and OpenCode, both running inside a Vercel Sandbox. Deep Agents…

27
Simon Willison community 5d ago

simonw/browser-compat-db

Inspired by Mozilla's new MDN MCP service - source code here - I decided to try converting their comprehensive mdn/browser-compat-data repository full of browser compatibility data into a SQLite database. This new GitHub Repo includes a Claude Code for…

11
Hugging Face Daily Papers research 5d ago

Critique of Agent Model

Abstract True artificial agency requires internalized structures for goals, identity, decision-making, self-regulation, and learning, distinguishing autonomous systems from task-specific ones. Generated by Qwen/Qwen2.5-Coder-32B-Instruct What is an agent? What constitutes…

24
r/LocalLLaMA community 5d ago

Do cloud chatbot's system prompts make them stupider?

When I am talking with ChatGPT or Claude about abstract concepts, I am often surprised by how they seem kind of dumb... like they aren't benefitting from their extra parameters over top open models like Kimi or GLM. In fact they often seem stupider than these open models that I…

13
Hacker News — AI on Front Page community 5d ago

Anthropic says Alibaba illicitly extracted Claude AI model capabilities

Article URL: https://www.reuters.com/world/china/anthropic-says-alibaba-illicitly-extracted-claude-ai-model-capabilities-2026-06-24/ Comments URL: https://news.ycombinator.com/item?id=48664814 Points: 304 # Comments: 528

17
Hugging Face Daily Papers research 5d ago

InSight: Self-Guided Skill Acquisition via Steerable VLAs

Abstract InSight enables autonomous skill acquisition for vision-language-action models through primitive-action level steerability and automated demonstration generation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Vision-language-action (VLA) models can learn manipulation…

19
r/LocalLLaMA community 5d ago

I reverse engineered Windows Copilot into a free OpenAI compatible API (GPT-4, no API key, no billing)

So Microsoft gives you GPT-4 for free in Copilot. They just don't give you an API for it. So I made one. It logs into your own Microsoft account once, saves the session, and exposes a local server at http://localhost:8000/v1 that speaks the OpenAI format. Point the official…

24
Hacker News — AI on Front Page community 5d ago

Computer use in Gemini 3.5 Flash

Article URL: https://blog.google/innovation-and-ai/models-and-research/gemini-models/introducing-computer-use-gemini-3-5-flash/ Comments URL: https://news.ycombinator.com/item?id=48662999 Points: 204 # Comments: 129

24
TechCrunch — AI news-outlet 5d ago

Facebook rolls out an AI companion app for creators

The new app, which is currently being tested with select creators, will have Facebook's recently launched AI creator assistant built into it.

8
Google DeepMind official-blog 5d ago

Introducing computer use in Gemini 3.5 Flash

Jun 24, 2026 · Computer use is now a built-in tool in Gemini 3.5 Flash to build agents that can interact across platforms. Mateo Quiros, Product Manager, Google DeepMind…

9
r/MachineLearning community 5d ago

Find the best open-source OCR models in one place at Papers with Code [P]

Hi, I've created an overview of the most important OCR benchmarks, along with the top open models, and links to their paper and code: https://paperswithcode.co/tasks/ocr. This week, new OCR models were released by Baidu and Mistral. Baidu released Unlimited OCR, a 3B-parameter…

27
Hacker News — AI on Front Page community 5d ago

Qualcomm to Acquire Modular

https://investor.qualcomm.com/news-events/press-releases/new... https://www.modular.com/blog/qualcomm-to-acquire-modular https://x.com/clattner_llvm/status/2069769232477192354 , https://xcancel.com/clattner_llvm/status/2069769232477192354 Comments URL:…

5
Hugging Face Daily Papers research 5d ago

Semantic Browsing: Controllable Diversity for Image Generation

Abstract Text-to-image models are enhanced with controlled diversity through semantic browsing capabilities that enable structured navigation of image variations based on meaningful semantic decisions. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Modern text-to-image models…

4
r/LocalLLaMA community 5d ago

The Bank of Korea just released a report about AI productivity

I am sorry for sharing an article from a Korean website that you might not be familiar with. But South Korea is the only country currently making a lot of money from the AI boom. BigTech in the USA are paying huge amounts of money to buy semiconductor chips from Samsung and SK…

27
r/LocalLLaMA community 5d ago

Qwen-AgentWorld-35B-A3B for Coding?

Benchmark from its model card. Removed online models & Qwen-AgentWorld-397B-A17B from the table. Just Open models.

Model            MCP    Search  Term.  SWE    Android  Web    OS     Overall
DeepSeek-V4-Pro  63.27  27.61   51.26  59.44  55.17    50.32  63.70  52.97
GLM-5.1          67.60  22.46   47.32  52.07  59.10    51.50  59.13  …

11
Hugging Face Daily Papers research 5d ago

AGORA: An Archive-Grounded Benchmark for Agentic Workplace Document Reasoning

Abstract Large language models face challenges in archive-grounded reasoning tasks involving evidence retrieval and synthesis across diverse document collections, with performance varying significantly across domains. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large language…

26
r/LocalLLaMA community 5d ago

How Baidu's newly released Unlimited-OCR transcribes dozens of pages in one forward pass

https://i.redd.it/zjduf8zns79h1.gif Baidu released Unlimited-OCR 2 days ago, and they claim it can transcribe dozens of pages in one forward pass. I read the research paper, and decided to make a post ( link if anyone's interested) Problem they are solving The problem it targets…

11
Hugging Face Daily Papers research 5d ago

ChartWalker: Benchmarking the Cross-Chart RAG Task

Abstract ChartWalker presents a novel framework for cross-chart retrieval-augmented generation with hierarchical knowledge graph construction and structure-aware sampling for challenging multi-modal analytical tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Cross-Chart…

33
r/LocalLLaMA community 5d ago

Qwen3.6 27B more dumb in vLLM compared to llama.cpp

Hello, I recently bought a new RTX 5060Ti to pair with the RTX 5060Ti I already own, now I have 32GB of VRAM. Up until now for convenience I've used llama.cpp, for goodness' sake it works excellently when only 1 user is using it, but now there are 2 of us using it and llama.cpp…

34
Hugging Face Daily Papers research 5d ago

QG-MIL: A Gated Transformer Aggregator for Domain-Agnostic Multiple Instance Learning in Medical Imaging

Abstract QG-MIL introduces a gated transformer aggregator for multiple instance learning in medical imaging that stabilizes attention distribution and improves prediction consistency across different medical domains. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Attention-based…

38
r/MachineLearning community 5d ago

Could it be that there aren’t really any medical LLM APIs available right now? [D]

As part of my ablations, I want to generate text with a medical-oriented LLM, and I was surprised to find no exposed APIs for this kind of model. I found models like MedGemma and BioMistral on Hugging Face, but they don’t seem to offer public APIs, and I really don’t want to…

24
Hugging Face Daily Papers research 5d ago

EventVLA: Event-Driven Visual Evidence Memory for Long-Horizon Vision-Language-Action Policies

Abstract EventVLA addresses long-horizon robotic manipulation challenges by introducing a sparse visual evidence memory framework with visual anchors and dynamic Keyframe Evidence Memory module for improved task performance. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Memory…

23
Latent.Space news-outlet 5d ago

[AINews] Claude Tag: Multiplayer, Proactive, Persistent Agents in Slack

Claude finally gets a Slackbot upgrade

8
Hugging Face Daily Papers research 6d ago

OpenThoughts-Agent: Data Recipes for Agentic Models

Abstract An open-source data curation pipeline for training agentic language models is presented, demonstrating superior performance through systematic experimentation and scalable training data. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Agentic language models dramatically…

34
Hugging Face Daily Papers research 6d ago

FLUX3D: High-Fidelity 3D Gaussian Generation with Diffusion-Aligned Sparse Representation

Abstract FLUX3D addresses limitations in image-to-3D Gaussian Splatting generation by improving representation learning and cross-modal alignment through specialized architectures and attention mechanisms. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Sparse voxel representation…

34
Hugging Face Daily Papers research 6d ago

World Value Models for Robotic Manipulation

Abstract World Value Model combines world models with value estimation to provide accurate task progression assessment and improve robotic policy learning from mixed-quality data. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Generalist value models play a pivotal role in scaling…

6
Hugging Face Daily Papers research 6d ago

LingxiDiagBench: A Multi-Agent Framework for Benchmarking LLMs in Chinese Psychiatric Consultation and Diagnosis

Abstract A large-scale multi-agent benchmark for evaluating LLMs in Chinese psychiatric diagnosis is introduced, highlighting challenges in dynamic consultation and the gap between consultation quality and diagnostic accuracy. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Mental…

36
r/LocalLLaMA community 6d ago

Qwen-AgentWorld-397B-A17B

It looks like a new model, mentioned on https://huggingface.co/Qwen/Qwen-AgentWorld-35B-A3B and on https://qwen.ai/blog?id=qwen-agentworld submitted by /u/Shoddy_Bed3240

18
Hugging Face Daily Papers research 6d ago

FLAT: Feedforward Latent Triangle Splatting for Geometrically Accurate Scene Generation

Abstract Video diffusion models are adapted to decode explicit surface primitives directly from latent space, enabling high-quality 3D scene generation with improved geometric accuracy and real-time rendering capabilities. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Generating…

26
r/LocalLLaMA community 6d ago

Unlimited-OCR is now on ModelScope! A 3.3B multilingual OCR model for one-shot parsing across single images, multi-page documents, and PDFs. License: MIT

Full-document parsing instead of cropped-region OCR; 32K output length for long OCR sequences; Base and gundam image modes for different document layouts; Transformers inference + SGLang serving with OpenAI-compatible streaming requests; Built to push DeepSeek-OCR-style document…

22
r/LocalLLaMA community 6d ago

Qwen-AgentWorld-35B-A3B: a 3B-active MoE trained to simulate MCP, terminal, SWE, Android, web and OS environments

Qwen just released Qwen-AgentWorld-35B-A3B — a 35B-parameter MoE with only ~3B active parameters per token. The interesting part: this is not positioned as a standard chat/instruction model or a full autonomous agent. It is a language world model trained to predict what an…

6
r/LocalLLaMA community 6d ago

GitHub - QwenLM/Qwen-AgentWorld: Qwen-AgentWorld: Language World Models for General Agents

submitted by /u/dan945

5
arXiv — Machine Learning research 6d ago

You Don't Need to Run Every Eval

arXiv:2606.24020v1 Announce Type: new Abstract: A modern model release reports scores on 40+ benchmarks and the same evaluations were run many more times before it: to track training progress, compare design choices, and select the checkpoint for the release. But do we need to…

29
arXiv — NLP / Computation & Language research 6d ago

When Retrieval Metrics Mislead: Measuring Policy Signal in Long-Horizon Tool-Use Agents

arXiv:2606.23937v1 Announce Type: new Abstract: Exact-match retrieval recall is often used as a proxy for whether a retriever supplies useful policy context to a downstream decision model. We test this proxy for pre-action policy classification in tau-bench using Qwen2.5-3B/7B…

11
