Model releases

500 articles archived under #model-release

Hugging Face Daily Papers research 7d ago

MeshFlow: Mesh Generation with Equivariant Flow Matching

Abstract: MeshFlow generates triangle meshes directly using equivariant optimal-transport flow matching models, with improved inference speed over autoregressive methods. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Meshes are among the most common 3D scene representations, but…

16
Hugging Face Daily Papers research 7d ago

HAKARI-Bench: A Lightweight Benchmark for Comparing Retrieval Architectures and Efficiency Settings under Unified Conditions

Abstract: HAKARI-Bench provides a lightweight benchmark for comparing retrieval methods across multiple configurations and languages, enabling efficient model selection and performance analysis. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) With the rapid spread of…

23
Hugging Face Daily Papers research 7d ago

AOHP: An Open-Source OS-Level Agent Harness for Personalized, Efficient and Secure Interaction

Abstract: AOHP presents an Android-based operating system framework that treats AI agents as first-class entities, enhancing task completion rates and reducing execution costs through specialized agent-oriented mechanisms. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) AI agents…

16
r/LocalLLaMA community 7d ago

Training a Qwen 3.5 4B/9B agent for multi-tool use: SFT first or go directly to RL?

I'm looking to train Qwen 3.5 4B or 9B for a custom multi-tool agent workflow and would appreciate guidance from people who have done this successfully. A few questions: SFT → RL or RL-only? Is it still recommended to first do supervised fine-tuning (tool-calling traces, reasoning…

15
Hugging Face Daily Papers research 7d ago

DataClaw0: Agentic Tailoring Multimodal Data from Raw Streams

Abstract: The Agentic Data Tailoring paradigm uses learnable data processing to structure high-entropy multimodal streams, with the DataClaw_0-9B model achieving robust alignment through SFT and GRPO on a novel benchmark. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Massive unstructured…

19
Smol AI News news-outlet 7d ago

not much happened today

**Prime Intellect's `prime-rl` v0.6.0** advances agentic reinforcement learning infrastructure supporting **1 trillion parameter MoE models** with sub-5-minute step times and a **131k context GLM-5 agentic setup**. The release includes optimizations in inference, training, and…

37
r/LocalLLaMA community 7d ago

Is there any reason for a lack of love for Gemma 4 26b?

The answer to most questions on here is Qwen3.6 27b or 35b and then Gemma4 31b (but less so, as it doesn't fit well on a solo 3090). Is there any reason why Gemma 4 26b moe isn't mentioned more? I plan on using Qwen for my coding agents. But I've been building a Jarvis for…

20
Hugging Face Daily Papers research 7d ago

UniverSat: Resolution- and Modality-Agnostic Transformers for Earth Observation

Abstract: UniverSat introduces a Universal Patch Encoder for Vision Transformers that enables robust, sensor-agnostic spatial feature extraction across diverse Earth Observation data types. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Vision Transformers (ViT) dominate computer…

6
Hugging Face Daily Papers research 7d ago

FastMix: Fast Data Mixture Optimization via Gradient Descent

Abstract: FASTMIX automates optimal data mixture discovery during training by formulating mixture selection as a bilevel optimization problem that jointly optimizes mixture coefficients and model parameters through iterative updates. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) …

19
Hugging Face Daily Papers research 7d ago

PoLAR: Factorizing Extent and Mode in Latent Actions for Robot Policy Learning

Abstract: PoLAR introduces a geometrically structured latent action representation in hyperbolic space that separates transition extent from transition mode, improving robotic policy learning performance. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Latent action pretraining…

12
Hugging Face Daily Papers research 7d ago

Dense Reward for Multi-View 3D Reasoning with Global Maps and Local Views

Abstract: DR-MV3D presents a map-grounded learning framework with dense rewards to improve multi-view 3D visual question answering through global map construction, view-trajectory planning, and egocentric grounding. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Multi-view 3D…

15
Hugging Face Daily Papers research 7d ago

EnterpriseClawBench: Benchmarking Agents from Real Workplace Sessions

Abstract: EnterpriseClawBench presents a benchmark for enterprise agents based on real-world sessions with 852 reproducible tasks, emphasizing comprehensive evaluation metrics beyond single performance scores. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Enterprise agents…

30
LangChain releases dev-tools 7d ago

langchain-openrouter==0.2.4

Changes since langchain-openrouter==0.2.3:
- release(openrouter): 0.2.4 (#38381)
- chore(openrouter): bump openrouter floor to 0.9.2, drop file workaround (#38216)
- test(openrouter): cover cache_control passthrough on tool defs (#38215)
- feat(openrouter): surface…

22
Hugging Face Daily Papers research 7d ago

Safe Few-Step Generation via Velocity Editing

Abstract: VESFlow is a training-free safety method for flow matching-based text-to-image generation that edits velocity fields to ensure safe output while maintaining prompt integrity. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Flow matching has recently emerged as a strong…

16
r/LocalLLaMA community 7d ago

100+ t/s on Qwen3.6-27B Q8 across a 5090 + 3090 Ti — switching to tensor split-mode got me from 70 to 100+

Wanted to share a setup that's been working great for me. Running Qwen3.6-27B at Q8_0 across two GPUs (RTX 5090 + RTX 3090 Ti) and getting ~100 t/s. The big jump came from switching --split-mode to tensor. I was sitting at 70+ t/s on layer split before that. Tensor split keeps…

22
Hugging Face Daily Papers research 7d ago

Tmax: A simple recipe for terminal agents

Abstract: A novel RL training approach for terminal agents achieves superior performance using a simplified recipe and an expanded dataset, enabling effective training with fewer parameters than previous methods. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Terminal-using agents…

36
Hugging Face Daily Papers research 7d ago

Connect the Dots: Training LLMs for Long-Lifecycle Agents with Cross-Domain Generalization Via Reinforcement Learning

Abstract: Large language models can be trained through reinforcement learning to develop a meta-capability enabling continuous learning and adaptation across long sequences of tasks in dynamic environments. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) This work presents a general…

31
Hugging Face Daily Papers research 7d ago

PlanBench-XL: Evaluating Long-Horizon Planning of LLM Tool-Use Agents in Large-Scale Tool Ecosystems

Abstract: PlanBench-XL evaluates large language model agents' ability to plan and adapt in complex tool-rich environments with limited visibility and dynamic disruptions. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) LLM agents increasingly operate in large tool ecosystems, where…

10
Hugging Face Daily Papers research 7d ago

World Action Models: A Survey

Abstract: World Action Models are predictive-action systems that generate future states for decision-making, with designs balancing representational richness against computational constraints. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) World Action Models (WAMs) are embodied…

30
Hugging Face Daily Papers research 7d ago

KaLM-Reranker-V1: Fast but Not Late Interaction for Compressed Document Reranking

Abstract: KaLM-Reranker-V1 is a fast reranker that decouples query and passage computation using an encoder-decoder architecture with Matryoshka embedding pooling and cross-attention for efficient relevance modeling. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) As retrieval systems…

32
Hugging Face Daily Papers research 7d ago

EvoEmbedding: Evolvable Representations for Long-Context Retrieval and Agentic Memory

Abstract: EvoEmbedding is a dynamic embedding model that generates adaptive representations by maintaining a continuously updated latent memory, enabling improved retrieval performance in long-context scenarios. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Existing embedding…

32
Hugging Face Daily Papers research 7d ago

Exploring the Design Space of Reward Backpropagation for Flow Matching

Abstract: FlowBP addresses limitations in flow matching model alignment by using a surrogate trajectory framework that reduces memory usage and gradient chaining while maintaining performance across multiple text-to-image models. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) …

23
Hugging Face Daily Papers research 7d ago

DailyReport: An Open-ended Benchmark for Evaluating Search Agents on Daily Search Tasks

Abstract: Search agents face challenges in real-world evaluation due to limited benchmarks and coarse metrics, necessitating more nuanced assessment approaches. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Search Agents (SAs) typically leverage large language models (LLMs) to…

14
Hugging Face Daily Papers research 7d ago

CalVerT: Augmenting Agents with Calibrated Verifier Telemetry Improves Action and Learning in Knowledge-Intensive Tasks

Abstract: Calibrated verifier telemetry enhances LLM agents in knowledge-intensive question answering by providing confidence scores and grounding verification, reducing both over-retrieval and unsupported answers. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) LLM agents in…

7
Hacker News — AI on Front Page community 7d ago

VibeThinker: 3B param model that beats Opus 4.5 on reasoning with novel SFT+GRPO

Article URL: https://arxiv.org/abs/2606.16140 Comments URL: https://news.ycombinator.com/item?id=48639240 Points: 211 · Comments: 85

26
TechCrunch — AI news-outlet 7d ago

OpenAI launches new initiative to help find and patch open-source bugs

OpenAI is attempting to tackle the security issues of the open source software community.

25
Vercel — AI dev-tools 7d ago

Deploy from Claude Design to Vercel

Vercel is now a send-to destination in Claude Design. When you finish a design, you can send it to Vercel and get a live URL back without leaving your canvas. Claude Design deploys the design as a new project in your connected Vercel account and returns a URL you can open and…

22
Simon Willison community 7d ago

Porting the Moebius 0.2B image inpainting model to run in the browser with Claude Code

This morning on Hacker News I saw Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance, describing a small but effective inpainting model - a model where you can mark regions of an image to remove and the model imagines what should fill the space. The…

13
LangChain releases dev-tools 7d ago

langchain==1.3.11

Changes since langchain==1.3.10:
- release(langchain): 1.3.11 (#38377)
- fix(langchain,openai): only set strict=True on tools for OpenAI-compatible models in ProviderStrategy (#38370)
- chore: bump pydantic-settings from 2.12.0 to 2.14.2 in /libs/langchain_v1 (#38279)
- chore: bump…

18
LangChain releases dev-tools 7d ago

langchain-anthropic==1.4.7

Changes since langchain-anthropic==1.4.6:
- hotfix(anthropic): regenerate cassette (#38376)
- release(anthropic): 1.4.7 (#38373)
- chore: bump vcrpy from 8.1.1 to 8.2.1 in /libs/partners/anthropic (#38324)
- chore: bump langsmith from 0.8.5 to 0.8.18 in /libs/partners/anthropic (…

24
LangChain releases dev-tools 7d ago

langchain-openai==1.3.3

Changes since langchain-openai==1.3.2:
- release(openai): 1.3.3 (#38375)
- fix(openai): drop response item ids when store is false (#38372)
- fix(langchain,openai): only set strict=True on tools for OpenAI-compatible models in ProviderStrategy (#38370)
- test(openai): clarify…

15
r/LocalLLaMA community 7d ago

Is Gemma 4 going to be the next Mistral (or Qwen3.6) one day? Concerning the lack of finetunes

https://eqbench.com/creative_writing.html#:~:text=gemma%2D4%2D31B,Sample

From what I've seen, Gemma 4 has better everything (especially long-context adherence) EXCEPT for the raw prose performance of Mistral... finetunes. Comparing bases only, Mistral Small 3.2 (the…

5
Hugging Face Daily Papers research 7d ago

When, Where, and How: Adaptive Binning for Tabular Self-Supervised Learning

Abstract: Adaptive Binning introduces a training-adaptive discretization method for self-supervised learning on medical tabular data, improving representation learning through feature-wise refinement and heterogeneous feature handling. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) …

13
r/LocalLLaMA community 7d ago

DeepSeek raises $7.4B USD at $60B valuation. Remarkably, Liang Wenfeng invests $3B in DeepSeek himself.

Submitted by /u/FullOf_Bad_Ideas

35
Hugging Face Daily Papers research 7d ago

Characterizing Narrative Content in Web-scale LLM Pretraining Data

Abstract: A comprehensive analysis of narrative structures in large-scale language model training data reveals measurable, multidimensional narrative patterns that vary across different content sources and topics. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) The narrative…

21
r/LocalLLaMA community 7d ago

GLM-5.2 vs Claude Opus

Submitted by /u/johnnyApplePRNG

28
Hugging Face Daily Papers research 7d ago

MCompassRAG: Topic Metadata as a Semantic Compass for Paragraph-Level Retrieval

Abstract: MCompassRAG enhances retrieval-augmented generation by using topic-level metadata to guide chunk selection, improving both efficiency and precision in complex research tasks. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Retrieval-augmented generation (RAG) systems…

32
Hacker News — AI on Front Page community 7d ago

Steam Machine launches today

https://store.steampowered.com/sale/steammachine https://www.lttlabs.com/articles/2026/06/22/the-newell-nucle... https://www.youtube.com/watch?v=66QzlDewigE Comments URL: https://news.ycombinator.com/item?id=48632884 Points: 905 · Comments: 784

33
r/LocalLLaMA community 7d ago

TMax: A Simple Recipe for Terminal Agents

TMax is the strongest open RL recipe for terminal agents to date, bringing open data recipes closer to the frontier. We release two things. The first is TMax-15k, a dataset of 14,600 RL environments built from a compositional pipeline with explicit control over difficulty and…

22
r/LocalLLaMA community 7d ago

NEX-N2-mini: "There is no Pareto frontier. I am Pareto". This Qwen3.5-MoE fine-tune apparently fixed the 3.5 and 3.6 overthinking in my tests.

I have been testing all the popular MoE models on my Mac, and it seems I just found gold: 3.5/3.6-level reasoning (if not slightly superior) at a fraction of the reasoning tokens used (wasted). Dynamic plot with other benchmarks here: https://benchmark-yourself.streamlit.app/…

4
Hugging Face Daily Papers research 7d ago

StylisticBias: A Few Human Visual Cues Drive Most Social Biases in MLLMs

Abstract: Multimodal large language models exhibit social bias driven by specific visual attributes, with fashion style and socioeconomic cues having the greatest impact on model judgments. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Multimodal large language models (MLLMs) are…

37
r/LocalLLaMA community 7d ago

Same model, same prompt, 4 different agents

Setup: one self-hosted Qwen3.6-27B (Q4) on llama.cpp, identical prompt, identical hardware. The only variable is the agent scaffolding. Agents tested: pi, opencode, hermes, qwen code. Task: a single-file 2D canvas solar system with scripted orbits and gravity that acts only on…

14
Hacker News — AI on Front Page community 7d ago

The text in Claude Code’s “Extended Thinking” output

Article URL: https://patrickmccanna.net/the-text-in-claude-codes-extended-thinking-output-is-not-authentic/ Comments URL: https://news.ycombinator.com/item?id=48630535 Points: 210 · Comments: 151

19
r/LocalLLaMA community 7d ago

Qwen3.6-35B-A3B APEX on a Single RTX 3090 - Getting the Most Out of It

Resources I used:
- https://github.com/ikawrakow/ik_llama.cpp - as the reference llama.cpp fork
- https://github.com/spiritbuun/buun-llama-cpp - to test the TurboQuant feature
- https://huggingface.co/mudler - for the models
- https://github.com/noonghunna/club-3090 - for speed…

19
r/LocalLLaMA community 7d ago

Ling and Ring 2.6 Technical Report: Efficient and Instant Agentic Intelligence at Trillion-Parameter Scale

arXiv: https://arxiv.org/abs/2606.15079
Full Paper: https://arxiv.org/pdf/2606.15079
HuggingFace: https://huggingface.co/inclusionAI/models?sort=created (This month they released base models for both Ling-2.6-1T & Ling-2.6-flash.)
Wish they released…

11
r/LocalLLaMA community 7d ago

European inference providers for GLM 5.2, DeepSeek V4 Flash?

So I am using OpenRouter, and I see that it lists 16 providers for GLM 5.2, most of them in the US and 1 or 2 in Singapore or China. Are there seriously no European inference providers for open-weight models? (No, I don't mean Mistral; I mean a provider running especially the Chinese…

12
OpenAI official-blog 7d ago

Daybreak: Tools for securing every organization in the world

OpenAI introduces new Daybreak tools, including Codex Security and GPT-5.5-Cyber, to help organizations find, validate, and patch vulnerabilities at scale.

31
Hugging Face Daily Papers research 8d ago

Multi-Turn Reflective Masking Elicits Reasoning in Mask Diffusion Models

Abstract: Reflective Masking enables iterative local refinement in Mask Diffusion Models through lightweight post-training, supporting multi-turn reasoning without architectural changes. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) While reasoning on autoregressive (AR) models is…

26
Hugging Face Daily Papers research 8d ago

GeneralVLA-2: Geometry-Aware Reconstruction and Governed Memory for Robot Planning

Abstract: GeneralVLA-2 addresses limitations in vision-language-action systems by introducing GeoFuse-MV3D for improved 3D reconstruction and an enhanced KnowledgeBank for better memory management in robotic manipulation tasks. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) …

32
r/LocalLLaMA community 8d ago

Do you think dedicated hardware for running local LLMs will become affordable anytime soon?

Models like Qwen 27B dense have already proved to be useful coding and general-purpose assistants, but hardware is still the issue: even entry-level hardware is relatively expensive. Will we be getting hardware built specifically for inference for consumers at an affordable price…

6