Tag

Model releases

500 articles archived under #model-release

r/LocalLLaMA community 8d ago

Do you think dedicated hardware for running local LLMs will become affordable anytime soon?

Models like qwen 27b dense have already proved to be useful coding/general purpose assistants, but issue is still with hardware even the entry level hardware is relatively expensive, would we be getting hardware specifically built for inference for consumers at affordable price…

6
Hacker News — AI on Front Page community 8d ago

GLM 5.2 vs. Opus

Article URL: https://techstackups.com/comparisons/glm-5.2-vs-opus/ Comments URL: https://news.ycombinator.com/item?id=48626866 Points: 205 # Comments: 164

24
Smol AI News news-outlet 8d ago

not much happened today

**OpenAI** expanded its **Daybreak** program with the **GPT-5.5-Cyber** model, focusing on closed-loop patch generation for cybersecurity, scanning over 30 million commits and covering major projects like cURL and Python. The release sparked debate on policy and export controls,…

36
Hugging Face Daily Papers research 8d ago

Distilling Examples into Task Instructions: Enhanced In-Context Learning for Real-World B2B Conversations

Abstract A novel approach for B2B conversation classification that reduces token usage by 99% while improving performance and maintaining robustness as context length increases. Generated by Qwen/Qwen2.5-Coder-32B-Instruct In-context learning (ICL) is the standard method for…

8
Hugging Face Daily Papers research 8d ago

GateMem: Benchmarking Memory Governance in Multi-Principal Shared-Memory Agents

Abstract Current memory agents lack reliable shared institutional deployment due to challenges in balancing utility, access control, and forgetting across multiple principals with diverse authorization contexts. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Memory benchmarks for…

5
Hugging Face Daily Papers research 8d ago

BrainG3N: A Dual-Purpose Tokenizer for Controllable 3D Brain MRI Generation

Abstract A 3D brain MRI generative model uses a masked-autoencoder tokenizer to create clinically informative embeddings that support both medical task performance and controlled image generation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Three-dimensional (3D) brain MRI is…

6
Hugging Face Daily Papers research 8d ago

WorldLines: Benchmarking and Modeling Long-Horizon Stateful Embodied Agents

Abstract WorldLines benchmark evaluates long-term memory in embodied agents through household scenarios, while ObsMem framework addresses challenges in partial observability and memory translation for decision-making. Generated by Qwen/Qwen2.5-Coder-32B-Instruct To assist humans…

19
r/LocalLLaMA community 8d ago

Agent recommendations

Hi, I have a Strix Halo with 128GB setup that runs a couple of models (GPT-OSS 120b, Qwen3.5-122b, Gemma-4-31b) on llama-swap. GPT and Qwen run quite fast at 40-50T/s, while Gemma is a slow 4-5T/s but seems to have the best quality. I'd like to vibe code a personal Webproject in…

17
Vercel — AI dev-tools 8d ago

Sakana Fugu Ultra now available on AI Gateway

Sakana Fugu Ultra from Sakana AI is now available on AI Gateway. Fugu Ultra is built on a pool of publicly accessible frontier models, rather than running as a single model. It coordinates several models, routing work to 1-3 agents depending on the problem and combining their…

31
Simon Willison community 8d ago

sqlite-utils 4.0rc1

Release: sqlite-utils 4.0rc1 See sqlite-utils 4.0rc1 adds migrations and nested transactions. Tags: sqlite-utils

30
r/MachineLearning community 8d ago

[ECCV 2026] Paper Decision Appeals Discussion [D]

With the release of meta-reviews, ECCV sent out a google form for dissatisfied authors to submit an appeal for the following reasons: Policy errors, e.g., reviewers or Area Chairs applied a policy that does not exist, or reviewers or Area Chairs applied policies that are not…

18
r/LocalLLaMA community 8d ago

A100 slow Qwen3.6-27B-FP8

Setting up a server for someone who has an A100 80GB, even though this doesn't natively support FP8 does 43tps decode sound too low for single request? For comparison the exact same vllm config on my RTX 6000 PRO runs the same single request test at 130tps. For 8 concurrent…

11
r/LocalLLaMA community 8d ago

Qwen 27B for planning, Qwen 35B-A3B for execution?

My 32GB unified memory setup runs both, though 27B even with MTP is something like 7-10 tok/sec. Usable but not real time by any means. (~18 tok/sec with 35B-A3B) Would it be worth using 27B to plan long horizon tasks, put together the PLAN.md, and have 35B-A4B iterate over it…

14
r/LocalLLaMA community 8d ago

Qwen 3.6 27b Abliterated (apostate)

I've been working on a project called Apostate and have finally released my first large model with it on Hugging Face. Qwen 3.6 27B with safety alignment removed down from 92% to 7.6% refusal rate with minimal impact on the model's capabilities (0.120 KL). Qwen 3.6 27B Apostate…

17
r/LocalLLaMA community 8d ago

I pretrained and post trained a 500M parameter LLM and 330M parameter Image generator from scratch

Hey folks Hope you are doing well I started HobbyLM as an side project last month Initially I wrote an Agent harness using Claude SDK which takes notes on various LLM architecture does ablation studies to find optimised or well fit architecture for this model training then I…

16
r/LocalLLaMA community 8d ago

What‘s your local „Haiku“-Replacement?

Seriously looking for a reliable and fast local Haiku replacement. Basically it should be able to summarize technical stuff, code documentation, architectural descriptions Any suggestions? Edit: sorry, totally forgot that my local machine is a M4 Max 128GB. But at the same time…

6
r/LocalLLaMA community 8d ago

2× Radeon R9700 — Qwen 3.6 27B Q8 MTP on llama.cpp

There isn't much information around about multi-GPU setups with the R9700, so I'm writing this up in case it helps anyone in the same situation. Here's my setup, the tests I ran, and the numbers from the server logs. Setup ThinkStation P7, Xeon w7-3455, 128 GB RDIMM 2× Gigabyte…

11
r/LocalLLaMA community 8d ago

ROCm vs Vulkan vs vLLM on Dual R9700's

Just wanted to share these numbers I saw running Qwen3.6 35BA3 and Qwen3.6 27B and the big increase I saw going to vLLM. I was just expecting better concurrency but ended up with a lot better speeds. llama.cpp services Running ROCm and Vulkan Model Backend Gen 35B-A3B Q6_K_XL…

19
Hacker News — AI on Front Page community 8d ago

Identity verification on Claude

https://old.reddit.com/r/ClaudeAI/comments/1ubm53n/official_... Comments URL: https://news.ycombinator.com/item?id=48618455 Points: 289 # Comments: 239

38
r/LocalLLaMA community 8d ago

8-16 MI50s Minimax M3 @19 tps TG (peak)

TL;DR Speeds are not too ugly for this old 2018 hardware but imo, not very usable for agentic coding (if you compare with qwen3.6 27B on 8 MI50 @ 50 tps TG 800 tps PP). More concerning is that the reasoning output is very very long and still didn’t check about the quality of…

27
r/LocalLLaMA community 8d ago

Claude Will Soon Require Identity Verification

https://support.claude.com/en/articles/14328960-identity-verification-on-claude submitted by /u/Few_Painter_5588

11
r/MachineLearning community 8d ago

I released a softmax-free attention model at GPT-2 Medium scale (~354M params, 11.5B tokens): structural sparsity + tile-skipping kernels for long-context VRAM savings. Open weights + custom Triton kernels [R]

submitted by /u/NonGameCatharsis

29
r/LocalLLaMA community 8d ago

Why is AutoRound being slept on so hard?

Seriously, why is almost nobody talking about AutoRound here? I’ve been experimenting with it on Qwen3.6 27B lately (running an AMD setup), and the perplexity/accuracy retention at low bits absolutely blows standard AWQ or RTN out of the water. Especially for models with complex…

6
r/LocalLLaMA community 9d ago

GLM-5.2 benchmarked on DeepSWE: Beats Gemini & GPT-5.4, but the token volume/cost makes it wildly inefficient? (Theo - t3.gg)

Saw this breakdown from Theo (t3.gg) on X showing the latest DeepSWE leaderboard stats for the new GLM-5.2 open-weight model. The good news: it's officially surpassing GPT-5.4 and the entire Gemini lineup in raw coding capability. Seeing an open-weight model punch that high is…

15
r/LocalLLaMA community 9d ago

Qwen is never going to open source Qwen 3.7, aren't they?

Well, this was predictable. After Qwen fired Junyang Lin, the next models are no longer open source. Labs that have released open source models more recently than Qwen: GLM-5.2, 2026-06-17 Kimi-K2.7-Code, 2026-06-12 MiniMax-M3, 2026-06-11 Step-3.7-Flash, 2026-05-29…

15
r/LocalLLaMA community 9d ago

AllenAI releases MolmoMotion vision models for predicting future motion based on short frame history

AllenAI just released two models in the MolmoMotion family: https://huggingface.co/allenai/MolmoMotion-4B-H3-F30 https://huggingface.co/allenai/MolmoMotion-4B-H1-F32 MolmoMotion is a 4B vision-language model that forecasts 3D point trajectories under natural-language action…

30
Hacker News — AI on Front Page community 9d ago

The brain was not designed for this much bad news

Article URL: https://www.sciencedaily.com/releases/2026/06/260614012006.htm Comments URL: https://news.ycombinator.com/item?id=48615569 Points: 314 # Comments: 266

28
r/LocalLLaMA community 9d ago

It’s time to decentralize model distribution! Introducing Noema Atlas

TL;DR: Noema Atlas is a peer-to-peer network software using Iroh for local LLM weights, free and open source (Apache-2.0). Models come from whichever peers have them, with Hugging Face and mirrors as fallback (opt-in). Every file is identified by its content hash and a signed…

38
r/LocalLLaMA community 9d ago

Qwen code companion on vscode marketplace - thoughts

I just came across this extension in vscode few days ago and tried to use with LM studio hosted models and it really is pretty good compared to `continue`, `kilo`, `cline`, `roo` like I felt without much tweaks, gets straight to the point, if any tweaks required u could do…

36
r/LocalLLaMA community 9d ago

Gemma 4 26b a4b is genuinely the best model I have tried for language learning and scientific queries!

I know gemma 4 26b is (according to this sub) a bit behind for coding tasks but for language learning and scientific (health/biology/medical/clinical/biochem) queries it’s unbeaten even by Qwen 3.5/3.6. Since the competition in the small MOE models is generally between Qwen…

28
llama.cpp releases dev-tools 9d ago

b9741

llama : use LLM_KV for quantization_version & file_type ( #24802 ) Signed-off-by: Adrien Gallouët macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu…

27
llama.cpp releases dev-tools 9d ago

b9739

release: add missing link for win opencl adreno arm64 ( #24809 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan)…

33
r/LocalLLaMA community 9d ago

Any opinion about Qwen3.6-27B@BF16 vs Step3.7@IQ4_XS?

Obviously one is dense, slower but at full precision, and the other is MoE, 7x more params, less than ideal quant, and will eat up more memory, but my question is: Which would make saner decisions with less hand-holding, ie which is genuinely smarter? I've been using…

34
Hacker News — AI on Front Page community 9d ago

Show HN: TownSquare, a tiny presence layer for websites

https://cauenapier.com/blog/townsquare_release/ https://cauenapier.com/blog/townsquare/ Comments URL: https://news.ycombinator.com/item?id=48608570 Points: 209 # Comments: 116

14
llama.cpp releases dev-tools 9d ago

b9737

docker : prebuild web UI for s390x build [no release] ( #24829 )

31
r/LocalLLaMA community 10d ago

Best Settings for 48GB VRAM + Qwen 3.6 27B

Hey everyone, I've been running Qwen3.6 27B (Q8_0) across an RTX 4090 + RTX 3090 setup using llama.cpp with tensor split, and I wanted to share what's been working best for me so far. See if anyone has any better settings Hardware: RTX 4090 (24GB) + RTX 3090 (24GB), 48GB VRAM…

4
r/LocalLLaMA community 10d ago

7900XTX 24GB vram, can finally fit Q6K+MTP with Qwen 3.6 27B at 131k context

OS: CatchyOS Instructions: Connect monitor to iGPU directly so when you boot Linux your dGPU vram is 100% free since by default when you use your dGPU it consumes about 700mb~1.2gb of lost context space, yes you can still game normally using this approach. Setup kvcache at…

30
r/LocalLLaMA community 10d ago

Tool calling, opencode qwen3.6 27b 8K

Not sure I'm ready to post an issue in the opencode repo yet but wanted to see if this is common, return to the opencode window after walking away to let it do its thing to find its stopped with this in its thinking.. Started noticing more last week or so, the fix is easy just…

10
r/LocalLLaMA community 10d ago

Local agent on 4090 - looking for LM Studio settings

I have moved on from Ollama to just dink around and instead want to start running a local agent from time to time. With the 24GB of a 4090 (Gigabyte OC edition) that should be quite possible. But no matter what settings I use for context and batching, token generation is slow as…

36
r/LocalLLaMA community 10d ago

[NEW MODEL] SupraLabs just released supra-title-FFT-preview, 115K samples, almost 10x our first chat title dataset

Hey r/LocalLLaMA! Following up on Supra-Title-350M-exp (our first chat title generation model), we're releasing supra-title-FFT-preview, trained on a much larger and cleaner dataset. 🤗 supra-title-FFT-preview What changed Our first chat title model was trained on 12K samples…

17
r/LocalLLaMA community 10d ago

$1800 (in GPU cost running with P2P running Qwen/Qwen3.6-27b-FP8 with 262K context and BF16 KV cache at 55 tok/s

Hey peeps, wanted to share what is possible for folks with an inference only single user use case with 1700 in GPU cost. Setup: 4x 5060 ti (16GB) with P2P If you are in the US and you keep an eye on facebook marketplace and places like slickdeals you can find some 5060 ti 16 GB…

30
r/LocalLLaMA community 10d ago

Maximizing performance of 2x3090 + NVLink

Hey all, I have built myself a decent rig with the following specs: - Ubuntu 24.04 - 2x3090 founder’s with NVLink - Ryzen 7950x3d - 64GB DDR5 I am currently routing my display through an eGPU to maximize available VRAM. My current go-to is Qwen 3.6 27B Q8_0 with MTP and…

6
Hugging Face Daily Papers research 10d ago

LedgerAgent: Structured State for Policy-Adherent Tool-Calling Agents

Abstract LEDGERAGENT is a method for customer service agents that maintains task states in a separate ledger to improve policy adherence and state management during tool calling. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Policy-adherent tool-calling agents in customer-service…

36
Hugging Face Daily Papers research 10d ago

PerceptionDLM: Parallel Region Perception with Multimodal Diffusion Language Models

Abstract PerceptionDLM enables efficient parallel region perception in multimodal diffusion language models through structured attention masking and efficient prompting, achieving faster inference without sacrificing caption quality. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

12
r/LocalLLaMA community 10d ago

GLM-5.2-REAP50-GGUF

Has anybody tried these? How do they compare to Qwen 3.6 27b? Model Size Link GLM-5.2-REAP50-Q3_K_M-GGUF 182 GB https://huggingface.co/pipenetwork/GLM-5.2-REAP50-Q3_K_M-GGUF GLM-5.2-REAP50-Q2_K-GGUF 139 GB https://huggingface.co/pipenetwork/GLM-5.2-REAP50-Q2_K-GGUF  …

4
Hacker News — AI on Front Page community 10d ago

GPT-5.5 hallucinates 3x more than MIT-licensed GLM-5.2

Article URL: https://arrowtsx.dev/bigger-models/ Comments URL: https://news.ycombinator.com/item?id=48600167 Points: 276 # Comments: 109

23
TechCrunch — AI news-outlet 10d ago

The US banned Anthropic’s Fable 5 release, but the numbers don’t seem to care

Just as last week was ending, the US government forced Anthropic to pull its two newest models, Fable 5 and Mythos 5, citing national security concerns after Amazon researchers allegedly found a way to bypass Fable 5’s guardrails.  Cybersecurity…

24
Hugging Face Daily Papers research 10d ago

Freeing the Law with LOCUS: A Local Ordinance Corpus for the United States

Abstract A comprehensive corpus and access layer for U.S. local ordinance codes has been developed to enable machine-readable legal AI research, addressing the lack of authoritative legal text at scale for local regulations. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Progress…

4
r/LocalLLaMA community 10d ago

The economics of AI are starting to favor open models

For the last couple of years, the assumption was pretty simple: Want the smartest model? Pay for a closed API. Want something cheaper? Accept a capability hit. Looking at recent model releases, that tradeoff is starting to break down. The most interesting part of the chart isn't…

15
Hugging Face Daily Papers research 10d ago

ReSyn: A Generalized Recursive Regular Expression Synthesis Framework

Abstract A divide-and-conquer framework named ReSyn enhances regex synthesis accuracy by decomposing complex problems, combined with a parameter-efficient synthesizer called Set2Regex that handles example permutation invariance. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

27
