Version Bump
242 articles archived under #version-bump

vLLM releases · dev-tools · 18d ago
v0.23.0: [Docker] Fix CUTLASS DSL cu13 install order in Dockerfile (#45204)
Signed-off-by: Mohammad Miadh Angkad [email protected] (cherry picked from commit 40e065e)
23

Simon Willison · community · 19d ago
datasette-agent 0.2a0
Release: datasette-agent 0.2a0. Highlights from the release notes: Tools can now ask the user questions mid-execution. Tools that declare a context parameter receive a ToolContext object, and await context.ask_user(...) can ask a yes/no, multiple-choice (options=[...]) or…
14

r/MachineLearning · community · 19d ago
Pyrecall open source tool for detecting catastrophic forgetting during LLM fine-tuning [P]
Surprised there's no real tooling for this given how much research exists on continual learning. Built pyrecall to fill the gap. Snapshots skill scores before/after fine-tuning, flags regressions, rolls back LoRA adapters by name. Fully local, no external APIs. v0.1.0, MIT, pip…
17

OpenAI Python SDK releases · dev-tools · 19d ago
v2.41.1
2.41.1 (2026-06-05). Full Changelog: v2.41.0...v2.41.1. Build System: Remove scheduled release workflow trigger (#3366) (2a91011)
25

r/MachineLearning · community · 19d ago
RelayOps - Production-shaped telecom support agent (54% auto-resolve, 0 unsafe actions, full audit + decision console) [P]
I just open-sourced RelayOps - a small, honest, production-shaped AI support agent built specifically for telecom and subscription billing queues.
Key results (v1.5.1): 54% of a 50-ticket sample queue auto-resolved; 0 unsafe auto-actions; 0 billing escapes (tested on 12…
25

Anthropic SDK (Python) releases · dev-tools · 20d ago
v0.109.1
0.109.1 (2026-06-09). Full Changelog: v0.109.0...v0.109.1. Bug Fixes: api: add frontier_llm refusal category (d3a806b)
35

Hacker News — AI on Front Page · community · 20d ago
Upcoming breaking changes for npm v12
Article URL: https://github.blog/changelog/2026-06-09-upcoming-breaking-changes-for-npm-v12/
Comments URL: https://news.ycombinator.com/item?id=48467705
Points: 217
# Comments: 68
16

Anthropic SDK (Python) releases · dev-tools · 20d ago
v0.109.0
0.109.0 (2026-06-09). Full Changelog: v0.108.0...v0.109.0. Features: api: add support for Managed Agents deployments and environment variable credentials (47633bf)
12

Anthropic SDK (Python) releases · dev-tools · 20d ago
v0.108.0
0.108.0 (2026-06-09). Full Changelog: v0.107.1...v0.108.0. Features: api: add support for claude-mythos-5 and claude-fable-5, with support for server-side fallbacks on refusal (6b76649); client: adds client-side fallbacks middleware for API providers that do not support…
12

r/LocalLLaMA · community · 20d ago
Still a VERY lightweight open web-search tool for smaller local LLMs - now with SearXNG support
Hey everyone, TinySearch v0.2.0 (first stable beta) is out. The first version used DuckDuckGo directly, which worked well enough to prove the idea, but yeah.. relying on one search source was way too fragile lol.
DDG started throwing limits/CAPTCHAs more often in the last 2…
25

Hacker News — AI on Front Page · community · 21d ago
Let's Encrypt bans certificate usage in any US sanctioned territory [pdf]
Article URL: https://letsencrypt.org/documents/LE-SA-v1.7-June-04-2026-diff.pdf
Comments URL: https://news.ycombinator.com/item?id=48453275
Points: 223
# Comments: 172
21

r/LocalLLaMA · community · 21d ago
Xiaomi just claimed 1,000+ tps on a 1T model using a standard 8-GPU server
Just saw Xiaomi MiMo announce MiMo-V2.5-Pro UltraSpeed, claiming they broke the 1,000 tokens/sec output barrier on a 1 trillion parameter MoE model. According to them, they’re doing it on a single standard 8-GPU node, not custom wafer-scale hardware like Cerebras and not…
34

Hacker News — AI on Front Page · community · 21d ago
MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 tokens per second
Article URL: https://mimo.xiaomi.com/blog/mimo-tilert-1000tps
Comments URL: https://news.ycombinator.com/item?id=48446639
Points: 252
# Comments: 175
30

Ollama releases · dev-tools · 22d ago
v0.30.7
docs: update docs examples to use Gemma 4 instead of Gemma 3 (#16607)
7

Anthropic SDK (Python) releases · dev-tools · 22d ago
v0.107.1
0.107.1 (2026-06-07). Full Changelog: v0.107.0...v0.107.1. Bug Fixes: foundry: send x-api-key header for API-key auth (#62) (1338141), closes #1661
31

Anthropic SDK (Python) releases · dev-tools · 23d ago
v0.107.0
0.107.0 (2026-06-06). Full Changelog: v0.106.0...v0.107.0. Features: api: small updates to Managed Agents types (72923f9)
35

Ollama releases · dev-tools · 24d ago
v0.30.7-rc1
openai: align models list with tags (#16556)
13

Ollama releases · dev-tools · 24d ago
v0.30.7-rc0
launch: use native Windows Hermes config path (#16558)
5

Anthropic SDK (Python) releases · dev-tools · 24d ago
v0.106.0
0.106.0 (2026-06-05). Full Changelog: v0.105.2...v0.106.0. Features: api: mark Claude Opus 4.1 as deprecated (85068cc). Bug Fixes: client: make Foundry client copy() and with_options() work (94146ac); transform schema:
preserve $defs when schema root is a $ref (#1642) (fc58e06…
19

Ollama releases · dev-tools · 25d ago
v0.30.6-rc0
launch: oh-my-pi (#16410)
34

Ollama releases · dev-tools · 25d ago
v0.30.6
launch: oh-my-pi (#16410)
21

r/LocalLLaMA · community · 25d ago
BeeLlama v0.3.1 – latest llama.cpp with extras! DFlash, MTP, q6_0 cache, TurboQuant. Single RTX 3090: Qwen 3.6 27B & Gemma 4 31B up to 177.8 tps (4.93x over baseline)
BeeLlama v0.3.0 and v0.3.1 are here! Big architectural update to align the fork with upstream llama.cpp and integrate all its additions like MTP and Gemma 4 12B support, while also updating DFlash to handle complex configurations like multi-slot and multi-GPU. Now also…
5

Ollama releases · dev-tools · 25d ago
v0.30.5: launch: hermes-desktop app (#16516)
Add support to launch the hermes-desktop app alongside the hermes agent from ollama launch. It will go through the install on first run if hermes-desktop is not already installed.
9

ComfyUI releases · dev-tools · 25d ago
v0.24.1
ComfyUI v0.24.1
8

Ollama releases · dev-tools · 25d ago
v0.30.5-rc0: llama.cpp version update (#16511)
Bump llama.cpp to b9509, which includes the upstream Gemma 4 12B multimodal projector fixes for the n_head=0 divide-by-zero crash seen on x86/CUDA/Linux/Windows.
Fixes #16479, #16489, #16491, #16492, #16495
11

r/LocalLLaMA · community · 26d ago
The first Gemma 4 12B finetunes are ready
Now you can start building your Gemma 4 12B collection :)
https://huggingface.co/igorls/gemma-4-12B-it-heretic-GGUF
https://huggingface.co/ReadyArt/Melody1437-12B-v0.4-GGUF
https://huggingface.co/DuoNeural/Gemma4-12B-IT-Abliterated-GGUF…
26

vLLM releases · dev-tools · 26d ago
v0.22.1rc2: fix: resolve CUTLASS fmin compatibility for DeepSeek-V4 init
Signed-off-by: khluu [email protected]
9

vLLM releases · dev-tools · 26d ago
v0.22.1: fix: resolve CUTLASS fmin compatibility for DeepSeek-V4 init
Signed-off-by: khluu [email protected]
28

OpenAI Python SDK releases · dev-tools · 26d ago
v2.41.0
2.41.0 (2026-06-03). Full Changelog: v2.40.0...v2.41.0. Features: api: responses.moderation and chat_completions.moderation (87e46c2)
33

Ollama releases · dev-tools · 26d ago
v0.30.4-rc1: llama-server: fix gemma4 patch wiring (#16477)
This will fix the "clip.cpp:4399: Unknown projector type" crash.
4

Ollama releases · dev-tools · 26d ago
v0.30.4: llama-server: fix gemma4 patch wiring (#16477)
This will fix the "clip.cpp:4399: Unknown projector type" crash.
38

r/LocalLLaMA · community · 26d ago
Big Model Value Wars - DeepSeek V4 Pro vs MiMo-V2.5-Pro vs MiniMax M3
For those who sometimes boost their local model use with openrouter options, or the madlads who have the infrastructure to actually run those locally, it feels like those three models have the edge in best bang for your buck. How then do you decide which one to use?
Do you have a…
19

Hacker News — AI on Front Page · community · 26d ago
Elixir v1.20: Now a gradually typed language
Article URL: https://elixir-lang.org/blog/2026/06/03/elixir-v1-20-0-released/
Comments URL: https://news.ycombinator.com/item?id=48388324
Points: 252
# Comments: 71
34

Ollama releases · dev-tools · 26d ago
v0.30.4-rc0: Kill llama-server during Windows cleanup (#16458)
Windows installer and app cleanup could leave llama-server.exe running when ollama.exe was killed directly, so cleanup now includes llama-server.exe and taskkill /T.
28

ComfyUI releases · dev-tools · 26d ago
v0.24.0
ComfyUI v0.24.0
32

Ollama releases · dev-tools · 26d ago
v0.30.3
models: add support for gemma4-12b (#16457)
30

r/LocalLLaMA · community · 26d ago
How does the new abliteration tool Apostate compare with others? - Abliterlitics
Why Qwen 2.5 7B? Apostate is a new abliteration tool by heterodoxin. He asked me to benchmark it. Qwen 2.5 7B was recommended by heterodoxin as it's the most tested model for Apostate. I abliterated the model with Heretic v1.3.0 and Apostate. The models are available on…
33

Hugging Face Daily Papers · research · 26d ago
PaddleOCR-VL-1.6: Expanding the Frontier of Document Parsing with Under-Optimized Region Refinement and Progressive Post-Training
Abstract: PaddleOCR-VL-1.6 enhances document parsing performance through targeted data optimization and progressive post-training techniques, achieving state-of-the-art results on OmniDocBench v1.6. Generated by Qwen/Qwen2.5-Coder-32B-Instruct. We introduce PaddleOCR-VL-1.6, an…
9

vLLM releases · dev-tools · 27d ago
v0.22.1rc1: [docker] Stop using extra-index-url for flashinfer-jit-cache (#44366)
Signed-off-by: Kevin H. Luu [email protected]
34

Ollama releases · dev-tools · 27d ago
v0.30.2-rc0: fix laguna patch build breakage (#16445)
Follow up to #16396. Fix kernel template instantiation so the symbols are exported in the library.
29

Ollama releases · dev-tools · 27d ago
v0.30.2: fix laguna patch build breakage (#16445)
Follow up to #16396. Fix kernel template instantiation so the symbols are exported in the library.
38

Ollama releases · dev-tools · 27d ago
v0.30.1: llm: ignore llama-server SSE ping comments (#16443)
llama.cpp b9478 added a default 30s SSE ping that emits colon-only comment frames (":\n\n") while streamed requests are idle; Ollama treated non-data SSE lines as JSON, so skip SSE comments in completion and chat streams.
36

Ollama releases · dev-tools · 27d ago
v0.30.1-rc0
launch: isolate Codex launch configuration (#16437)
7

r/MachineLearning · community · 27d ago
Backpropagation destroys V1 brain alignment in one epoch, tracking RSA alignment to fMRI across training for BP, FA, predictive coding, and STDP [R]
Third in a series of papers tracking learning rules vs. human fMRI (THINGS dataset, V1–IT, N=3 subjects). Previous finding: untrained CNNs match backprop at V1. This paper asks: when does training break that, and does the learning rule matter? Setup: RSA alignment measured at 8…
30

OpenAI Python SDK releases · dev-tools · 28d ago
v2.40.0
2.40.0 (2026-06-01). Full Changelog: v2.39.0...v2.40.0. Features: api: Add Amazon Bedrock Responses support. Bug Fixes: api: allow setting bedrock api keys on the client directly (4d5bfde)
19

Ollama releases · dev-tools · 28d ago
v0.30.0: launch: migrate Codex config (#16397)
launch: migrate Codex config
30

OpenAI Python SDK releases · dev-tools · 28d ago
v2.39.0
2.39.0 (2026-06-01). Full Changelog: v2.38.0...v2.39.0. Features: api: workload identity in audit logs, additional_tools item in responses, fix ActionSearch.query to be optional. (ab60d7a)
10

ComfyUI releases · dev-tools · 28d ago
v0.23.0
What's Changed: feat: MediaPipe face detection (CORE-235) by @kijai in #14009; Multi-threaded load of models from disk (big load time speedups & Offload to disk) (CORE-43, CORE-152, CORE-164, CORE-165, CORE-117) by @rattus128 in #13802; Repo security stuff.
by @comfyanonymous in #14019…
28

Ollama releases · dev-tools · 28d ago
v0.30.0-rc32: llama-server followups (#16353)
llama-server followups. Misc fixes for #16031. Add back dropped ROCm build flag for multi-GPU support on windows; Fix amdhip64_*.dll version detection for "latest" selection; Fix embeddings API for consistent normalize behavior with prior versions; ci: set up for automated llama.cpp…
19

r/LocalLLaMA · community · 28d ago
mistral.rs v0.8.2: up to 2.8x faster CUDA inference than llama.cpp on GB10, B200, and H100
Hey all! I’ve been working on CUDA performance in mistral.rs, and v0.8.2 is focused on CUDA throughput. The result: on Gemma 4 (dense & MoE), mistral.rs is faster than llama.cpp at every point in my release sweep on GB10/H100/B200. See some results below on GB10 and B200:…
24