Tag

Version Bump

242 articles archived under #version-bump

vLLM releases dev-tools 18d ago

v0.23.0: [Docker] Fix CUTLASS DSL cu13 install order in Dockerfile (#45204)

Signed-off-by: Mohammad Miadh Angkad (cherry picked from commit 40e065e)

23
Simon Willison community 19d ago

datasette-agent 0.2a0

Release: datasette-agent 0.2a0. Highlights from the release notes: Tools can now ask the user questions mid-execution. Tools that declare a context parameter receive a ToolContext object, and await context.ask_user(...) can ask a yes/no, multiple-choice (options=[...]) or…

14
r/MachineLearning community 19d ago

Pyrecall open source tool for detecting catastrophic forgetting during LLM fine-tuning[P]

Surprised there's no real tooling for this given how much research exists on continual learning. Built pyrecall to fill the gap. Snapshots skill scores before/after fine-tuning, flags regressions, rolls back LoRA adapters by name. Fully local, no external APIs. v0.1.0, MIT, pip…

17
OpenAI Python SDK releases dev-tools 19d ago

v2.41.1

2.41.1 (2026-06-05). Full Changelog: v2.41.0...v2.41.1. Build System: Remove scheduled release workflow trigger (#3366) (2a91011)

25
r/MachineLearning community 19d ago

RelayOps - Production-shaped telecom support agent (54% auto-resolve, 0 unsafe actions, full audit + decision console) [P]

I just open-sourced RelayOps - a small, honest, production-shaped AI support agent built specifically for telecom and subscription billing queues. Key results (v1.5.1): 54% of a 50-ticket sample queue auto-resolved; 0 unsafe auto-actions; 0 billing escapes (tested on 12…

25
Anthropic SDK (Python) releases dev-tools 20d ago

v0.109.1

0.109.1 (2026-06-09). Full Changelog: v0.109.0...v0.109.1. Bug Fixes: api: add frontier_llm refusal category (d3a806b)

35
Hacker News — AI on Front Page community 20d ago

Upcoming breaking changes for npm v12

Article URL: https://github.blog/changelog/2026-06-09-upcoming-breaking-changes-for-npm-v12/ Comments URL: https://news.ycombinator.com/item?id=48467705 Points: 217 # Comments: 68

16
Anthropic SDK (Python) releases dev-tools 20d ago

v0.109.0

0.109.0 (2026-06-09). Full Changelog: v0.108.0...v0.109.0. Features: api: add support for Managed Agents deployments and environment variable credentials (47633bf)

12
Anthropic SDK (Python) releases dev-tools 20d ago

v0.108.0

0.108.0 (2026-06-09). Full Changelog: v0.107.1...v0.108.0. Features: api: add support for claude-mythos-5 and claude-fable-5, with support for server-side fallbacks on refusal (6b76649); client: adds client-side fallbacks middleware for API providers that do not support…

12
r/LocalLLaMA community 20d ago

Still a VERY lightweight open web-search tool for smaller local LLMs - now with SearXNG support

Hey everyone, TinySearch v0.2.0 (first stable beta) is out. The first version used DuckDuckGo directly, which worked well enough to prove the idea, but yeah.. relying on one search source was way too fragile lol. DDG started throwing limits/CAPTCHAs more often in the last 2…

25
Hacker News — AI on Front Page community 21d ago

Let's Encrypt bans certificate usage in any US sanctioned territory [pdf]

Article URL: https://letsencrypt.org/documents/LE-SA-v1.7-June-04-2026-diff.pdf Comments URL: https://news.ycombinator.com/item?id=48453275 Points: 223 # Comments: 172

21
r/LocalLLaMA community 21d ago

Xiaomi just claimed 1,000+ tps on a 1T model using a standard 8-GPU server

Just saw Xiaomi MiMo announce MiMo-V2.5-Pro UltraSpeed, claiming they broke the 1,000 tokens/sec output barrier on a 1 trillion parameter MoE model. According to them, they’re doing it on a single standard 8-GPU node, not custom wafer-scale hardware like Cerebras and not…

34
Hacker News — AI on Front Page community 21d ago

MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 tokens per second

Article URL: https://mimo.xiaomi.com/blog/mimo-tilert-1000tps Comments URL: https://news.ycombinator.com/item?id=48446639 Points: 252 # Comments: 175

30
Ollama releases dev-tools 22d ago

v0.30.7

docs: update docs examples to use Gemma 4 instead of Gemma 3 (#16607)

7
Anthropic SDK (Python) releases dev-tools 22d ago

v0.107.1

0.107.1 (2026-06-07). Full Changelog: v0.107.0...v0.107.1. Bug Fixes: foundry: send x-api-key header for API-key auth (#62) (1338141), closes #1661

31
Anthropic SDK (Python) releases dev-tools 23d ago

v0.107.0

0.107.0 (2026-06-06). Full Changelog: v0.106.0...v0.107.0. Features: api: small updates to Managed Agents types (72923f9)

35
Ollama releases dev-tools 24d ago

v0.30.7-rc1

openai: align models list with tags (#16556)

13
Ollama releases dev-tools 24d ago

v0.30.7-rc0

launch: use native Windows Hermes config path (#16558)

5
Anthropic SDK (Python) releases dev-tools 24d ago

v0.106.0

0.106.0 (2026-06-05). Full Changelog: v0.105.2...v0.106.0. Features: api: mark Claude Opus 4.1 as deprecated (85068cc). Bug Fixes: client: make Foundry client copy() and with_options() work (94146ac); transform schema: preserve $defs when schema root is a $ref (#1642) (fc58e06…

19
Ollama releases dev-tools 25d ago

v0.30.6-rc0

launch: oh-my-pi (#16410)

34
Ollama releases dev-tools 25d ago

v0.30.6

launch: oh-my-pi (#16410)

21
r/LocalLLaMA community 25d ago

BeeLlama v0.3.1 – latest llama.cpp with extras! DFlash, MTP, q6_0 cache, TurboQuant. Single RTX 3090: Qwen 3.6 27B & Gemma 4 31B up to 177.8 tps (4.93x over baseline)

BeeLlama v0.3.0 and v0.3.1 are here! Big architectural update to align the fork with upstream llama.cpp and integrate all its additions like MTP and Gemma 4 12B support, while also updating DFlash to handle complex configurations like multi-slot and multi-GPU. Now also…

5
Ollama releases dev-tools 25d ago

v0.30.5: launch: hermes-desktop app (#16516)

Add support to launch the hermes-desktop app alongside the hermes agent from ollama launch. It will go through the install on first run if hermes-desktop is not already installed.

9
ComfyUI releases dev-tools 25d ago

v0.24.1

ComfyUI v0.24.1

8
Ollama releases dev-tools 25d ago

v0.30.5-rc0: llama.cpp version update (#16511)

Bump llama.cpp to b9509, which includes the upstream Gemma 4 12B multimodal projector fixes for the n_head=0 divide-by-zero crash seen on x86/CUDA/Linux/Windows. Fixes #16479, #16489, #16491, #16492, #16495.

11
r/LocalLLaMA community 26d ago

The first Gemma 4 12B finetunes are ready

Now you can start building your Gemma 4 12B collection :) https://huggingface.co/igorls/gemma-4-12B-it-heretic-GGUF https://huggingface.co/ReadyArt/Melody1437-12B-v0.4-GGUF https://huggingface.co/DuoNeural/Gemma4-12B-IT-Abliterated-GGUF…

26
vLLM releases dev-tools 26d ago

v0.22.1rc2: fix: resolve CUTLASS fmin compatibility for DeepSeek-V4 init

Signed-off-by: khluu

9
vLLM releases dev-tools 26d ago

v0.22.1: fix: resolve CUTLASS fmin compatibility for DeepSeek-V4 init

Signed-off-by: khluu

28
OpenAI Python SDK releases dev-tools 26d ago

v2.41.0

2.41.0 (2026-06-03). Full Changelog: v2.40.0...v2.41.0. Features: api: responses.moderation and chat_completions.moderation (87e46c2)

33
Ollama releases dev-tools 26d ago

v0.30.4-rc1: llama-server: fix gemma4 patch wiring (#16477)

This will fix the "clip.cpp:4399: Unknown projector type" crash.

4
Ollama releases dev-tools 26d ago

v0.30.4: llama-server: fix gemma4 patch wiring (#16477)

This will fix the "clip.cpp:4399: Unknown projector type" crash.

38
r/LocalLLaMA community 26d ago

Big Model Value Wars - DeepSeek V4 Pro vs MiMo-V2.5-Pro vs MiniMax M3

For those who sometimes boost their local model use with openrouter options, or the madlads who have the infrastructure to actually run those locally, it feels like those three models have the edge in best bang for your buck. How then do you decide which one to use? Do you have a…

19
Hacker News — AI on Front Page community 26d ago

Elixir v1.20: Now a gradually typed language

Article URL: https://elixir-lang.org/blog/2026/06/03/elixir-v1-20-0-released/ Comments URL: https://news.ycombinator.com/item?id=48388324 Points: 252 # Comments: 71

34
Ollama releases dev-tools 26d ago

v0.30.4-rc0: Kill llama-server during Windows cleanup (#16458)

Windows installer and app cleanup could leave llama-server.exe running when ollama.exe was killed directly, so cleanup now includes llama-server.exe and taskkill /T.

28
ComfyUI releases dev-tools 26d ago

v0.24.0

ComfyUI v0.24.0

32
Ollama releases dev-tools 26d ago

v0.30.3

models: add support for gemma4-12b (#16457)

30
r/LocalLLaMA community 26d ago

How does the new abliteration tool Apostate compare with others? - Abliterlitics

Why Qwen 2.5 7B? Apostate is a new abliteration tool by heterodoxin. He asked me to benchmark it. Qwen 2.5 7B was recommended by heterodoxin as it's the most tested model for Apostate. I abliterated the model with Heretic v1.3.0 and Apostate. The models are available on…

33
Hugging Face Daily Papers research 26d ago

PaddleOCR-VL-1.6: Expanding the Frontier of Document Parsing with Under-Optimized Region Refinement and Progressive Post-Training

Abstract: PaddleOCR-VL-1.6 enhances document parsing performance through targeted data optimization and progressive post-training techniques, achieving state-of-the-art results on OmniDocBench v1.6. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) We introduce PaddleOCR-VL-1.6, an…

9
vLLM releases dev-tools 27d ago

v0.22.1rc1: [docker] Stop using extra-index-url for flashinfer-jit-cache (#44366)

Signed-off-by: Kevin H. Luu

34
Ollama releases dev-tools 27d ago

v0.30.2-rc0: fix laguna patch build breakage (#16445)

Follow up to #16396. Fix kernel template instantiation so the symbols are exported in the library.

29
Ollama releases dev-tools 27d ago

v0.30.2: fix laguna patch build breakage (#16445)

Follow up to #16396. Fix kernel template instantiation so the symbols are exported in the library.

38
Ollama releases dev-tools 27d ago

v0.30.1: llm: ignore llama-server SSE ping comments (#16443)

llama.cpp b9478 added a default 30s SSE ping that emits colon-only comment frames (":\n\n") while streamed requests are idle; Ollama treated non-data SSE lines as JSON, so skip SSE comments in completion and chat streams.
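The comment-skipping behaviour described in this note can be illustrated with a minimal Python sketch. It is not Ollama's code (Ollama is written in Go); the function name `iter_sse_json` is hypothetical, the `[DONE]` sentinel is assumed from OpenAI-style streaming, and the sketch assumes one `data:` line per event. The only point it relies on is that, in the server-sent events format, a line starting with a colon is a comment and carries no payload.

```python
import json
from typing import Iterable, Iterator


def iter_sse_json(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON payloads from the `data:` lines of an SSE stream.

    Hypothetical helper for illustration only.
    """
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # Blank line: event boundary, nothing to decode.
            continue
        if line.startswith(":"):
            # SSE comment frame (for example a keep-alive ping): skip it
            # instead of handing it to the JSON decoder.
            continue
        if not line.startswith("data:"):
            # Other fields (event:, id:, retry:) are ignored in this sketch.
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            # Assumed end-of-stream sentinel (OpenAI-style streaming).
            break
        yield json.loads(payload)


# Usage: a colon-only ping arrives between two data events.
stream = [
    'data: {"content": "Hel"}\n',
    "\n",
    ":\n",
    "\n",
    'data: {"content": "lo"}\n',
    "\n",
    "data: [DONE]\n",
]
print("".join(event["content"] for event in iter_sse_json(stream)))  # prints "Hello"
```

Without the comment check, the `":"` line would reach `json.loads` and raise a decode error, which is the kind of failure the release note describes.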

36
Ollama releases dev-tools 27d ago

v0.30.1-rc0

launch: isolate Codex launch configuration (#16437)

7
r/MachineLearning community 27d ago

Backpropagation destroys V1 brain alignment in one epoch, tracking RSA alignment to fMRI across training for BP, FA, predictive coding, and STDP [R]

Third in a series of papers tracking learning rules vs. human fMRI (THINGS dataset, V1–IT, N=3 subjects). Previous finding: untrained CNNs match backprop at V1. This paper asks: when does training break that, and does the learning rule matter? Setup: RSA alignment measured at 8…

30
OpenAI Python SDK releases dev-tools 28d ago

v2.40.0

2.40.0 (2026-06-01). Full Changelog: v2.39.0...v2.40.0. Features: api: Add Amazon Bedrock Responses support. Bug Fixes: api: allow setting bedrock api keys on the client directly (4d5bfde)

19
Ollama releases dev-tools 28d ago

v0.30.0: launch: migrate Codex config (#16397)

launch: migrate Codex config

30
OpenAI Python SDK releases dev-tools 28d ago

v2.39.0

2.39.0 (2026-06-01). Full Changelog: v2.38.0...v2.39.0. Features: api: workload identity in audit logs, additional_tools item in responses, fix ActionSearch.query to be optional. (ab60d7a)

10
ComfyUI releases dev-tools 28d ago

v0.23.0

What's Changed: feat: MediaPipe face detection (CORE-235) by @kijai in #14009; Multi-threaded load of models from disk (big load time speedups & Offload to disk) (CORE-43,CORE-152,CORE-164,CORE-165,CORE-117) by @rattus128 in #13802; Repo security stuff. by @comfyanonymous in #14019…

28
Ollama releases dev-tools 28d ago

v0.30.0-rc32: llama-server followups (#16353)

llama-server followups. Misc fixes for #16031. Add back dropped ROCm build flag for multi-GPU support on Windows. Fix amdhip64_*.dll version detection for "latest" selection. Fix embeddings API for consistent normalize behavior with prior versions. ci: set up for automated llama.cpp…

19
r/LocalLLaMA community 28d ago

mistral.rs v0.8.2: up to 2.8x faster CUDA inference than llama.cpp on GB10, B200, and H100

Hey all! I’ve been working on CUDA performance in mistral.rs, and v0.8.2 is focused on CUDA throughput. The result: on Gemma 4 (dense & MoE), mistral.rs is faster than llama.cpp at every point in my release sweep on GB10/H100/B200. See some results below on GB10 and B200:…

24
