Version Bump — AI news on Prismix

r/LocalLLaMA community 29d ago

Llama Studio v0.2.0

I have made an update to my llama-server WebUI based on some awesome feedback and interaction with the community. 1) JSON model config replaced by per-model shell scripts. Run from CLI, paste from unsloth, email to your buddy or post to reddit: Using real shell scripts to store…

17

Hacker News — AI on Front Page community 1mo ago

The AV2 Video Standard Has Released (Final v1.0 Specification)

Article URL: https://av2.aomedia.org | Comments URL: https://news.ycombinator.com/item?id=48340910 | Points: 203 | Comments: 80

34

r/LocalLLaMA community 1mo ago

this new Moss tts 1.5 is damn good with voice cloning

https://huggingface.co/spaces/OpenMOSS-Team/MOSS-TTS-v1.5 I prefer this over fish audio s2 pro because fish audio doesn't allow commercial use. Long Cat DiT 3.5 is also another good model. submitted by /u/9r4n4y

38

vLLM releases dev-tools 1mo ago

v0.22.1rc0: [CI] Make Model Executor test hangs fail fast with a traceback (#43971)

Signed-off-by: khluu. Co-authored-by: Claude.

10

llama.cpp releases dev-tools 1mo ago

b9411

model: support for DeepseekV32ForCausalLM with generic DeepSeek Sparse Attention (DSA) implementation (#23346); llama: support DeepSeek V3.2 model family (with DSA lightning indexer); convert: handle DeepseekV32ForCausalLM architecture; ggml: support for f16 GGML_OP_FILL…

34

Ollama releases dev-tools 1mo ago

v0.30.0-rc31

ci fix - non-shallow MLX checkout

29

Ollama releases dev-tools 1mo ago

v0.30.0-rc30

version bump

18

Anthropic SDK (Python) releases dev-tools 1mo ago

v0.105.2

0.105.2 (2026-05-29) Full Changelog: v0.105.1...v0.105.2

14

Anthropic SDK (Python) releases dev-tools 1mo ago

v0.105.1

0.105.1 (2026-05-29). Full Changelog: v0.105.0...v0.105.1. Chores: internal: use Trusted Publishing for PyPI releases (1d04fc5)

34

Ollama releases dev-tools 1mo ago

v0.30.0-rc29

review comments

24

Anthropic SDK (Python) releases dev-tools 1mo ago

v0.105.0

0.105.0 (2026-05-28). Full Changelog: v0.104.1...v0.105.0. Features: api: Add support for claude-opus-4-8, mid-conversation system blocks, and usage.output_tokens_details (f18b014); support custom file size caps (#1825) (7e5f944). Chores: examples: rename managed-agents…

12

r/LocalLLaMA community 1mo ago

Krasis update: Qwen3.6-35B-A3B (Q4) at reading speed, 1x 8GB 3070 Mobile laptop (32GB RAM)

Context: Krasis is an LLM runtime for running models that don't fit into VRAM. Krasis streams the model through VRAM from system RAM efficiently and handles prefill and decode as separate architectures and optimised use cases. Latest results (v1.0 release): 1x Laptop RTX 3070…

22

vLLM releases dev-tools 1mo ago

v0.22.0rc3: [BugFix] Fix hard-coded timeout for multi-API-server startup (#43768)

Signed-off-by: Vadim Gimpelson. Co-authored-by: Nick Hill.

20

vLLM releases dev-tools 1mo ago

v0.22.0: [BugFix] Fix hard-coded timeout for multi-API-server startup (#43768)

Signed-off-by: Vadim Gimpelson. Co-authored-by: Nick Hill.

29

vLLM releases dev-tools 1mo ago

v0.22.0rc2: Fix early CUDA init (#43791)

Signed-off-by: Harry Mellor (cherry picked from commit 41688e2)

11

Ollama releases dev-tools 1mo ago

v0.30.0-rc28

add OLLAMA_IGPU_ENABLE and largely disable iGPUs by default

14

ComfyUI releases dev-tools 1mo ago

v0.22.3

ComfyUI v0.22.3

36

r/MachineLearning community 1mo ago

Best Text to Text Translation Model? [D]

I'm working on a project that translates any language into English. So far, I've tried NMT models like NLLB, MADLAD, and SeamlessM4T v2. The main issue is that they struggle with proper nouns such as names, places, dates, and organizations. I also tried LLMs like Gemma 4, Qwen…

22

r/LocalLLaMA community 1mo ago

Info: Nvidia Cuda 13.3 landed

CUDA 13.3: Downloads | Release Notes. Has anybody already tried llama.cpp with 13.3? submitted by /u/parrot42

18

vLLM releases dev-tools 1mo ago

v0.22.0rc1: [MRV2][BugFix] Fix KV connector handling in spec decode case (#43719)

Signed-off-by: Nick Hill. Co-authored-by: Wentao Ye (cherry picked from commit 8c94938)

18

Ollama releases dev-tools 1mo ago

v0.30.0-rc27

ci: windows path workaround for CPU build

20

Ollama releases dev-tools 1mo ago

v0.30.0-rc26: Merge remote-tracking branch 'upstream/main' into llama-runner-phase-0

Conflicts: server/images.go, server/images_test.go

33

r/LocalLLaMA community 1mo ago

OpenMOSS-Team/MOSS-TTS-v1.5 · Hugging Face

MOSS-TTS-v1.5 is a continuation of MOSS-TTS 1.0. It preserves the main 1.0 capabilities, including zero-shot voice cloning, long-form speech generation, token-level duration control, Pinyin/IPA pronunciation control, multilingual synthesis, and code-switching. For…

10

r/LocalLLaMA community 1mo ago

Harbor v0.4.19 - vllm/sglang/llama.cpp launch codex/claude/pi/opencode

I usually don't post about Harbor releases out of respect for the community here, but I think v0.4.19 might save a lot of people some time. Harbor can now launch your local agentic coding tools with local inference backends. For example, to run pi + vllm: # model…

26

Ollama releases dev-tools 1mo ago

v0.30.0-rc25

ci: fix WoA cross-compile

13

r/LocalLLaMA community 1mo ago

MiMo-V2.5-coder

Hi, I've just released MiMo-V2.5-coder. If you have 128 GB, this is an excellent alternative to Qwen3.6 and DS4, especially for coding. Fast, and with reliable tool calling. Give it a try! submitted by /u/jedisct1

7

Ollama releases dev-tools 1mo ago

v0.30.0-rc24

version bump

20

r/MachineLearning community 1mo ago

LQS v3.1 — an open methodology for rating AI training data (multi-oracle consensus + signed certificates) [P]

Solo author here. I spent the last six months building (and then sunsetting) a marketplace for AI training data. The marketplace failed for an interesting reason: the actual bottleneck isn't supply. There's tons of data. The bottleneck is that buyers can't independently evaluate…

14

r/LocalLLaMA community 1mo ago

BeeLlama v0.2.0 – major DFlash update. Single RTX 3090: Qwen 3.6 27B up to 164 tps (4.40x), Gemma 4 31B up to 177.8 tps (4.93x). Prompt processing speed near baseline.

BeeLlama v0.2.0 is here! Not quite a pegasus, but close enough. GitHub | Qwen 3.6 27B Quick Start | Gemma 4 31B Quick Start Full Gemma 4 31B support with efficient DFlash implementation and vision. Major Qwen 3.6 27B performance update from lower DFlash overhead, cleaner prefill…

28

ComfyUI releases dev-tools 1mo ago

v0.22.2

ComfyUI v0.22.2

6

r/LocalLLaMA community 1mo ago

trained a prompt injection detector using ml-intern and DeepSeek v4 Flash, runs in the browser

Trained a prompt injection classifier using ml-intern + DeepSeek v4 Flash. DistilBERT, F1 99%, ONNX int8, ~65 MB, runs in browser with Transformers.js v3. You can try it here: https://huggingface.co/spaces/av-codes/prompt-injection-detector --- I've been interested in prompt…

5

Ollama releases dev-tools 1mo ago

v0.30.0-rc23

lint fix

8

Anthropic SDK (Python) releases dev-tools 1mo ago

v0.104.1

0.104.1 (2026-05-21). Full Changelog: v0.104.0...v0.104.1. Bug Fixes: streaming: carry encrypted_content through beta compaction accumulator (#1821) (f7a720c)

29

Hacker News — AI on Front Page community 1mo ago

Deno 2.8

Article URL: https://deno.com/blog/v2.8 | Comments URL: https://news.ycombinator.com/item?id=48234380 | Points: 215 | Comments: 98

27

ComfyUI releases dev-tools 1mo ago

v0.22.1

ComfyUI v0.22.1

18

OpenAI Python SDK releases dev-tools 1mo ago

v2.38.0

2.38.0 (2026-05-21). Full Changelog: v2.37.0...v2.38.0. Features: api: api update (33d1d01); api: manual updates (a21700a); api: update OpenAPI spec or Stainless config (00265c5). Chores: api: docs updates (ee10152); check release PR custom code sync (2638779); remove release…

26

Anthropic SDK (Python) releases dev-tools 1mo ago

v0.104.0

0.104.0 (2026-05-21). Full Changelog: v0.103.1...v0.104.0. Features: api: Add support for thinking-token-count beta for estimated tokens in thinking block deltas when streaming (80d0fdf)

7

Ollama releases dev-tools 1mo ago

v0.30.0-rc22

version bump

5

r/LocalLLaMA community 1mo ago

LlamaStation v0.9 — llama.cpp GUI for Windows with multi-backend support, TurboQuant, MTP and more

I've been building this for the past few months as a side project — started because I didn't want to run llama.cpp from the command line every time I wanted to try a model. I just wanted something that worked with a click. Fair warning: I'm not a developer. This is 100% vibe…

33

ComfyUI releases dev-tools 1mo ago

v0.22.0

ComfyUI v0.22.0

30

llama.cpp releases dev-tools 1mo ago

b9246: snapdragon: update toolchain to v0.6 (#23369)

snapdragon: update compiler flags to enable all CPU features; snapdragon: update readme to point to toolchain v0.6; snapdragon: bump toolchain docker to v0.6

37

r/LocalLLaMA community 1mo ago

Google AI Edge Gallery v1.0.13 & v1.0.14 updates: Gemma 4 Multi-Token Prediction, Pixel TPU support, experimental MCP, new skills, now saves chat history

submitted by /u/AnticitizenPrime

26

Ollama releases dev-tools 1mo ago

v0.30.0-rc21

improve windows exit error logs

32

r/LocalLLaMA community 1mo ago

Why is LM-Studio download page showing me 0.4.7 to download when the latest version is 0.4.13?

I'm currently running LM-Studio 0.4.12. If I check for updates in the app, it says there's a new version (0.4.13), and I can read the changelog for 0.4.13, but when I go to https://lmstudio.ai/download it shows 0.4.7. What's going on here? Does anyone know? submitted by …

37

Hugging Face official-blog 1mo ago

OlmoEarth v1.1: A more efficient family of models

Team Article, published May 19, 2026. Kyle Wiggers Ai2Comms allenai 🧠 Models: https://huggingface.co/collections/allenai/olmoearth | 📄 Tech Report: https://allenai.org/papers/olmoearth_v1_1 | 💻 Code:…

38

r/LocalLLaMA community 1mo ago

Here are my KV cache quantization benchmarks: TurboQuant is overrated but saved by TCQ, q5 deserves more attention, and symmetric q8 might be a waste of VRAM

Greetings from TurboQuant's former biggest defender, now a middle-sized, niche-aware TurboQuant defender. Today I'm presenting the results of thoroughly exploring the world of PPL and KLD benchmarks on my single RTX 3090 using BeeLlama v0.1.2, with some backstory of…

31

Anthropic SDK (Python) releases dev-tools 1mo ago

v0.103.1

0.103.1 (2026-05-19). Full Changelog: v0.103.0...v0.103.1. Bug Fixes: runner: skip tool calls SessionToolRunner does not own (#1817) (9425c6a)

8

Anthropic SDK (Python) releases dev-tools 1mo ago

v0.103.0

0.103.0 (2026-05-19). Full Changelog: v0.102.0...v0.103.0. Features: client: Add support for self-hosted sandboxes in CMA with sandbox helpers (e5625b0)

22

Ollama releases dev-tools 1mo ago

v0.30.0-rc20

ci: fix cache miss on rocm build

6

Ollama releases dev-tools 1mo ago

v0.30.0-rc19

missing file

27
