r/LocalLLaMA · June 7, 2026 · 1 min read

Dockerized Nemotron 3.5 ASR — Switched from Parakeet, better multilingual support + streaming (4.5x realtime speed on cpu)

I was originally using Parakeet for my speech recognition pipeline but decided to give Nemotron 3.5 a shot. I tested it on some multilingual audio clips, and it's been working great so far.

What sold me:

- Better language support (40+ locales from one model)

- Native streaming architecture — no more buffering entire files

- Tested on CPU and got about 4.5x realtime speed using onnxruntime-genai as the backend
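
For anyone who wants to sanity-check a figure like "4.5x realtime" on their own hardware, here is a rough sketch of the arithmetic. It assumes the factor means audio duration divided by wall-clock processing time; the function names are made up for illustration and are not part of the repo.

```python
def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Speed factor: seconds of audio transcribed per second of wall-clock time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def processing_time(audio_seconds: float, factor: float) -> float:
    """Expected wall-clock time to transcribe a clip at a given speed factor."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return audio_seconds / factor


if __name__ == "__main__":
    # A 60-second clip at 4.5x realtime takes 60 / 4.5 seconds of compute.
    print(f"{processing_time(60.0, 4.5):.1f} s")
    # A 90-second clip that took 20 seconds to process ran at 90 / 20 times realtime.
    print(f"{realtime_factor(90.0, 20.0):.1f}x")
```

To measure it yourself, time the transcription call with `time.perf_counter()` and divide the clip length by the elapsed time.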

I've containerized it with Docker so you can just clone and run. There are example files showing how to call the API from a client (both streaming and file upload). The repo is in the comments.

One caveat: I haven't tested CUDA support yet. It should work out of the box, but you might need to tweak the YAML and requirements.txt to get it running on GPU. If anyone tries it, let me know how it goes.
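
If you do try the GPU route, a quick diagnostic is to check which ONNX Runtime execution providers are visible inside the container. This is only a sketch: it assumes the plain `onnxruntime` Python package is importable in the image, and `pick_provider` is a hypothetical helper, not something from the repo.

```python
def pick_provider(available):
    """Prefer CUDA when ONNX Runtime reports it, otherwise fall back to CPU."""
    if "CUDAExecutionProvider" in available:
        return "CUDAExecutionProvider"
    return "CPUExecutionProvider"


def available_providers():
    """Return ONNX Runtime's provider list, or [] if the package is missing."""
    try:
        import onnxruntime as ort
    except ImportError:
        return []
    return ort.get_available_providers()


if __name__ == "__main__":
    providers = available_providers()
    print("available:", providers)
    print("would use:", pick_provider(providers))
```

If `CUDAExecutionProvider` is missing from the list, the GPU build of the runtime probably isn't installed or the container can't see the GPU, which is where the YAML and requirements.txt changes would come in.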
