Dockerized Nemotron 3.5 ASR: Switched from Parakeet, better multilingual support + streaming (4.5x realtime speed on CPU)
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
I was originally using Parakeet for my speech recognition pipeline but decided to give Nemotron 3.5 a shot. After
testing it on some multilingual audio clips, it's been working great so far.
What sold me:
- Better language support (40+ locales from one model)
- Native streaming architecture — no more buffering entire files
- Tested on CPU and got about 4.5x realtime speed using onnxruntime-genai as the backend
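For anyone who wants to check the speed number on their own hardware: "4.5x realtime" here means audio duration divided by wall-clock processing time. A minimal sketch of that arithmetic follows. The function names and the `transcribe` callable are hypothetical placeholders, not part of the repo.

```python
import time
from typing import Callable


def realtime_speed(audio_seconds: float, processing_seconds: float) -> float:
    """Return how many seconds of audio are processed per second of wall-clock time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def expected_processing_time(audio_seconds: float, speed: float) -> float:
    """Return the wall-clock time needed for a clip at a given realtime speed."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return audio_seconds / speed


def measure(transcribe: Callable[[], object], audio_seconds: float) -> float:
    """Time one call to a (hypothetical) transcribe function and return its realtime speed."""
    start = time.perf_counter()
    transcribe()
    elapsed = time.perf_counter() - start
    return realtime_speed(audio_seconds, elapsed)


if __name__ == "__main__":
    # A 90 s clip transcribed in 20 s runs at 4.5x realtime.
    print(realtime_speed(90.0, 20.0))
    # At 4.5x, a 60 s clip should take roughly 13.3 s.
    print(round(expected_processing_time(60.0, 4.5), 1))
```

Anything above 1.0x keeps up with live audio, so 4.5x leaves a fair amount of headroom on CPU.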
I've containerized it with Docker so you can just clone and run. There are example files showing how to call the
API from a client (both streaming and file upload). The repo is linked in the comments.
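To give a rough idea of what the streaming side of a client does differently from file upload: instead of sending the whole file, it sends small fixed-duration chunks as they become available. This is only a sketch of the chunking step. The audio format (16 kHz, mono, 16-bit PCM) and the 500 ms chunk size are my assumptions, and the actual endpoints and message format are whatever the repo's example files use.

```python
from typing import Iterator

SAMPLE_RATE_HZ = 16_000   # assumption: 16 kHz mono input
BYTES_PER_SAMPLE = 2      # assumption: 16-bit PCM


def iter_pcm_chunks(
    pcm: bytes,
    chunk_ms: int = 500,
    sample_rate_hz: int = SAMPLE_RATE_HZ,
    bytes_per_sample: int = BYTES_PER_SAMPLE,
) -> Iterator[bytes]:
    """Yield consecutive chunks of raw PCM, each covering chunk_ms of audio.

    The last chunk may be shorter than the others.
    """
    samples_per_chunk = sample_rate_hz * chunk_ms // 1000
    chunk_bytes = samples_per_chunk * bytes_per_sample
    if chunk_bytes <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


if __name__ == "__main__":
    # 1.2 s of silence: 16000 samples/s * 2 bytes * 1.2 s = 38400 bytes.
    silence = bytes(38_400)
    for i, chunk in enumerate(iter_pcm_chunks(silence)):
        # A real client would send each chunk to the server here.
        print(i, len(chunk))
```

Smaller chunks lower latency but mean more requests or messages, so the chunk size is worth tuning against whatever the server expects.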
One thing — I haven't tested CUDA support yet. It should work out of the box but you might need to tweak the yaml
and requirements.txt to get it running on GPU. If anyone tries it, let me know how it goes.
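If you do try GPU, a quick sanity check inside the container is to see whether ONNX Runtime reports a CUDA execution provider at all. This sketch assumes the plain `onnxruntime` Python package is importable in the image (I haven't verified that for the GPU setup), and the helper names are made up for illustration.

```python
from typing import List, Sequence

CUDA_PROVIDER = "CUDAExecutionProvider"
CPU_PROVIDER = "CPUExecutionProvider"


def available_providers() -> List[str]:
    """Return the execution providers ONNX Runtime reports, or [] if it is not installed."""
    try:
        import onnxruntime as ort
    except ImportError:
        return []
    return list(ort.get_available_providers())


def pick_provider(available: Sequence[str]) -> str:
    """Prefer CUDA when it is reported as available, otherwise fall back to CPU."""
    return CUDA_PROVIDER if CUDA_PROVIDER in available else CPU_PROVIDER


if __name__ == "__main__":
    providers = available_providers()
    print("reported providers:", providers)
    print("would use:", pick_provider(providers))
```

If CUDA is not in the list, the GPU build of the runtime is probably not installed or the container was started without GPU access.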