r/LocalLLaMA

I wired a fully offline voice loop to Ollama + LM Studio — 100% CPU, no GPU, nothing leaves your machine (Silero VAD + Parakeet STT + Supertonic TTS 3)

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

I kept wanting to talk to my local models instead of typing, but every voice setup wanted a GPU, shipped my audio to the cloud, or was macOS-only. So I built one that's none of those — and I benchmarked it, so these are real measured numbers, not vibes.

One command installs the whole stack and wires it into your agent. Then you just talk.

Everything runs on CPU and stays off your GPU (your GPU is busy running the actual LLM):

  • Silero VAD: detects when you start and stop talking, so there's no push-to-talk. ~0.09 ms per frame.
  • Parakeet TDT 0.6B v3: local ONNX INT8 speech-to-text, 25 languages, served through an OpenAI-compatible API on port 5093. A 2.5 s clip transcribes in ~280 ms (~9× realtime).
  • Supertonic TTS 3: local ONNX FP16 synthesis, multilingual, voices F1–F5 / M1–M5. A short reply renders in ~1.7 s (1.6–2.8× realtime), and a TTS→STT round-trip comes back word-for-word.
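Silero VAD gives each audio frame a speech probability; turning that stream into "you started talking" and "you stopped talking" events is usually done with a threshold plus a hangover period, so a short pause doesn't cut you off mid-sentence. Here's a minimal sketch of that endpointing logic, assuming the per-frame probabilities are already computed. The `EndpointConfig` knobs and their defaults are hypothetical, not this repo's settings; assuming 32 ms frames (512 samples at 16 kHz), a 10-frame hangover is about 320 ms of silence.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class EndpointConfig:
    # Hypothetical tuning knobs, not taken from opencode-voice-service.
    threshold: float = 0.5       # probability at or above which a frame counts as speech
    min_speech_frames: int = 3   # consecutive speech frames needed to open an utterance
    hangover_frames: int = 10    # consecutive silent frames needed to close it


def segment_utterances(
    probs: Iterable[float], cfg: Optional[EndpointConfig] = None
) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame_exclusive) for each detected utterance."""
    if cfg is None:
        cfg = EndpointConfig()
    segments: List[Tuple[int, int]] = []
    in_speech = False
    run = 0    # speech-run length while idle, silence-run length while in speech
    start = 0
    n = 0      # number of frames seen so far
    for i, p in enumerate(probs):
        n = i + 1
        is_speech = p >= cfg.threshold
        if not in_speech:
            run = run + 1 if is_speech else 0
            if run >= cfg.min_speech_frames:
                # Utterance starts at the first frame of the qualifying speech run.
                in_speech, start, run = True, i - run + 1, 0
        elif is_speech:
            run = 0  # any speech frame resets the silence counter
        else:
            run += 1
            if run >= cfg.hangover_frames:
                # End at the first frame of the silent run that closed the utterance.
                segments.append((start, i - run + 1))
                in_speech, run = False, 0
    if in_speech:
        # Stream ended mid-utterance: close it, trimming any trailing silence.
        segments.append((start, n - run))
    return segments
```

Raising `hangover_frames` tolerates longer pauses at the cost of slower turn-taking; raising `min_speech_frames` filters out coughs and keyboard clicks.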

Measured on a plain i7-12700KF, CPU only, no GPU touched: both of my 3090s were fully occupied serving the LLM itself in vLLM, which is exactly the point. Voice runs on CPU; VRAM stays with your model.
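For anyone checking the math: the "× realtime" figures are audio duration divided by processing time, and the VAD cost reads best as a share of each frame's duration. A quick sanity check of the numbers above; the 32 ms frame length for Silero (512 samples at 16 kHz) is an assumption, since the post only gives ms per frame.

```python
def speedup_vs_realtime(audio_s: float, compute_s: float) -> float:
    """Audio duration divided by wall-clock processing time (>1 means faster than realtime)."""
    if compute_s <= 0:
        raise ValueError("compute time must be positive")
    return audio_s / compute_s


def frame_budget_used(compute_ms_per_frame: float, frame_ms: float) -> float:
    """Fraction of each frame's duration spent running the model on that frame."""
    return compute_ms_per_frame / frame_ms


# Parakeet figure from the post: a 2.5 s clip in ~280 ms.
print(f"STT: {speedup_vs_realtime(2.5, 0.280):.1f}x realtime")   # STT: 8.9x realtime
# Silero figure from the post (~0.09 ms/frame), assuming 32 ms frames.
print(f"VAD: {frame_budget_used(0.09, 32.0):.2%} of each frame")  # VAD: 0.28% of each frame
```

The same ratio applies to the TTS figure: 1.6–2.8× realtime means the spoken audio lasts 1.6 to 2.8 times longer than it took to synthesize.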

Works with whatever agent you use — one install drops a talk skill into all of them: Claude Code, Hermes Agent, OpenClaw, OpenCode, and Codex. The same installer also auto-installs and starts the STT + TTS backends for you.

Data flow — nothing leaves the box:

you -> Silero VAD (CPU) -> Parakeet STT (CPU) -> your LLM (Ollama / LM Studio / vLLM) -> Supertonic 3 (CPU) -> speakers 
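Because Parakeet sits behind an OpenAI-compatible API on :5093, the VAD-to-STT hop is just an HTTP POST to localhost. Here's a stdlib-only sketch of that hop, assuming the server follows OpenAI's `/v1/audio/transcriptions` conventions (multipart `file` and `model` fields, JSON reply with a `text` key). The path, the `"parakeet"` model name, and the helper names are assumptions, not confirmed from the repo. The URL points at 127.0.0.1, so the audio stays on the machine.

```python
import json
import urllib.request
import uuid

# Assumed endpoint: OpenAI-style path on the port the post mentions (5093).
STT_URL = "http://127.0.0.1:5093/v1/audio/transcriptions"


def build_transcription_request(
    wav_bytes: bytes, model: str = "parakeet", url: str = STT_URL
) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying one WAV utterance."""
    boundary = uuid.uuid4().hex
    # The "model" value the local server expects is an assumption.
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode()
    file_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="file"; filename="utterance.wav"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode() + wav_bytes + b"\r\n"
    closing = f"--{boundary}--\r\n".encode()
    return urllib.request.Request(
        url,
        data=model_part + file_part + closing,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(wav_bytes: bytes, timeout: float = 30.0) -> str:
    """Send the utterance to the local STT server and return the transcript text."""
    req = build_transcription_request(wav_bytes)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["text"]
```

In a loop like the one above, each VAD segment would be wrapped as a WAV, passed to `transcribe`, and the returned text handed to Ollama, LM Studio or vLLM.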

Install (macOS / Linux):

git clone https://github.com/groxaxo/opencode-voice-service
cd opencode-voice-service && ./setup.sh

Windows (PowerShell):

.\setup.ps1 

The installer is interactive (pick components + agent integrations) and auto-starts via systemd / launchd / Task Scheduler. Free and MIT-licensed.

GitHub: https://github.com/groxaxo/opencode-voice-service

Runs fine on a 4-year-old ThinkPad with no GPU. Happy to answer VAD-tuning or ONNX-performance questions.

submitted by /u/blackstoreonline