Benchmark: ONNX Runtime vs HF Transformers vs GGUF for Parakeet TDT 0.6B on CPU-only hardware [D]
Mirrored from r/MachineLearning for archival readability. Support the source by reading on the original site.
Sharing a small CPU inference benchmark for nvidia/parakeet-tdt-0.6b-v3 that turned up a result I didn't expect going in.
Setup: 2 x86-64 vCPUs (AVX2/FMA), 7.7GB RAM, no GPU. Test audio: 16.78s Harvard sentences at 16kHz mono.
Results:
| Inference path | RTF | Peak Memory | CPU utilization |
|---|---|---|---|
| HF Transformers bfloat16 | 0.519 | ~430MB delta | — |
| ONNX Runtime FP32 (onnx-asr) | 0.328 | 2,667MB | 49.9% |
| GGUF Q6_K (parakeet.cpp) | 0.708 | 928MB | 99.8% |
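For anyone unfamiliar with the metric: RTF (real-time factor) is wall-clock processing time divided by audio duration, so lower is better and anything under 1.0 is faster than real time. Below is a minimal sketch of how it could be measured. The `transcribe` callable, the warmup pass, and the best-of-N policy are assumptions for illustration, not necessarily what the benchmark scripts do.

```python
import time


def real_time_factor(transcribe, audio_path, audio_seconds, warmup=1, runs=3):
    """Return best-of-N RTF = processing time / audio duration.

    `transcribe` is a hypothetical callable wrapping whichever runtime
    is under test (ONNX Runtime, HF Transformers, GGUF, ...).
    """
    # Warmup passes keep one-time costs (model load, caches) out of the timing.
    for _ in range(warmup):
        transcribe(audio_path)

    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        transcribe(audio_path)
        timings.append(time.perf_counter() - start)

    return min(timings) / audio_seconds


def processing_seconds(rtf, audio_seconds):
    """Invert the definition: how long a clip takes at a given RTF."""
    return rtf * audio_seconds


# The 16.78s test clip at each reported RTF.
for name, rtf in [("HF bf16", 0.519), ("ONNX FP32", 0.328), ("GGUF Q6_K", 0.708)]:
    print(f"{name}: {processing_seconds(rtf, 16.78):.2f}s")
```

At the reported RTFs, the 16.78s clip takes roughly 8.7s, 5.5s, and 11.9s respectively.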
ONNX Runtime's RTF is 37% lower than HF Transformers bfloat16 on this hardware (0.328 vs 0.519, roughly 1.6x the throughput). The gap comes from operator fusion and AVX2-optimized kernels in ONNX Runtime's CPU execution provider that the PyTorch CPU path doesn't exploit as aggressively. Memory cost is the tradeoff: FP32 weights load at ~2.7GB peak.
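The relative figures can be rederived from the table. A quick sketch, using only the RTF values reported above (the function names are made up for illustration):

```python
def rtf_reduction(baseline_rtf, candidate_rtf):
    """Fraction of processing time saved relative to the baseline."""
    return (baseline_rtf - candidate_rtf) / baseline_rtf


def throughput_ratio(baseline_rtf, candidate_rtf):
    """How many times more audio per second the candidate processes."""
    return baseline_rtf / candidate_rtf


HF_BF16, ONNX_FP32, GGUF_Q6K = 0.519, 0.328, 0.708

print(f"ONNX vs HF, time saved:  {rtf_reduction(HF_BF16, ONNX_FP32):.1%}")     # 36.8%
print(f"ONNX vs HF, throughput:  {throughput_ratio(HF_BF16, ONNX_FP32):.2f}x")  # 1.58x
print(f"ONNX vs GGUF, throughput: {throughput_ratio(GGUF_Q6K, ONNX_FP32):.2f}x")  # 2.16x
```

Note that "37%" is the reduction in processing time. Expressed as throughput, ONNX handles about 1.58x as much audio per second as the HF path.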
GGUF Q6_K trades throughput for memory efficiency: 928MB peak vs 2,667MB for ONNX, but RTF more than doubles relative to ONNX (0.708 vs 0.328) and CPU utilization hits 99.8%. For memory-constrained deployments it's the right call. For sustained throughput on a box with headroom, ONNX wins.
One methodological note for anyone benchmarking ASR with synthetic audio: espeak-ng audio inflated WER to 20.9% on a sentence set where gTTS audio got 4.65%. Both runtimes got identical WER within each run, which points to TTS distribution mismatch rather than model or quantization quality. For reference, NVIDIA reports 1.93% on LibriSpeech, so of the two synthetic numbers the gTTS one is a much more honest CPU-only proxy.
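For reference, WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. Here is a minimal, dependency-free sketch. The normalization step (lowercasing, stripping punctuation) is an assumption and may differ from what the evaluation scripts in the repo do, and normalization choices alone can shift WER noticeably.

```python
import re


def normalize(text):
    # Assumed normalization: lowercase, drop punctuation, split on whitespace.
    return re.sub(r"[^a-z0-9' ]+", " ", text.lower()).split()


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


# One substituted word out of eight.
print(word_error_rate(
    "The birch canoe slid on the smooth planks.",
    "the birch canoe slit on the smooth planks",
))  # 0.125
```

Scoring the same transcripts from each runtime against the same references is what makes the "identical WER across runtimes" check meaningful.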
GitHub repo with code, raw results, and evaluation scripts in comments below.
Disclosure: benchmark was run using Neo, a local AI engineering agent inside Claude Code using its MCP. Mentioning because the runtime and audio choices came from its research phase, not prior knowledge on my end.