I finally put my NPU (Intel Arrow Lake) to use doing ASR for my smart home
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
I wrote about what I found in a deep dive elsewhere (which I will not mention because Reddit doesn't like cross-linking), but I wanted to share it here since this is where I learn the most about AI stuff, and I've seen questions here before about NPUs, which are often dismissed as marketing gimmicks (and for the most part they are if we're talking LLMs, but not for other ML workloads). If you care about the traps I hit along the way getting onnx-asr working on OpenVINO compiled for the NPU, you can read the article; I'm here to post the findings.

Table comparing the total time and the energy used (watts during inference and total joules per transcription). The energy was sampled at 10 Hz.

I think this is a remarkable result considering Intel NPUs are, at least on paper, rather weak at 13 TOPS compared with the >40 TOPS of the AMD ones, but still more than fast enough for this task.

Some real-world end-to-end numbers from Home Assistant:

Running this on the NPU frees the CPU to do CPU stuff, and also saves a valuable 2-3 GB of VRAM on my 7900 XTX for LLM stuff. Incidentally, in real-world usage this setup happens to beat the 12 GB RTX 3060 eGPU that I was using before. On a 3-4 s voice command, the NPU takes ~120-160 ms, while the 3060 took ~150-300 ms. I am not claiming that the NPU is more powerful than the Nvidia card, but I suspect the advantage comes from the NPU being able to wake up instantly from dormancy, while the Nvidia card took long enough to ramp up that, for short workloads like smart home voice commands, the NPU's head start was enough to win. Quite likely the Nvidia card would win again when transcribing long-form audio.

I finally found a nice use for the NPU, and I want to move the TTS audio generation to the NPU next.
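For readers who want to derive a joules-per-transcription figure from their own power log, here is a minimal sketch. It assumes power readings in watts taken at a fixed rate (10 Hz, as in the post) and integrates them with the trapezoidal rule. The function names and the sample values are hypothetical, and this is not necessarily how the measurements in the post were computed.

```python
from typing import Sequence


def energy_joules(power_watts: Sequence[float], sample_rate_hz: float = 10.0) -> float:
    """Integrate fixed-rate power samples (W) into energy (J), trapezoidal rule."""
    if len(power_watts) < 2:
        raise ValueError("need at least two samples to integrate")
    dt = 1.0 / sample_rate_hz  # seconds between consecutive samples
    total = 0.0
    for a, b in zip(power_watts, power_watts[1:]):
        total += 0.5 * (a + b) * dt  # area of one trapezoid
    return total


def average_watts(power_watts: Sequence[float], sample_rate_hz: float = 10.0) -> float:
    """Mean power over the sampled window: total energy divided by duration."""
    duration_s = (len(power_watts) - 1) / sample_rate_hz
    return energy_joules(power_watts, sample_rate_hz) / duration_s


if __name__ == "__main__":
    # Made-up readings covering 0.3 s of inference, NOT measurements from the post.
    samples = [4.0, 6.0, 6.0, 4.0]
    print(f"{energy_joules(samples):.2f} J")
    print(f"{average_watts(samples):.2f} W")
```

At 10 Hz a 150 ms inference is covered by only one or two samples, so for short voice commands it is probably more reliable to average over many repeated transcriptions than to integrate a single run.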