How I implemented ASR bias for voice transcription models [Open Source]
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
I've spent the last couple of weeks building a Wispr Flow clone as an open source project. For context, Wispr Flow is a voice dictation app that lets you write faster by speaking instead of typing. I spent the first week building the basic speech-to-text (STT) capabilities. One of the coolest features Wispr Flow has is ASR biasing, which it calls its dictionary. I figured out how to implement that for my project and wanted to share how it was done.

What is ASR biasing?

ASR biasing is a transcription technique that gives the model hints about how words are spelled or which phrases are common. In my example in the video, I told the model that I wanted to talk about the “Knicks” and “OG Anunoby”. Once biasing is set up, the words you have configured are more likely to show up in the transcript when you say something that sounds similar.

How it's implemented in code

Implementing ASR biasing is actually incredibly easy. Each model provider handles it differently, and they call it different things. For example, OpenAI and Groq use a prompt as their bias mechanism, similar to an LLM system prompt. Local options such as whisper.cpp and MLX-based models on the Mac use the same prompt mechanism. Other providers, like Deepgram and Eleven Labs, call them key terms and configure them through request parameters. In Groq, for example, it's as simple as injecting the dictionary words into the model's “system prompt”.
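To make the prompt approach concrete, here is a minimal Python sketch, not the project's actual code. It assumes the Groq Python SDK's `client.audio.transcriptions.create` call with its `prompt` parameter; the helper names, the `whisper-large-v3` model choice, and the `max_chars` budget are illustrative assumptions. The budget is there because Whisper-style models only consider a limited number of prompt tokens, so a very long dictionary would be cut off anyway.

```python
from typing import Iterable


def build_bias_prompt(terms: Iterable[str], max_chars: int = 800) -> str:
    """Join dictionary terms into a short prompt for a Whisper-style model.

    max_chars is an assumed, rough budget (hypothetical default), used
    because the model only reads a limited amount of prompt text.
    """
    seen = set()
    kept = []
    length = 0
    for term in terms:
        term = term.strip()
        key = term.lower()
        # Skip blanks and case-insensitive duplicates.
        if not term or key in seen:
            continue
        # The extra 2 accounts for the ", " separator between terms.
        added = len(term) + (2 if kept else 0)
        if length + added > max_chars:
            break
        seen.add(key)
        kept.append(term)
        length += added
    return ", ".join(kept)


def transcribe_with_bias(audio_path: str, terms: Iterable[str]) -> str:
    """Send audio to Groq with the dictionary injected as the prompt."""
    # Imported lazily so build_bias_prompt works without the SDK installed.
    from groq import Groq

    client = Groq()  # reads GROQ_API_KEY from the environment
    with open(audio_path, "rb") as audio:
        result = client.audio.transcriptions.create(
            file=(audio_path, audio.read()),
            model="whisper-large-v3",  # assumed model choice
            prompt=build_bias_prompt(terms),
        )
    return result.text
```

With the example from this post, `build_bias_prompt(["Knicks", "OG Anunoby"])` produces the string `Knicks, OG Anunoby`, which is what gets sent as the prompt.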
In Freestyle, we've implemented ASR biasing and call it our “Vocabulary” feature. When you create a vocabulary, it is saved locally within Freestyle. Every time you run inference, your saved vocabulary is freshly injected into the model's system prompt or key terms, depending on the provider.

Freestyle OSS project

All of the work that we've done around ASR biasing is open source and available in our GitHub repo. If this project sounds interesting to you, consider giving it a star! We're also looking to build a community of people interested in working on open source voice dictation.
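For anyone curious how the Vocabulary flow described above could fit together (save terms locally, then inject them on every inference call), here is a rough Python sketch. It is not Freestyle's actual implementation: the JSON file layout, the provider groupings, the function names, and the `keyterm` parameter name are all assumptions made for illustration.

```python
import json
from pathlib import Path
from urllib.parse import urlencode

# Hypothetical grouping of providers by how they accept bias terms.
PROMPT_STYLE = {"openai", "groq", "whisper.cpp", "mlx"}
KEYTERM_STYLE = {"deepgram", "elevenlabs"}


def save_vocabulary(path: Path, terms: list[str]) -> None:
    """Persist the vocabulary locally as a JSON list (assumed format)."""
    path.write_text(json.dumps(terms, ensure_ascii=False), encoding="utf-8")


def load_vocabulary(path: Path) -> list[str]:
    """Read the saved vocabulary, or return an empty list if none exists."""
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


def bias_options(provider: str, terms: list[str]) -> dict:
    """Turn the vocabulary into whatever the provider style expects."""
    if provider in PROMPT_STYLE:
        # Prompt-style providers get the terms as one comma-separated string.
        return {"prompt": ", ".join(terms)}
    if provider in KEYTERM_STYLE:
        # Key-term providers get repeated pairs; "keyterm" is an assumed name.
        return {"query": urlencode([("keyterm", t) for t in terms])}
    # Unknown provider: send no bias at all.
    return {}
```

Calling `load_vocabulary` right before each request, rather than caching the list, is what keeps the injected terms fresh when the user edits their vocabulary between dictations.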