Voice · 22 articles archived under #voice

r/LocalLLaMA · community · 2h ago
DramaBox - Most Expressive Voice model ever based on LTX 2.3
The Most Expressive Voice Model. GitHub: https://github.com/resemble-ai/DramaBox · HF Model: https://huggingface.co/ResembleAI/Dramabox · HF Space: https://huggingface.co/spaces/ResembleAI/Dramabox
Submitted by /u/manmaynakhashi

arXiv: NLP / Computation & Language · research · 15h ago
Predicting Psychological Well-Being from Spontaneous Speech using LLMs
arXiv:2605.11303v1 · Announce Type: new. Abstract: We investigate the use of Large Language Models (LLMs) for zero-shot prediction of Ryff Psychological Well-Being (PWB) scores from spontaneous speech. Using a few minutes of voice recordings from 111 participants in the PsyVoiD…

arXiv: NLP / Computation & Language · research · 15h ago
Mechanistic Interpretability of ASR models using Sparse Autoencoders
arXiv:2605.12225v1 · Announce Type: new. Abstract: Understanding the internal machinations of deep Transformer-based NLP models is more crucial than ever as these models see widespread use in various domains that affect the public at large, such as industry, academia, finance,…

arXiv: NLP / Computation & Language · research · 15h ago
Mind the Pause: Disfluency-Aware Objective Tuning for Multilingual Speech Correction with LLMs
arXiv:2605.12242v1 · Announce Type: new. Abstract: Automatic Speech Recognition (ASR) transcripts often contain disfluencies, such as fillers, repetitions, and false starts, which reduce readability and hinder downstream applications like chatbots and voice assistants. If left…

r/LocalLLaMA · community · 20h ago
I built Derpy Turtle: The Kokoro Trainer, a GUI for training better Kokoro voices with RVC
I've been working on a tool called Derpy Turtle: The Kokoro Trainer. It started as a random-walk experiment for Kokoro voices, but it has grown into its own thing: a Windows GUI for creating better local voice outputs by combining Kokoro voice search with RVC voice conversion.…

Latent.Space · news-outlet · 1d ago
[AINews] Thinking Machines' Native Interaction Models - TML-Interaction-Small 276B-A12B - advances SOTA Realtime Voice and kills standard VAD
Well done, Team Thinky.

Latent.Space · news-outlet · 5d ago
[AINews] GPT-Realtime-2, -Translate, and -Whisper: new SOTA realtime voice APIs
OpenAI continues deploying GPT-5 everywhere.

Smol AI News · news-outlet · 6d ago
GPT-Realtime-2, -Translate, and -Whisper: new SOTA realtime voice APIs
OpenAI released GPT-Realtime-2, a voice model with GPT-5-class reasoning, tool use, interruption handling, and extended context windows up to 128K tokens, achieving top scores on the Big Bench Audio and Conversational Dynamics benchmarks. They also launched a…

OpenAI · news · 9d ago
How OpenAI delivers low-latency voice AI at scale
How OpenAI rebuilt its WebRTC stack to power real-time Voice AI with low latency, global scale, and seamless conversational turn-taking.

vLLM releases · dev-tools · 25d ago
v0.19.2rc0: [Bugfix] Fix k_proj's bias for GLM-ASR (#40160)
Signed-off-by: Rishapveer Singh

NVIDIA Developer Blog · official-blog · 1mo ago
Maximize AI Infrastructure Throughput by Consolidating Underutilized GPU Workloads
In production Kubernetes environments, the difference between model requirements and GPU size creates inefficiencies. Lightweight automatic speech recognition...

NVIDIA Developer Blog · official-blog · 1mo ago
Building NVIDIA Nemotron 3 Agents for Reasoning, Multimodal RAG, Voice, and Safety
Agentic AI is an ecosystem where specialized models work together to handle planning, reasoning, retrieval, and safety guardrailing. As these systems scale,...

Smol AI News · news-outlet · 1mo ago
not much happened today
Google launched Gemini 3.1 Flash Live, a realtime voice and vision agent model with 2x longer conversation memory, supporting 70 languages and 128k context. Mistral AI released Voxtral TTS, a low-latency, open-weight text-to-speech model supporting…

Hugging Face · official-blog · 1mo ago
A New Framework for Evaluating Voice Agents (EVA)
Enterprise Article, published March 24, 2026. Authors: Tara Bogavelli (ServiceNow-AI), Gabrielle Gauthier Melancon (ServiceNow-AI), Katrina Stankiewicz (ServiceNow-AI), Nifemi Bamgbose…

OpenAI Python SDK releases · dev-tools · 2mo ago
v2.28.0 (2026-03-13)
Full Changelog: v2.27.0...v2.28.0. Features: api: custom voices (50dc060)

ThursdAI · news-outlet · 3mo ago
📆 ThursdAI - Jan 22 - Clawdbot deep dive, GLM 4.7 Flash, Anthropic constitution + 3 new TTS models
From Weights & Biases: a deep dive into Clawdbot, a personal AI assistant that learns and evolves; GLM 4.7 Flash; a bunch of new TTS models; and Claude's new constitution!

Google DeepMind · official-blog · 5mo ago
Improved Gemini audio models for powerful voice experiences
By Bibo Xu, Director of Product Management, and Tara Sainath, Distinguished Research Scientist.
Summary: Google enhanced Gemini 2.5 Flash Native Audio for better live voice agents. Expect…

Hugging Face · official-blog · 5mo ago
Open ASR Leaderboard: Trends and Insights with New Multilingual & Long-Form Tracks
Published November 21, 2025, by Eric Bezzam, Steven Zheng, Eustache Le Bihan, and Vaibhav Srivastav.
While everyone (and their…

Hugging Face · official-blog · 6mo ago
Voice Cloning with Consent
Published October 28, 2025, by Margaret Mitchell and Lucie-Aimée Kaffee.
In this blog post, we introduce the idea of a 'voice consent gate' to support voice cloning with consent. We provide an example…

Nonint (James Betker) · research · 24mo ago
GPT-4o
I'm very pleased to show the world GPT-4o. I came into the project mid-last year with Alexis Conneau with the goal of scaling up speech models and building an "AudioLM". We knew we had something special late last year, but I don't think either of us…

Eugene Yan · research · 25mo ago
Building an AI Coach to Help Tame My Monkey Mind
Building an AI coach with speech-to-text, text-to-speech, an LLM, and a virtual number.

Chip Huyen · research · 31mo ago
Multimodality and Large Multimodal Models (LMMs)
For a long time, each ML model operated in one data mode – text (translation, language modeling), image (object detection, image classification), or audio (speech recognition). However, natural intelligence is not limited to just a single modality. Humans can read, talk, and…