Tag: Voice
365 articles archived under #voice

Smol AI News · news-outlet · 1mo ago
GPT-Realtime-2, -Translate, and -Whisper: new SOTA realtime voice APIs
**OpenAI** released **GPT-Realtime-2**, a voice model with **GPT-5-class reasoning**, tool use, interruption handling, and extended context windows up to **128K tokens**, achieving top scores on **Big Bench Audio** and **Conversational Dynamics** benchmarks. They also launched a… 22

OpenAI · news · 1mo ago
How OpenAI delivers low-latency voice AI at scale
How OpenAI rebuilt its WebRTC stack to power real-time Voice AI with low latency, global scale, and seamless conversational turn-taking. 26

vLLM releases · dev-tools · 2mo ago
v0.19.2rc0: [Bugfix] Fix k_proj's bias for GLM-ASR (#40160)
Signed-off-by: Rishapveer Singh 4

NVIDIA Developer Blog · official-blog · 3mo ago
Maximize AI Infrastructure Throughput by Consolidating Underutilized GPU Workloads
In production Kubernetes environments, the difference between model requirements and GPU size creates inefficiencies. Lightweight automatic speech recognition... 38

NVIDIA Developer Blog · official-blog · 3mo ago
Building NVIDIA Nemotron 3 Agents for Reasoning, Multimodal RAG, Voice, and Safety
Agentic AI is an ecosystem where specialized models work together to handle planning, reasoning, retrieval, and safety guardrailing. As these systems scale,... 37

Smol AI News · news-outlet · 3mo ago
not much happened today
**Google** launched **Gemini 3.1 Flash Live**, a realtime voice and vision agent model with **2x longer conversation memory**, supporting **70 languages** and **128k context**.
**Mistral AI** released **Voxtral TTS**, a low-latency, open-weight text-to-speech model supporting… 31

Hugging Face · official-blog · 3mo ago
A New Framework for Evaluating Voice Agents (EVA)
Enterprise Article. Published March 24, 2026. Authors: Tara Bogavelli (tarabogavelli, ServiceNow-AI), Gabrielle Gauthier Melancon (gabegma, ServiceNow-AI), Katrina Stankiewicz (kstankiewicz, ServiceNow-AI), Nifemi Bamgbose… 7

OpenAI Python SDK releases · dev-tools · 3mo ago
v2.28.0
2.28.0 (2026-03-13). Full Changelog: v2.27.0...v2.28.0. Features: api: custom voices (50dc060) 12

ThursdAI · news-outlet · 5mo ago
📆 ThursdAI - Jan 22 - Clawdbot deep dive, GLM 4.7 Flash, Anthropic constitution + 3 new TSS models
From Weights & Biases - deep dive into Clawdbot, a personal AI assistant that learns and evolves, GLM 4.7 Flash, a bunch of new TTS models and Claude's new constitution! 29

Google DeepMind · official-blog · 6mo ago
Improved Gemini audio models for powerful voice experiences
Improved Gemini audio models for powerful voice interactions. By Bibo Xu (Director of Product Management) and Tara Sainath (Distinguished Research Scientist). General summary: Google enhanced Gemini 2.5 Flash Native Audio for better live voice agents.
Expect… 37

Hugging Face · official-blog · 7mo ago
Open ASR Leaderboard: Trends and Insights with New Multilingual & Long-Form Tracks
Published November 21, 2025. Authors: Eric Bezzam (bezzam), Steven Zheng (Steveeeeeeen), Eustache Le Bihan (eustlb), Vaibhav Srivastav (reach-vb). While everyone (and their… 30

Hugging Face · official-blog · 8mo ago
Voice Cloning with Consent
Published October 28, 2025. Authors: Margaret Mitchell (meg), Lucie-Aimée Kaffee (frimelle). In this blog post, we introduce the idea of a 'voice consent gate' to support voice cloning with consent. We provide an example… 24

Nonint (James Betker) · research · 25mo ago
GPT-4o
I’m very pleased to show the world GPT-4o. I came into the project mid-last year with Alexis Conneau with the goal of scaling up speech models and building an “AudioLM”. We knew we had something special late last year, but I don’t think either of us… 22

Eugene Yan · research · 27mo ago
Building an AI Coach to Help Tame My Monkey Mind
Building an AI coach with speech-to-text, text-to-speech, an LLM, and a virtual number. 20

Chip Huyen · research · 33mo ago
Multimodality and Large Multimodal Models (LMMs)
For a long time, each ML model operated in one data mode – text (translation, language modeling), image (object detection, image classification), or audio (speech recognition). However, natural intelligence is not limited to just a single modality. Humans can read, talk, and… 10