Hugging Face Daily Papers · 8 min read

Convex Low-resource Accent-Robust Language Detection in Speech Recognition

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2605.23235

Published on May 22, 2026 · Submitted by miria k on May 29, 2026
Authors: Miria Feng, William Tan, Mert Pilanci

Abstract

AI-generated summary: A novel convex optimization framework for language detection in spoken dialogue systems that achieves high accuracy with efficient training and theoretical guarantees against dialectal variations under low-resource conditions.

Globalization and multiculturalism continue to produce increasingly diverse speech varieties. Yet current spoken dialogue systems frequently fail on under-represented dialects and accents, often misidentifying the input language and causing cascading failures in downstream dialogue tasks. Addressing this dialectal variance under low-resource constraints remains an open challenge, as standard fine-tuning is computationally expensive and prone to overfitting on high-dimensional speech data. We propose Convex Language Detection (CLD), a novel framework that integrates theoretically grounded convex optimization techniques into the spoken dialogue systems pipeline. Our method is efficiently implemented via multi-GPU Alternating Direction Method of Multipliers (ADMM) in JAX, thus providing global optimality guarantees and fast training in polynomial time. Theoretically, we prove that our convex objective induces certified margin stability and provide guarantees against feature perturbations. Empirically, we demonstrate sample efficiency and robustness to input dialectal variation, achieving 97–98% accuracy in challenging low-resource regimes. Our open-source package is available at https://pypi.org/project/jaxcld/
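The abstract's mention of ADMM can be made concrete with a small sketch. The block below is not the `jaxcld` implementation and does not use the paper's objective: it is a pure-Python illustration of ADMM on a stand-in convex problem (lasso regression onto ±1 language labels), and every name in it (`admm_lasso`, `solve_linear`, `soft_threshold`, the toy data) is hypothetical. It is meant only to show the alternating structure: a smooth least-squares step, a proximal step, and a dual update.

```python
# Illustrative ADMM on a stand-in convex problem (assumption: NOT the paper's
# objective and NOT the jaxcld API). We solve
#     minimize 0.5 * ||X w - y||^2 + lam * ||z||_1   subject to   w = z

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def soft_threshold(v, t):
    """Proximal operator of t * |.| (shrinks v toward zero by t)."""
    return max(v - t, 0.0) - max(-v - t, 0.0)

def admm_lasso(X, y, lam=0.1, rho=1.0, iters=200):
    """Run a fixed number of ADMM iterations and return the sparse iterate z."""
    n, d = len(X), len(X[0])
    # Precompute X^T X + rho * I and X^T y once; both are reused every iteration.
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) + (rho if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    Xty = [sum(X[k][i] * y[k] for k in range(n)) for i in range(d)]
    z = [0.0] * d
    u = [0.0] * d  # scaled dual variable
    for _ in range(iters):
        # w-update: ridge-like linear solve
        w = solve_linear(A, [Xty[i] + rho * (z[i] - u[i]) for i in range(d)])
        # z-update: soft-thresholding (prox of the L1 term)
        z = [soft_threshold(w[i] + u[i], lam / rho) for i in range(d)]
        # dual update: accumulate the constraint residual w - z
        u = [u[i] + w[i] - z[i] for i in range(d)]
    return z

if __name__ == "__main__":
    # Toy "embeddings" with labels +1 / -1 standing in for two languages.
    X = [[1.0, 0.2], [0.9, -0.1], [-1.0, 0.1], [-0.8, -0.2]]
    y = [1.0, 1.0, -1.0, -1.0]
    print(admm_lasso(X, y, lam=0.05))
```

Because the problem is convex, the iterates approach the global minimizer regardless of initialization, which is the property the abstract appeals to. A multi-GPU JAX version would replace the list arithmetic with array operations, which this sketch does not attempt.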

Community

miria k (paper author, paper submitter) · about 19 hours ago · edited about 18 hours ago

🎵 Meet Convex Language Detection (CLD)!

Automatic speech recognition (ASR) frequently fails on accents and dialects, but collecting more data to retrain a larger model is slow and expensive. CLD addresses this not by grid-searching hyperparameters or collecting massive datasets, but through the geometry of convex optimization.

🌐🎙️ Instead of relying on unpredictable large-scale neural networks that struggle with accent variance, CLD introduces a lightweight, pluggable detection head that yields mathematically certified margin stability.

We benchmarked CLD across 5 languages, 24 unique sub-dialects (including highly challenging regimes like Singaporean English and regional Mandarin), and foundation models like Whisper and MMS-1B. The results: even with under 100 training samples, CLD reaches 97–98% accuracy, reduces cross-lingual decoding failures, and cuts compute costs by 13x.

The key structural difference:
Current multilingual ASR models are heavily skewed toward standard, high-resource speech datasets, leaving millions of speakers worldwide facing cascading errors. By recasting language identification as a convex program solved via parallelized ADMM in JAX, we don't just guess a decision boundary; we compute a verifiable radius of label invariance with guarantees. We see this as a highly scalable, theoretically backed plug-and-play module that aims to bring equity, speed, and reliability to global speech systems.

🛠️ Open-Source Code: https://github.com/pilancilab/CLD
📦 JAX Package: pip install jaxcld (https://pypi.org/project/jaxcld/)
📄 Full Paper: https://arxiv.org/abs/2605.23235
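The "radius of label invariance" in the comment above can be illustrated for the simplest case, a linear multi-class scorer. The sketch below uses the standard margin bound for linear classifiers, not the paper's theorem, and the function name `certified_radius` and the toy weights are hypothetical. For scores s_j = w_j · x + b_j with predicted class c, no perturbation δ with ‖δ‖₂ smaller than min over j ≠ c of (s_c − s_j) / ‖w_c − w_j‖₂ can change the predicted label.

```python
import math

def certified_radius(weights, biases, x):
    """Return (predicted_class, radius) for a linear scorer s_j = w_j . x + b_j.

    Assumption: this is the textbook L2 margin bound for linear classifiers,
    used here as a stand-in for the paper's certified margin stability result.
    """
    scores = [sum(wi * xi for wi, xi in zip(w, x)) + b
              for w, b in zip(weights, biases)]
    pred = max(range(len(scores)), key=scores.__getitem__)
    radius = math.inf
    for j, w in enumerate(weights):
        if j == pred:
            continue
        gap = scores[pred] - scores[j]
        diff_norm = math.sqrt(sum((a - c) ** 2 for a, c in zip(weights[pred], w)))
        if diff_norm == 0.0:
            # Identical weight vectors: the gap cannot change under perturbation,
            # so this class only matters if it is already tied with the winner.
            if gap == 0.0:
                radius = 0.0
            continue
        radius = min(radius, gap / diff_norm)
    return pred, radius

if __name__ == "__main__":
    # Two hypothetical language classes over 2-dimensional features.
    W = [[1.0, 0.0], [0.0, 1.0]]
    b = [0.0, 0.0]
    print(certified_radius(W, b, [3.0, 1.0]))
```

A larger radius means the prediction tolerates more feature drift, for example from an unfamiliar accent, before the detected language flips.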

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

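The following papers were recommended by the Semantic Scholar API:

* NaijaS2ST: A Multi-Accent Benchmark for Speech-to-Speech Translation in Low-Resource Nigerian Languages (2026): https://huggingface.co/papers/2604.16287
* Few-Shot Contrastive Adaptation for Audio Abuse Detection in Low-Resource Indic Languages (2026): https://huggingface.co/papers/2604.09094
* Mind the Pause: Disfluency-Aware Objective Tuning for Multilingual Speech Correction with LLMs (2026): https://huggingface.co/papers/2605.12242
* Linear Semantic Segmentation for Low-Resource Spoken Dialects (2026): https://huggingface.co/papers/2605.06276
* Vividh-ASR: A Complexity-Tiered Benchmark and Optimization Dynamics for Robust Indic Speech Recognition (2026): https://huggingface.co/papers/2605.13087
* Ti-Audio: The First Multi-Dialectal End-to-End Speech LLM for Tibetan (2026): https://huggingface.co/papers/2604.11110
* PAREDA: A Multi-Accent Speech Dataset of Natural Language Processing Research Discussions (2026): https://huggingface.co/papers/2605.17860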

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend


Get this paper in your agent:

hf papers read 2605.23235
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
