Hugging Face Daily Papers · June 5, 2026 · 4 min read

SpeechEditBench: A Bilingual Multi-Attribute Benchmark for Instruction-Guided Speech Editing

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.


We introduce SpeechEditBench, a bilingual benchmark for instruction-guided speech editing. It covers seven atomic editing tasks (content, speaker, emotion, style, prosody, paralinguistic, and acoustic editing) as well as compositional editing with multiple instructions in one sample. The benchmark uses anchor-based evaluation to separately measure target success, content preservation, and joint success, revealing where current Speech LLMs fail beyond target-only metrics.


arxiv:2606.01804


Published on Jun 3, 2026 · Submitted by DarcyTan on Jun 4, 2026 · HUAWEI Noah's Ark Lab


Authors: Hanlin Zhang, Daxin Tan, Dehua Tao, Xiao Chen, Haochen Tan, Linqi Song

Abstract

A bilingual multi-attribute benchmark for instruction-guided speech editing is introduced to systematically evaluate speech modification capabilities across atomic and compositional tasks.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Instruction-guided speech editing requires a model to modify specified speech attributes while preserving unrelated characteristics. Despite rapid progress in Speech Large Language Models (Speech LLMs), systematic evaluation of this capability remains challenging, as existing benchmarks are fragmented across isolated editing tasks. To bridge this gap, we introduce SpeechEditBench, a bilingual multi-attribute benchmark for instruction-guided speech editing. SpeechEditBench encompasses seven atomic editing tasks, as well as compositional editing tasks that integrate multiple operations within a single instruction. We propose an anchor-based evaluation protocol that separately assesses the edit success of target attributes and the preservation of untargeted attributes, leading to three metrics: target success, preservation success, and joint success. Using this benchmark, we evaluate mainstream Speech LLMs and specialized speech editing systems. The results reveal three key findings: (1) no single model performs well across all editing dimensions; (2) closed-source Speech LLMs generally outperform open-source models; (3) compositional editing remains highly challenging, with even the most advanced models struggling to achieve high joint success. SpeechEditBench provides a rigorous diagnostic framework to identify bottlenecks in Speech LLMs, thereby facilitating the development of next-generation Speech LLMs with more robust and precise instruction-guided editing capabilities. Data and code are available at https://github.com/daxintan-cuhk/SpeechEditBench.
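The relationship between the three metrics named in the abstract can be illustrated with a small sketch. This is not the benchmark's evaluation code: the `EditResult` schema, the `summarize` helper, and the demo values are hypothetical, and the sketch assumes that a sample counts toward joint success only when both its target check and its preservation check pass. How the anchor-based protocol actually produces those per-sample judgements is not specified here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class EditResult:
    """Per-sample judgement (hypothetical schema, not the benchmark's format)."""
    task: str            # e.g. "emotion" or "compositional"
    target_ok: bool      # did the requested attribute change as instructed?
    preserved_ok: bool   # did the untargeted attributes stay intact?


def summarize(results: Iterable[EditResult]) -> dict[str, float]:
    """Return target, preservation, and joint success rates in [0, 1]."""
    results = list(results)
    if not results:
        raise ValueError("no results to summarize")
    n = len(results)
    return {
        "target_success": sum(r.target_ok for r in results) / n,
        "preservation_success": sum(r.preserved_ok for r in results) / n,
        # Assumption: joint success requires both checks to pass on the same sample.
        "joint_success": sum(r.target_ok and r.preserved_ok for r in results) / n,
    }


if __name__ == "__main__":
    # Made-up judgements, for illustration only.
    demo = [
        EditResult("emotion", True, True),
        EditResult("emotion", True, False),
        EditResult("speaker", False, True),
        EditResult("compositional", True, False),
    ]
    print(summarize(demo))
```

Under this assumption, joint success can never exceed the lower of the other two rates, which is one way to see why a target-only metric may overstate how well a model edits.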


Community

Darcy0123

Paper submitter about 2 hours ago

Code and data are available at:
https://github.com/daxintan-cuhk/SpeechEditBench
https://huggingface.co/datasets/DiscreteSpeech/SpeechEditBench


Get this paper in your agent:

hf papers read 2606.01804

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

