Hugging Face Daily Papers · 5 min read

RT-Lynx: Putting the GEMM Sparsity In a Right Way for Diffusion Models

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2605.26632

Published on May 26, 2026 · Submitted by Xing Cong on May 27, 2026

Authors: Xing Cong, Hanlin Tang, Kan Liu, Lan Tao, Lin Qu, Chenhao Xie

Abstract

AI-generated summary: Diffusion Transformers achieve strong image generation performance but face high inference costs; this work proposes RT-Lynx, which uses activation sparsification and optimized CUDA kernels to accelerate inference while maintaining generation quality.

Diffusion Transformers (DiT) achieve strong performance in image generation but incur substantial inference costs. While prior work has reduced this cost via quantization and distillation, semi-structured sparsity, which can nearly halve FLOPs, remains underexplored. A key reason is that most existing approaches focus on weight sparsification, and pruning 50% of the weights can remove critical model capacity and degrade generation quality. Our study, however, shows that DiT activations are intrinsically sparse and significantly more robust to N:M semi-structured sparsification than weights. Motivated by this observation, we advocate a paradigm shift from weight sparsification to activation sparsification. We propose RT-Lynx, which applies N:M sparsification to activations and incorporates error-compensation techniques to mitigate accuracy loss. We further implement highly optimized CUDA kernels tailored to this setting, achieving an average speedup of up to 1.55x in linear layers. Extensive experiments across multiple diffusion models demonstrate that our method preserves the generation quality of the original models while substantially accelerating inference.

Community

Xing Cong (paper author, paper submitter)

👋 Hi everyone! We’re excited to share our ICML 2026 work RT-Lynx: Putting GEMM Sparsity in the Right Place for Diffusion Models.

Semi-structured sparsity has the potential to nearly halve GEMM FLOPs, but applying it to diffusion models remains challenging: conventional weight sparsification often removes critical generative capacity and causes visible quality degradation.
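To make the "nearly halve GEMM FLOPs" claim concrete, here is a minimal pure-Python sketch of 2:4 semi-structured sparsity: in every group of four consecutive values, only the two with the largest magnitude are kept, and the dot product then touches only the kept positions. The function names (`compress_n_m`, `sparse_dot`) and the toy numbers are illustrative assumptions, not taken from the paper, and the magnitude-based selection rule is a common choice rather than a confirmed detail of RT-Lynx.

```python
def compress_n_m(values, n=2, m=4):
    """Keep the n largest-magnitude entries of every m consecutive values.

    Returns (kept_values, kept_indices); indices are positions in the
    original vector, in ascending order within each group.
    """
    if len(values) % m != 0:
        raise ValueError("length must be a multiple of m")
    kept_values, kept_indices = [], []
    for start in range(0, len(values), m):
        group = range(start, start + m)
        top = sorted(group, key=lambda i: abs(values[i]), reverse=True)[:n]
        for i in sorted(top):
            kept_values.append(values[i])
            kept_indices.append(i)
    return kept_values, kept_indices


def sparse_dot(kept_values, kept_indices, dense):
    """Dot product that touches only the kept positions (n per group of m)."""
    return sum(v * dense[i] for v, i in zip(kept_values, kept_indices))


# Toy data (hypothetical): one activation row and one weight column.
activation_row = [0.9, -0.1, 0.05, -1.2, 0.0, 0.3, -0.02, 0.7]
weight_column = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

vals, idx = compress_n_m(activation_row)
dense_result = sum(a * w for a, w in zip(activation_row, weight_column))
sparse_result = sparse_dot(vals, idx, weight_column)

print(idx)  # [0, 3, 5, 7]
print(len(vals), "of", len(activation_row), "multiplications")
print(dense_result, sparse_result)
```

The sparse path performs half of the multiplications, and the gap between the two printed results is the approximation error that any compensation scheme would need to absorb.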

We revisit this problem and find that, unlike weights, DiT activations are intrinsically sparse and significantly more robust to 2:4 semi-structured sparsity. This suggests that activation sparsity is a better target than weight sparsity for accelerating Diffusion Transformers.

Based on this observation, we propose RT-Lynx, which shifts the sparsification target from weights to activations. It combines online activation sparsification with norm-based compensation and a lightweight LoRA branch to recover fine-grained visual details. To make this practically efficient, we further design optimized CUDA kernels that fuse sparsification, compression, and sparse Tensor Core computation into a unified inference pipeline.
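The post does not spell out how the compensation or the LoRA branch is defined, so the following is only a hypothetical sketch of how the three pieces could fit together in one linear layer. It assumes (1) 2:4 pruning by magnitude, (2) "norm-based compensation" means rescaling the sparse output so the pruned token keeps its original L2 norm, and (3) the low-rank branch reads the unpruned activations. All three are assumptions for illustration, as are the names `sparse_linear_with_compensation`, `lora_down`, and `lora_up`.

```python
import math


def prune_2_4(x):
    """Zero the two smallest-magnitude entries in each group of four."""
    if len(x) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    out = []
    for start in range(0, len(x), 4):
        group = x[start:start + 4]
        keep = set(sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2])
        out.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return out


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def l2_norm(vector):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(v * v for v in vector))


def sparse_linear_with_compensation(x, weight, lora_down, lora_up):
    """Hypothetical layer: rescaled sparse main path plus a low-rank path."""
    x_sparse = prune_2_4(x)
    sparse_norm = l2_norm(x_sparse)
    # ASSUMPTION: rescale so the pruned token keeps the original L2 norm.
    scale = l2_norm(x) / sparse_norm if sparse_norm > 0.0 else 1.0
    main = [scale * y for y in matvec(weight, x_sparse)]
    # ASSUMPTION: the low-rank branch reads the unpruned activations.
    correction = matvec(lora_up, matvec(lora_down, x))
    return [m + c for m, c in zip(main, correction)]


weight = [[1.0, 0.0, 0.0, 1.0],
          [0.0, 1.0, 1.0, 0.0]]      # 2 outputs, 4 inputs
lora_down = [[0.0, 0.0, 0.0, 0.0]]   # rank 1: 1 x 4
lora_up = [[0.0], [0.0]]             # 2 x 1, zero-initialised

token = [3.0, 1.0, 0.0, 4.0]
print(prune_2_4(token))  # [3.0, 0.0, 0.0, 4.0]
print(sparse_linear_with_compensation(token, weight, lora_down, lora_up))
```

With a zero-initialised low-rank branch the layer reduces to the rescaled sparse product. In a real system the low-rank matrices would be trained to recover the residual detail, and the pruning, compression, and sparse matrix multiply would run inside fused GPU kernels rather than Python loops.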

Across Qwen-Image, FLUX.1-dev, and Z-Image, RT-Lynx preserves generation quality while achieving around 1.2× end-to-end speedup and up to 1.55× average linear-layer acceleration.
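The gap between the 1.55× linear-layer figure and the roughly 1.2× end-to-end figure is what Amdahl's law predicts when only part of the runtime is accelerated. The sketch below back-solves the share of baseline runtime that linear layers would need to occupy for the two numbers to agree. It assumes the 1.55× factor applies uniformly to all linear layers and that everything else runs at unchanged speed; neither assumption is stated in the post, and both figures are approximate, so the result is a rough estimate rather than a reported number.

```python
def end_to_end_speedup(linear_fraction, linear_speedup):
    """Amdahl's law: overall speedup when only a fraction of time is accelerated."""
    return 1.0 / ((1.0 - linear_fraction) + linear_fraction / linear_speedup)


def implied_linear_fraction(total_speedup, linear_speedup):
    """Invert Amdahl's law to recover the accelerated share of baseline time."""
    return (1.0 - 1.0 / total_speedup) / (1.0 - 1.0 / linear_speedup)


fraction = implied_linear_fraction(total_speedup=1.2, linear_speedup=1.55)
print(round(fraction, 3))  # 0.47

# Sanity check: plugging the fraction back in reproduces the 1.2x figure.
print(round(end_to_end_speedup(fraction, 1.55), 3))  # 1.2
```

Under these assumptions, linear layers would account for a little under half of the baseline inference time, with the remainder spent in operations that activation sparsity does not accelerate.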

We hope this work highlights activation sparsity as a more suitable and hardware-friendly direction for accelerating modern Diffusion Transformers. Feedback is very welcome!


