Hugging Face Daily Papers · June 18, 2026 · 4 min read

HiLo-Token: Input-Adaptive High-Low Frequency Token Compression for Efficient Image Editing

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.




arxiv:2606.13898


Published on Jun 11 · Submitted by Haoran You on Jun 18 · Adobe


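Authors: Haoran You, Yotam Nitzan, Lingzhi Zhang, Yifan Gong, Mang-Tik Chiu, Connelly Barnes, Yan Kang, Yuqian Zhou, Eli Shechtman, Sohrab Amirghodsi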

Abstract

A novel token compression framework called HiLo-Token is introduced to accelerate Diffusion Transformers in image editing tasks by adaptively allocating tokens based on spatial frequency and context importance, achieving significant speedups without quality loss.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Creative image editing tools, such as Photoshop's Remove or Generative Fill buttons, are central to everyday customer use and account for a major share of traffic in Photoshop and Lightroom. However, current generative AI models face significant latency challenges, which become even more pronounced when transitioning from convolution-based U-Nets to Diffusion Transformers (DiTs). In our evaluation on hundreds of representative image editing samples spanning a wide range of mask ratios, the DiT module alone accounts for an average of 73% of the total model latency, even after being distilled from 50 timesteps down to 8 timesteps. To tackle this challenge, we propose HiLo-Token, an input-adaptive token compression framework that allocates more token budget to high-frequency, rich-context regions while assigning fewer tokens to low-frequency areas. Specifically, for the editing region specified by the user mask, we retain all tokens within a dilated mask to preserve strong locality and contextual relevance. Outside the editing region, we introduce a simple yet effective high-frequency token selection strategy based on spatial frequency to capture important local details, while using tokens from a 16x downsampled image to represent low-frequency components and preserve the blurry but global structure. Extensive experiments on production-level evaluation data validate the effectiveness of the proposed method, achieving 3.13x, 2.59x, and 1.67x DiT speedups on A100-80GB for image editing tasks across small, medium, and large mask ratio categories with average ratios of 6.38%, 15.92%, and 35.36%, respectively, without any regression in generation quality.
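The three token groups described in the abstract (all tokens inside a dilated mask, selected high-frequency tokens outside it, and tokens from a downsampled image) can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes a grayscale pixel-space image rather than DiT latents, a hypothetical patch size of 16, per-patch variance as a stand-in for the paper's unspecified spatial-frequency score, a hypothetical `keep_ratio` budget, and average pooling by 16 per side for the low-frequency image. All function and parameter names are invented for illustration.

```python
import numpy as np


def to_token_grid(x, patch):
    """Reshape an (H, W) array into (H/patch, W/patch, patch, patch) tiles."""
    h, w = x.shape
    return x.reshape(h // patch, patch, w // patch, patch).transpose(0, 2, 1, 3)


def dilate(grid, steps):
    """Binary dilation of a boolean token grid with a 3x3 square, `steps` times."""
    gh, gw = grid.shape
    out = grid.copy()
    for _ in range(steps):
        padded = np.pad(out, 1, mode="constant", constant_values=False)
        out = np.zeros_like(out)
        for dy in range(3):
            for dx in range(3):
                out |= padded[dy:dy + gh, dx:dx + gw]
    return out


def select_tokens(image, edit_mask, patch=16, dilate_steps=1,
                  keep_ratio=0.1, down=16):
    """Return (keep_edit, keep_hf, low_res) for an (H, W) float image.

    H and W must be divisible by both `patch` and `down`.
    All defaults are illustrative assumptions, not values from the paper.
    """
    # 1) Editing region: keep every token that touches the dilated user mask.
    in_edit = to_token_grid(edit_mask, patch).any(axis=(2, 3))
    keep_edit = dilate(in_edit, dilate_steps)

    # 2) Outside the mask: rank tokens by patch variance (an assumed proxy
    #    for high-frequency content) and keep the top fraction.
    score = to_token_grid(image, patch).var(axis=(2, 3))
    score[keep_edit] = -np.inf  # already kept, exclude from ranking
    k = int(round(keep_ratio * (~keep_edit).sum()))
    keep_hf = np.zeros_like(keep_edit)
    keep_hf.flat[np.argsort(-score, axis=None)[:k]] = True

    # 3) Low-frequency context: average-pool the whole image by `down` per
    #    side; the pooled image would be tokenized separately.
    h, w = image.shape
    low_res = image.reshape(h // down, down, w // down, down).mean(axis=(1, 3))
    return keep_edit, keep_hf, low_res
```

Under these assumptions, the kept sequence is the union of `keep_edit` and `keep_hf` tokens from the full-resolution grid plus the tokens of `low_res`, so the sequence length shrinks as the mask gets smaller. That is consistent with the larger speedups the abstract reports for small mask ratios.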



Get this paper in your agent:

hf papers read 2606.13898

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash
