Hugging Face Daily Papers · June 23, 2026 · 5 min read

Exploring the Design Space of Reward Backpropagation for Flow Matching

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.


arxiv:2606.11075

Published on Jun 9, 2026 · Submitted by Ruoyu (cheese1) on Jun 23, 2026 · Tencent

Authors: Ruoyu Wang, Boye Niu, Xiangxin Zhou, Yushi Huang, Tongliang Liu, Chi Zhang

Abstract

AI-generated summary (Qwen/Qwen2.5-Coder-32B-Instruct): FlowBP addresses limitations in flow matching model alignment by using a surrogate trajectory framework that reduces memory usage and gradient chaining while maintaining performance across multiple text-to-image models.

Aligning text-to-image flow matching models with human preferences via direct reward backpropagation is sample-efficient but hampered by two well-known pathologies: activations cannot be stored across the full sampling trajectory at modern model scale, and chained Jacobian products across steps inflate the reward gradient as it travels back to early indices. Connector-based methods, such as LeapAlign, address these issues by replacing the full backward trajectory with a short pinned path, highlighting a useful decoupling between sampling and optimization. However, the quality of the resulting gradient depends on how accurately this short path approximates the full rollout, especially over long intervals. We propose FlowBP, a unified surrogate-trajectory framework that treats the backward trajectory itself as the design object. FlowBP keeps a no-gradient cached rollout for sampling, then builds a lightweight backward surrogate from cached and selectively re-forwarded velocities. This view separates four choices: the reward-model input, active set, integration weights, and bridge coupling, and recovers prior direct-gradient methods as particular settings. Within this framework, we instantiate three variants: FlowBP-Sparse uses sparse Euler reconstruction, FlowBP-Bridge adds controlled bridge coupling, and FlowBP-Lagrange raises the order of leap quadrature. All three bound memory by the active-set size and limit gradient chaining to at most one Jacobian factor. Across SD3.5-M, FLUX.1-dev, and FLUX.2-Klein-base on preference, quality, and compositional metrics, the three variants improve over direct-gradient baselines on most metrics.
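The two pathologies named in the abstract can be read off the chain rule for a discretized sampler. The derivation below is a sketch under assumptions not stated on this page: an explicit Euler sampler, and the symbols h_k (step size), A_k (per-step state Jacobian), and R (reward) are notation introduced here for illustration. It shows standard backpropagation through the rollout, not the FlowBP surrogate itself.

```latex
% Euler discretization of the sampling ODE (h_k is assumed notation)
x_{k+1} = x_k + h_k\, v_\theta(x_k, t_k), \qquad k = 0, \dots, N-1

% Per-step state Jacobian
A_k = I + h_k\, \frac{\partial v_\theta}{\partial x}(x_k, t_k)

% Sensitivity of the final sample to an intermediate state
\frac{\partial x_N}{\partial x_{k+1}} = A_{N-1} A_{N-2} \cdots A_{k+1}

% Full reward gradient: one term per step, each carrying N-1-k chained Jacobians
\nabla_\theta R(x_N) = \sum_{k=0}^{N-1} h_k
  \left( \frac{\partial v_\theta}{\partial \theta}(x_k, t_k) \right)^{\top}
  \left( A_{N-1} \cdots A_{k+1} \right)^{\top} \nabla_x R(x_N)
```

Evaluating every term requires the activations of all N steps, and the term for an early index k is multiplied by N-1-k Jacobian factors, which is where the gradient can inflate. A surrogate that limits chaining to at most one Jacobian factor, as the abstract describes, truncates this product; the specific construction is given in the paper rather than on this page.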

GitHub: https://github.com/RuoyuWang-2077/FlowBP (8 stars)

Community

cheese1 (paper author, paper submitter):

We introduce FlowBP, a unified framework for direct reward backpropagation in flow-matching models. Instead of treating the backward pass as a fixed approximation, FlowBP makes the surrogate backward trajectory itself a design object, exposing four key choices: reward-model input, active denoising steps, integration weights, and bridge coupling.

This perspective unifies prior methods such as ReFL, DRaFT-LV, DRTune, and LeapAlign, while motivating three new variants: FlowBP-Sparse, FlowBP-Bridge, and FlowBP-Lagrange. These methods keep memory proportional to the number of active steps and limit gradient chaining to at most one Jacobian factor, avoiding full-trajectory activation storage and unstable long Jacobian products.

Experiments on SD3.5-M, FLUX.1-dev, and FLUX.2-Klein-base show consistent improvements across preference, image-quality, and compositional-generation metrics, highlighting surrogate-trajectory design as an important direction for efficient flow-model alignment.
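The memory and gradient-chaining claims in the comment above can be illustrated with a toy example. The sketch below is a hypothetical 1-D model, not the FlowBP implementation: the velocity field v(x; theta) = theta * x, the function names, and the choice to treat cached states as constants are all assumptions made for illustration. It contrasts backpropagation through every Euler step with a sparse surrogate that re-forwards only a chosen set of active steps.

```python
# Toy 1-D sketch (hypothetical; NOT the FlowBP implementation).
# Velocity field v(x; theta) = theta * x, integrated with N Euler steps.


def cached_rollout(x0, theta, num_steps):
    """Sample without tracking gradients; cache every state x_0..x_N."""
    h = 1.0 / num_steps
    states = [x0]
    for _ in range(num_steps):
        states.append(states[-1] + h * theta * states[-1])
    return states


def full_gradient(states, theta):
    """d x_N / d theta by backpropagating through every Euler step.

    The term for step k is multiplied by the chained state Jacobians
    (1 + h * theta) of all later steps, so it needs all N cached states.
    """
    num_steps = len(states) - 1
    h = 1.0 / num_steps
    grad = 0.0
    for k in range(num_steps):
        chain = (1.0 + h * theta) ** (num_steps - 1 - k)  # N-1-k Jacobian factors
        grad += h * states[k] * chain  # dv/dtheta at x_k equals x_k
    return grad


def sparse_surrogate_gradient(states, active_steps):
    """d x_hat_N / d theta when only `active_steps` are re-forwarded.

    Cached states are treated as constants, so each active step adds
    h * dv/dtheta with no chained dv/dx factors. The work scales with
    len(active_steps) rather than with the trajectory length.
    """
    num_steps = len(states) - 1
    h = 1.0 / num_steps
    return sum(h * states[k] for k in active_steps)


if __name__ == "__main__":
    states = cached_rollout(x0=1.0, theta=0.5, num_steps=4)
    print("full:", full_gradient(states, theta=0.5))
    print("sparse:", sparse_surrogate_gradient(states, active_steps=[1, 3]))
```

In this toy the surrogate gradient is a biased approximation of the full one, since it drops the chained factors and sums over fewer steps. How integration weights and bridge coupling are chosen to control that gap is the design question the framework addresses, and this sketch does not attempt to reproduce those choices.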

Get this paper in your agent:

hf papers read 2606.11075

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash
