Hugging Face Daily Papers · 4 min read

AdaPreLoRA: Adafactor Preconditioned Low-Rank Adaptation

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2605.08734

AdaPreLoRA: Adafactor Preconditioned Low-Rank Adaptation

Published on May 9
· Submitted by Xin Wen on May 13
Authors: Ziyun Liu, Fengmiao Bian, Jian-Feng Cai

Abstract

LoRA optimizers are analyzed through a unified framework based on surrogate matrices and preconditioners, with AdaPreLoRA proposing a novel approach using Adafactor diagonal Kronecker preconditioning to improve factor-space updates while maintaining low memory usage.

AI-generated summary

Low-Rank Adaptation (LoRA) reparameterizes a weight update as a product of two low-rank factors, but the Jacobian J_G of the generator mapping the factors to the weight matrix is rank-deficient, so the factor-space preconditioner J_G^* F_t J_G induced by any W-space preconditioner F_t is singular; consequently, the standard chain rule cannot be uniquely inverted to map a preconditioned W-space direction back to a factor-space update.

We cast existing LoRA optimizers in a unified framework parameterized by two choices: (i) which invertible surrogate for J_G^* F_t J_G to use, and (ii) which F_t on W to use. Existing methods occupy four families along these axes: factor-space adaptive updates, block-diagonal surrogates for J_G^* J_G, Frobenius-residual pseudoinverse methods, and Riemannian manifold constraints. Within this design space, a gradient-statistics-aware F_t paired with a closed-form factor-space solve at O((m+n)r) memory remains underexplored.

We propose AdaPreLoRA, which fills this gap by adopting the Adafactor diagonal Kronecker preconditioner H_t on W and selecting, from the resulting factor-space solution family, the element that minimizes an H_t-weighted imbalance between the two factor contributions; by construction, the resulting factor update is the closest LoRA approximation to the preconditioned W-space direction under the H_t-weighted norm.

Across GPT-2 (E2E), Mistral-7B and Qwen2-7B (GLUE, ARC, GSM8K), and diffusion-model personalization, AdaPreLoRA is competitive with or improves over a representative set of LoRA optimizers while keeping peak GPU memory at the LoRA-optimizer level.
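To make the abstract's ingredients concrete, here is a rough NumPy sketch (not the authors' code): it shows the LoRA reparameterization ΔW = B A, an Adafactor-style row/column (diagonal Kronecker) preconditioner that stores only O(m + n) statistics, and one simple member of the factor-space solution family (a plain Frobenius pseudoinverse mapping). AdaPreLoRA's actual contribution, choosing the H_t-weighted imbalance-minimizing element of that family, is not reproduced here; all shapes, initializations, and hyperparameters below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 64, 48, 4                  # weight shape m x n, LoRA rank r

# LoRA factors; both get small random values so that both factor updates
# are non-trivial in this one-step demo (standard LoRA zero-initializes
# one of the two factors).
B = 0.02 * rng.normal(size=(m, r))
A = 0.02 * rng.normal(size=(r, n))
G = rng.normal(size=(m, n))          # gradient w.r.t. the full weight W

# Adafactor-style factored second moments: only a row vector and a column
# vector are stored, so memory is O(m + n) rather than O(m * n).  Real
# Adafactor keeps running exponential averages; a single gradient is used
# here for simplicity.
eps = 1e-8
row_v = (G ** 2).mean(axis=1)        # shape (m,)
col_v = (G ** 2).mean(axis=0)        # shape (n,)

# Rank-1 (diagonal Kronecker) reconstruction of the second-moment matrix,
# equivalent to Adafactor's (row sums)(col sums)^T / (total sum).
V_hat = np.outer(row_v, col_v) / (row_v.mean() + eps)
D = G / (np.sqrt(V_hat) + eps)       # preconditioned W-space direction

# Map D back to factor space.  The generator (B, A) -> B @ A has a
# rank-deficient Jacobian, so this inverse is not unique; the plain
# Frobenius pseudoinverse choice below is one member of the solution
# family the abstract describes, not AdaPreLoRA's H_t-weighted choice.
reg = 1e-8 * np.eye(r)
dB = D @ A.T @ np.linalg.inv(A @ A.T + reg)
dA = np.linalg.inv(B.T @ B + reg) @ B.T @ D

lr = 1e-3
B -= lr * dB
A -= lr * dA
print("update norms:", np.linalg.norm(dB), np.linalg.norm(dA))
```

The non-uniqueness of the last step is the design axis the paper studies: many (dB, dA) pairs realize approximately the same W-space direction, and AdaPreLoRA picks the one that balances the two factor contributions under the H_t-weighted norm instead of the unweighted choice sketched above.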

Community

Paper submitter about 17 hours ago

AdaPreLoRA: Adafactor Preconditioned Low-Rank Adaptation


Get this paper in your agent:

hf papers read 2605.08734
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2605.08734 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2605.08734 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2605.08734 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.

Discussion (0)

No comments yet.
