Hugging Face Daily Papers · 4 min read

AdaPreLoRA: Adafactor Preconditioned Low-Rank Adaptation

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2605.08734

AdaPreLoRA: Adafactor Preconditioned Low-Rank Adaptation

Published on May 9
· Submitted by Xin Wen on May 13
Authors: Ziyun Liu, Fengmiao Bian, Jian-Feng Cai

Abstract

LoRA optimizers are analyzed through a unified framework based on surrogate matrices and preconditioners, with AdaPreLoRA proposing a novel approach using Adafactor diagonal Kronecker preconditioning to improve factor-space updates while maintaining low memory usage.

AI-generated summary

Low-Rank Adaptation (LoRA) reparameterizes a weight update as a product of two low-rank factors, but the Jacobian J_G of the generator mapping the factors to the weight matrix is rank-deficient, so the factor-space preconditioner J_G^* F_t J_G induced by any W-space preconditioner F_t is singular; consequently, the standard chain rule cannot be uniquely inverted to map a preconditioned W-space direction back to a factor-space update.

We cast existing LoRA optimizers in a unified framework parameterized by two choices: (i) which invertible surrogate for J_G^* F_t J_G to use, and (ii) which F_t on W to use. Existing methods occupy four families along these axes: factor-space adaptive updates, block-diagonal surrogates for J_G^* J_G, Frobenius-residual pseudoinverse methods, and Riemannian manifold constraints. Within this design space, a gradient-statistics-aware F_t paired with a closed-form factor-space solve at O((m+n)r) memory remains underexplored.

We propose AdaPreLoRA, which fills this gap by adopting the Adafactor diagonal Kronecker preconditioner H_t on W and selecting, from the resulting factor-space solution family, the element that minimizes an H_t-weighted imbalance between the two factor contributions; by construction, the resulting factor update is the closest LoRA approximation to the preconditioned W-space direction under the H_t-weighted norm.

Across GPT-2 (E2E), Mistral-7B and Qwen2-7B (GLUE, ARC, GSM8K), and diffusion-model personalization, AdaPreLoRA is competitive with or improves over a representative set of LoRA optimizers while keeping peak GPU memory at the LoRA-optimizer level.
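To make the abstract's ingredients concrete, here is a rough NumPy sketch (not the authors' code): it shows the LoRA reparameterization ΔW = B A, an Adafactor-style row/column (diagonal Kronecker) preconditioner that stores only O(m + n) statistics, and one simple member of the factor-space solution family (a plain Frobenius pseudoinverse mapping). AdaPreLoRA's actual contribution, choosing the H_t-weighted imbalance-minimizing element of that family, is not reproduced here; all shapes, initializations, and hyperparameters below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 64, 48, 4                  # weight shape m x n, LoRA rank r

# LoRA factors; both get small random values so that both factor updates
# are non-trivial in this one-step demo (standard LoRA zero-initializes
# one of the two factors).
B = 0.02 * rng.normal(size=(m, r))
A = 0.02 * rng.normal(size=(r, n))
G = rng.normal(size=(m, n))          # gradient w.r.t. the full weight W

# Adafactor-style factored second moments: only a row vector and a column
# vector are stored, so memory is O(m + n) rather than O(m * n).  Real
# Adafactor keeps running exponential averages; a single gradient is used
# here for simplicity.
eps = 1e-8
row_v = (G ** 2).mean(axis=1)        # shape (m,)
col_v = (G ** 2).mean(axis=0)        # shape (n,)

# Rank-1 (diagonal Kronecker) reconstruction of the second-moment matrix,
# equivalent to Adafactor's (row sums)(col sums)^T / (total sum).
V_hat = np.outer(row_v, col_v) / (row_v.mean() + eps)
D = G / (np.sqrt(V_hat) + eps)       # preconditioned W-space direction

# Map D back to factor space.  The generator (B, A) -> B @ A has a
# rank-deficient Jacobian, so this inverse is not unique; the plain
# Frobenius pseudoinverse choice below is one member of the solution
# family the abstract describes, not AdaPreLoRA's H_t-weighted choice.
reg = 1e-8 * np.eye(r)
dB = D @ A.T @ np.linalg.inv(A @ A.T + reg)
dA = np.linalg.inv(B.T @ B + reg) @ B.T @ D

lr = 1e-3
B -= lr * dB
A -= lr * dA
print("update norms:", np.linalg.norm(dB), np.linalg.norm(dA))
```

The non-uniqueness of the last step is the design axis the paper studies: many (dB, dA) pairs realize approximately the same W-space direction, and AdaPreLoRA picks the one that balances the two factor contributions under the H_t-weighted norm instead of the unweighted choice sketched above.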

Community

Paper submitter about 17 hours ago

AdaPreLoRA: Adafactor Preconditioned Low-Rank Adaptation


Get this paper in your agent:

hf papers read 2605.08734
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2605.08734 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2605.08734 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2605.08734 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.

Discussion (0)

No comments yet.
