MiCA is now part of Hugging Face PEFT
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Glad to share that MiCA, short for Minor Component Adaptation, has now been merged into the Hugging Face PEFT library. It is not yet included in the latest PyPI release, but you can already install it directly from PEFT main. Using MiCA takes minimal changes: it is exposed through the existing LoRA interface.

The idea behind MiCA is simple: instead of adapting along the dominant singular directions of a pretrained weight matrix, MiCA uses the minor singular subspace. For a weight matrix

W = U Σ Vᵀ

MiCA initializes

B = U[:, -r:]

so the adapter starts as a no-op, because

B A = 0

The base model output is preserved exactly at initialization. During training, MiCA keeps B frozen and only trains A.

Why is this useful? The intuition is that the major singular directions already encode much of the pretrained model's existing behavior. The minor directions are less used by the original model and may provide a more plastic subspace for injecting new knowledge.

The results we report are averages over two experiments and three models.
See the paper for the full experimental details.

A practical rule of thumb: if you have a LoRA setup that works well, try MiCA with

r_mica ≈ r_lora / 2

Because MiCA trains only one of the two LoRA matrices, you often need fewer parameters and can use a somewhat higher learning rate.

Best practice: MiCA is mainly intended for continued pretraining / domain-adaptive pretraining.
In many cases, merging or transferring the adapter into the corresponding instruct/chat model can work better; see the MiCA paper for details.

We tested MiCA primarily for continued pretraining and supervised fine-tuning. Early RL results look promising. Instruction fine-tuning alone was not the most useful setting in our experiments.

Huge thanks to Sebastian Raschka for the collaboration, and to the Hugging Face team (Lewis Tunstall and Benjamin Bossan) for review and integration.

Preprint: https://arxiv.org/abs/2604.01694
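For readers who want to see the initialization in concrete terms, here is a minimal NumPy sketch of the idea described in the post. It is not the PEFT implementation and does not use the PEFT API. The function names (`mica_init`, `lora_trainable_params`, `mica_trainable_params`) and the toy dimensions are hypothetical, chosen only for illustration. Setting A to zero is an assumption that follows from the post's statement that B A = 0: since B has orthonormal columns, that product can only vanish if A does.

```python
import numpy as np


def mica_init(W: np.ndarray, r: int):
    """Sketch of a MiCA-style init for a weight W of shape (d_out, d_in).

    Hypothetical helper for illustration, not the PEFT implementation.
    """
    # NumPy returns singular values in descending order, so the last r
    # columns of U span the minor (smallest-singular-value) left subspace.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    B = U[:, -r:].copy()              # frozen during training, shape (d_out, r)
    A = np.zeros((r, W.shape[1]))     # trainable, shape (r, d_in); assumed zero
    return B, A


def lora_trainable_params(d_out: int, d_in: int, r: int) -> int:
    # Standard LoRA trains both A (r x d_in) and B (d_out x r).
    return r * (d_in + d_out)


def mica_trainable_params(d_in: int, r: int) -> int:
    # MiCA, as described in the post, trains only A (r x d_in).
    return r * d_in


# Toy example with made-up sizes.
rng = np.random.default_rng(0)
d_out, d_in, r = 64, 48, 4
W = rng.standard_normal((d_out, d_in))
B, A = mica_init(W, r)

# At initialization the adapted layer matches the base layer exactly,
# because the update B @ A is the zero matrix.
x = rng.standard_normal(d_in)
base_out = W @ x
adapted_out = (W + B @ A) @ x
print("no-op at init:", np.allclose(base_out, adapted_out))

# Rule of thumb from the post: r_mica is roughly r_lora / 2.
r_lora = 8
print("LoRA trainable params:", lora_trainable_params(d_out, d_in, r_lora))
print("MiCA trainable params:", mica_trainable_params(d_in, r_lora // 2))
```

The parameter count at the end shows why the rule of thumb leads to fewer trainable parameters: halving the rank and training only one of the two matrices both reduce the count.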