r/LocalLLaMA · 2 min read

MiCA is now part of Hugging Face PEFT



Glad to share that MiCA, short for Minor Component Adaptation, has now been merged into the Hugging Face PEFT library.

It is not yet included in the latest PyPI release, but you can already install it directly from PEFT main:

pip install --upgrade git+https://github.com/huggingface/peft.git@main 

After that, using MiCA takes only a few lines:

from peft import LoraConfig, get_peft_model

# base_model is assumed to be an already loaded causal LM
config = LoraConfig(
    init_lora_weights="mica",
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, config)
model.print_trainable_parameters()

That’s it. MiCA is exposed through the existing LoRA interface via:

init_lora_weights="mica" 

The idea behind MiCA is simple: instead of adapting along the dominant singular directions of a pretrained weight matrix, MiCA uses the minor singular subspace.

For a weight matrix:

W = U Σ Vᵀ

MiCA initializes:

B = U[:, -r:]
A = 0

So the adapter starts as a no-op, because B A = 0.

The base model output is preserved exactly at initialization. During training, MiCA keeps B frozen and only trains A.
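The initialization above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the PEFT implementation: the helper name mica_init is hypothetical, W is a small random square matrix standing in for a pretrained weight, and the adapter is applied as W + B A without any scaling factor.

```python
import numpy as np

def mica_init(W, r):
    """Hypothetical helper: build (B, A) for a square weight matrix W."""
    # SVD: W = U @ diag(S) @ Vt, with singular values in S sorted descending.
    U, S, Vt = np.linalg.svd(W)
    B = U[:, -r:].copy()             # last r left singular vectors (minor directions)
    A = np.zeros((r, W.shape[1]))    # trainable factor, zero at initialization
    return B, A

rng = np.random.default_rng(0)
W = rng.standard_normal((6, 6))      # stand-in for a pretrained weight matrix
B, A = mica_init(W, r=2)

x = rng.standard_normal(6)
base_out = W @ x
adapted_out = (W + B @ A) @ x        # B A = 0, so this equals base_out
```

Because A is all zeros, the adapted output matches the base output exactly. During training, only A would receive gradient updates while B stays fixed.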

Why is this useful?

The intuition is that the major singular directions already encode much of the pretrained model’s existing behavior. The minor directions are used less by the original model and may provide a more plastic subspace for injecting new knowledge.

In our experiments, averaged over two experiments and three models, MiCA showed:

  • about 90% higher knowledge uptake
  • about 20% less catastrophic forgetting
  • about 80% fewer trainable parameters compared with LoRA in the tested setup

See the paper for the full experimental details.

A practical rule of thumb:

If you have a LoRA setup that works well, try MiCA with:

r_mica ≈ r_lora / 2
learning_rate_mica ≈ 2 × learning_rate_lora

Because MiCA trains only one of the two LoRA matrices, you often need fewer parameters and can use a somewhat higher learning rate.
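To make the rule of thumb concrete, here is a small arithmetic sketch. The hidden size and the LoRA settings are hypothetical values chosen for illustration, and the count covers a single square target matrix. It is not meant to reproduce the numbers reported in the paper, which depend on the tested setup.

```python
def lora_trainable(d_out, d_in, r):
    # LoRA trains both factors: B (d_out x r) and A (r x d_in).
    return r * (d_out + d_in)

def mica_trainable(d_in, r):
    # MiCA trains only A (r x d_in); B stays frozen.
    return r * d_in

d = 4096                       # hypothetical hidden size (square projection)
r_lora, lr_lora = 16, 1e-4     # hypothetical LoRA settings that already work

r_mica = r_lora // 2           # r_mica ≈ r_lora / 2
lr_mica = 2 * lr_lora          # learning_rate_mica ≈ 2 × learning_rate_lora

n_lora = lora_trainable(d, d, r_lora)
n_mica = mica_trainable(d, r_mica)
reduction = 1 - n_mica / n_lora

print(f"LoRA: {n_lora} trainable, MiCA: {n_mica} trainable, {reduction:.0%} fewer")
```

For a square matrix, halving the rank and training only one factor leaves a quarter of the LoRA parameter count. Non-square projections will give a different ratio.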

Best practice:

MiCA is mainly intended for continued pretraining / domain-adaptive pretraining.

A recommended workflow is:

  1. Start from the base model, not the instruct/chat model.
  2. Train the MiCA adapter on domain text.
  3. Merge the adapter into the model.
  4. Use the merged model as the adapted base for later instruction/chat tuning.
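Step 3 of the workflow above, merging, amounts to folding the low-rank update into the weight matrix. The NumPy sketch below shows this on a toy matrix. It assumes the standard LoRA scaling of lora_alpha / r, and uses a random A as a stand-in for a trained adapter; none of the values come from the post.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, lora_alpha = 6, 2, 4

W = rng.standard_normal((d, d))      # stand-in for a base-model weight
U, _, _ = np.linalg.svd(W)
B = U[:, -r:]                        # frozen minor-direction factor
A = rng.standard_normal((r, d))      # random stand-in for a trained A
scaling = lora_alpha / r             # assumption: standard LoRA scaling

def adapter_forward(x):
    # Unmerged path: base output plus the scaled low-rank update.
    return W @ x + scaling * (B @ (A @ x))

# Merged path: fold the update into a single dense matrix.
W_merged = W + scaling * (B @ A)

x = rng.standard_normal(d)
same = np.allclose(adapter_forward(x), W_merged @ x)
```

In PEFT, the usual way to merge a LoRA-style adapter is merge_and_unload() on the wrapped model. The post does not say whether MiCA needs anything beyond that.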

In many cases, merging or transferring the adapter into the corresponding instruct/chat model can work better; see the MiCA paper for details.

We tested MiCA primarily for continued pretraining and supervised fine-tuning. Early RL results look promising. Instruction fine-tuning alone was not the most useful setting in our experiments.

Huge thanks to Sebastian Raschka for the collaboration, and to the Hugging Face team (Lewis Tunstall and Benjamin Bossan) for review and integration.

Preprint: https://arxiv.org/abs/2604.01694


submitted by /u/Majestic-Explorer315