EMA on LoRA? [R]
Mirrored from r/MachineLearning for archival readability. Support the source by reading on the original site.
Hi guys
Does anyone know of papers where EMA on LoRA adapters has been used successfully?
I'm interested in cases where the EMA adapter acts as a self-teacher, generating soft labels for the trainable adapter.
On-policy self-distillation [1] uses an EMA teacher, but they seem to fully fine-tune the model. Are there any empirical results showing that the idea works on LoRA/PEFT models?
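To make the setup concrete, here is a rough sketch of what I mean, in plain Python with toy numbers. It is not the method from [1]; the names `ema_update`, `soft_label_loss`, `lora_A`, `lora_B`, and the decay and temperature values are all placeholders I made up for illustration. The assumption is that only the adapter weights are averaged while the base model stays frozen.

```python
import math

def ema_update(teacher, student, decay=0.99):
    """In-place EMA: teacher <- decay * teacher + (1 - decay) * student."""
    for name, s_vals in student.items():
        t_vals = teacher[name]
        for i, s in enumerate(s_vals):
            t_vals[i] = decay * t_vals[i] + (1.0 - decay) * s

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_label_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy adapter state (hypothetical): only the LoRA factors are tracked.
student = {"lora_A": [0.10, -0.20], "lora_B": [0.00, 0.30]}
teacher = {k: list(v) for k, v in student.items()}  # teacher starts as a copy

student["lora_A"][0] = 0.50  # stand-in for one optimizer step on the student
ema_update(teacher, student, decay=0.9)

# The teacher's logits would come from the frozen base model plus the EMA adapter.
loss = soft_label_loss([2.0, 0.5, -1.0], [1.0, 1.0, 0.0])
```

In a real run the teacher logits would be computed without gradients, and `ema_update` would be called after each student optimizer step.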