Dropping learning rate fixed my Qlora fine-tune more than anything else i tried
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Been fine-tuning llama 3.1 8b with Qlora for a classification task using about 8k samples. I was getting bad eval results for a while and kept thinking something was wrong with my data. Tried cleaning the dataset, tried different prompt templates, messed with rank and alpha. Nothing really changed.
Dropped the learning rate from 2e-4 to 1e-4 and bumped epochs from 3 to 5. Ran it on a 5090 I rent on Hyperai since our lab machines are always booked. Completely different results. Same data, same everything else.
2e-4 is just too aggressive when your dataset is that small. The model overfits in the first epoch and then just goes in circles for the rest of training. Lower lr gave it more room to converge without blowing past everything.
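For reference, the change described above would look something like this in a typical peft + transformers setup. The rank, alpha, batch size, scheduler, and output path are my own assumptions for the sketch, not values from the post:

```python
# Sketch of the hyperparameter change, assuming a standard peft/transformers
# QLoRA setup. Only learning_rate and num_train_epochs come from the post;
# everything else here is a placeholder.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=16,                  # assumed rank, the post doesn't give final values
    lora_alpha=32,         # assumed alpha
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="qlora-classifier",   # hypothetical path
    learning_rate=1e-4,              # dropped from the common 2e-4 default
    num_train_epochs=5,              # bumped from 3
    per_device_train_batch_size=4,   # assumed
    bf16=True,
)
```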
Also ended up cutting about a third of my dataset, mostly mislabeled and ambiguous stuff. Eval got better with less data, which yeah yeah everyone says that but it's different when you see the numbers yourself lol
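The kind of filtering described above can be sketched as a simple pass over the samples. The field names (`label`, `relabel`, `confidence`) and the threshold are hypothetical, just to show the idea of dropping mislabeled and ambiguous examples:

```python
# Hedged sketch: drop samples where two labeling passes disagree
# (mislabeled) or where annotator confidence is low (ambiguous).
# Field names and threshold are assumptions, not from the post.
def filter_samples(samples, min_confidence=0.8):
    """Keep only samples with consistent labels and high confidence."""
    kept = []
    for s in samples:
        if s["label"] != s["relabel"]:        # two passes disagree: mislabeled
            continue
        if s["confidence"] < min_confidence:  # low confidence: ambiguous
            continue
        kept.append(s)
    return kept

data = [
    {"label": "pos", "relabel": "pos", "confidence": 0.95},
    {"label": "pos", "relabel": "neg", "confidence": 0.90},  # mislabeled
    {"label": "neg", "relabel": "neg", "confidence": 0.40},  # ambiguous
]
print(len(filter_samples(data)))  # 1
```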
2e-4 is the default everywhere and I don't think it works well below a certain dataset size.