Me train LLM on 8GB from Scratch. Me happy
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
I made a post yesterday: https://www.reddit.com/r/LocalLLaMA/comments/1tqjuzg/why_is_there_no_community_project_for_training/
Today I programmed it:
https://github.com/epoyraz/train-a-model-from-scratch
Highlights:
- Train TinyStories from scratch with 8GB VRAM. YAY
- mHC: no good (model too small)
- BitNet: too slow (and no memory gain while training)
- TurboQuant: no need
- MTP: works. YAAAY (but makes training slower)
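For anyone wondering what MTP (multi-token prediction) changes: instead of one next-token target per position, each position gets the next few tokens as targets, so there are extra predictions to compute per step, which is one plausible reason training gets slower. Below is a minimal pure-Python sketch of how those targets can be built. The function name `mtp_targets` and the parameter `n_future` are made up for illustration and are not taken from the repo.

```python
def mtp_targets(tokens, n_future):
    """Build multi-token prediction targets (illustrative sketch only).

    For each position t, the targets are the next `n_future` tokens.
    Positions near the end that lack a full set of targets are skipped.
    """
    examples = []
    last = len(tokens) - n_future  # first position without n_future tokens after it
    for t in range(last):
        targets = tokens[t + 1 : t + 1 + n_future]
        examples.append((t, targets))
    return examples


if __name__ == "__main__":
    # Hypothetical token ids, two future tokens per position.
    for position, targets in mtp_targets([10, 11, 12, 13, 14], n_future=2):
        print(position, targets)
```

With `n_future=1` this reduces to ordinary next-token targets.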
Well... it's not really an LLM, it's a tiny 25M-parameter model: https://huggingface.co/epoyraz/tinystories-25m
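A rough back-of-the-envelope for why 25M parameters fit comfortably in 8GB: the sketch below assumes plain fp32 training with an Adam-style optimizer (4 bytes of weights, 4 bytes of gradients, and 8 bytes of optimizer moments per parameter). Those byte counts are assumptions, not numbers from the repo, and activations are not counted, even though they usually take most of the memory at this size. It also hints at why BitNet saves nothing during training: the usual recipe keeps full-precision latent weights and optimizer state, so this term does not shrink.

```python
def training_state_bytes(n_params, weight_bytes=4, grad_bytes=4, optimizer_bytes=8):
    """Bytes for weights + gradients + optimizer state (activations excluded).

    Defaults assume fp32 weights and gradients plus two fp32 Adam moments.
    """
    return n_params * (weight_bytes + grad_bytes + optimizer_bytes)


if __name__ == "__main__":
    n_params = 25_000_000  # model size from the post
    total = training_state_bytes(n_params)
    print(f"{total / 1e6:.0f} MB of weights, gradients and optimizer state")
```

Under these assumptions the fixed state is about 400 MB, so the rest of the 8GB is left for activations, which grow with batch size and sequence length.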