r/LocalLLaMA

Me train LLM on 8GB from Scratch. Me happy

I made post yesterday: https://www.reddit.com/r/LocalLLaMA/comments/1tqjuzg/why_is_there_no_community_project_for_training/

i program today:
https://github.com/epoyraz/train-a-model-from-scratch

Highlight:
- train tinystories from scratch with 8GB VRAM. YAY

- mHC no good (too small model)
- BitNet too slow (no memory gain while training)
- TurboQuant (no need)
- MTP works. YAAAY (but make training slower)

Well... it's not LLM, it's tiny model, 25M params: https://huggingface.co/epoyraz/tinystories-25m
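Rough math for why BitNet gives no memory gain while training, as far as I understand it: BitNet-style training usually keeps full-precision "latent" weights and only quantizes them in the forward pass, so weights, gradients and optimizer state stay the same size. The sketch below is not from the repo. It assumes fp32 everywhere, an AdamW-style optimizer with two moment buffers, and 2-bit packing for ternary weights at inference. The function names are made up for illustration, and activations are not counted.

```python
# Back-of-envelope memory estimate. All numbers are assumptions for
# illustration: fp32 training state, AdamW-style optimizer (two moment
# buffers per parameter), activations ignored.

BYTES_FP32 = 4


def training_state_bytes(n_params: int) -> int:
    """Bytes for weights + gradients + two optimizer moment buffers."""
    weights = n_params * BYTES_FP32          # full-precision (latent) weights
    grads = n_params * BYTES_FP32            # one gradient per weight
    moments = 2 * n_params * BYTES_FP32      # first and second moment
    return weights + grads + moments


def packed_weight_bytes(n_params: int, bits_per_weight: int) -> int:
    """Bytes for weights only, stored at a fixed bit width (inference)."""
    return n_params * bits_per_weight // 8


n_params = 25_000_000  # the "25M" in the model name, taken as parameter count

# Quantization-aware training still holds the full-precision copy, so the
# estimate is the same whether or not the forward pass uses ternary weights.
train_bytes = training_state_bytes(n_params)

# The saving only shows up once weights are packed for inference
# (assumed here: ternary values stored in 2 bits each).
fp16_bytes = packed_weight_bytes(n_params, 16)
ternary_bytes = packed_weight_bytes(n_params, 2)

print(f"training state: {train_bytes / 1e6:.0f} MB")
print(f"fp16 weights:   {fp16_bytes / 1e6:.2f} MB")
print(f"2-bit weights:  {ternary_bytes / 1e6:.2f} MB")
```

So for a model this small the training state is a few hundred MB either way, and most of the 8GB goes to activations and batch size, which ternary weights do not shrink.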

submitted by /u/tevlon