Want to build a custom model
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
I've been toying with the idea of building my own model. At this point, the architecture and training pipeline seem fairly well established, and I'm feeling reasonably confident that I could put together a small model from scratch.
Hardware is obviously the limiting factor. I've only got 32 GB of VRAM, so this clearly isn't going to be some flagship foundation model. It may not even end up particularly useful for general tasks, but it sounds like a fun project and a good learning experience.
My current thought is to avoid full chat responses entirely and instead build a small autocomplete model, probably somewhere around 25M parameters. The goal would simply be: given context, predict the next token, sentence, or paragraph.
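One way to sanity-check the 25M figure is to count the parameters of a small GPT-style decoder. This is a sketch under assumptions, not a fixed recipe. The config values (16,000-token vocabulary, 512-token context, d_model 384, 10 layers) are hypothetical. The sketch also assumes tied input/output embeddings and learned positional embeddings. It estimates the memory needed for weights, gradients, and AdamW state.

```python
from dataclasses import dataclass


@dataclass
class GPTConfig:
    # Hypothetical values, picked only to land near 25M parameters.
    vocab_size: int = 16_000
    context_len: int = 512
    d_model: int = 384
    n_layers: int = 10


def count_params(cfg: GPTConfig) -> int:
    """Parameter count for a GPT-2-style decoder with tied input/output embeddings."""
    d = cfg.d_model
    attn = (3 * d * d + 3 * d) + (d * d + d)     # fused Q/K/V projection + output projection, with biases
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)  # 4x-wide feed-forward (up + down), with biases
    norms = 2 * (2 * d)                          # two LayerNorms per block (scale + shift)
    per_block = attn + mlp + norms
    embeddings = cfg.vocab_size * d + cfg.context_len * d  # token + learned positional embeddings
    final_norm = 2 * d
    # The output head reuses the token embedding matrix, so it adds no parameters.
    return cfg.n_layers * per_block + embeddings + final_norm


def training_state_bytes(n_params: int) -> int:
    # fp32 weights (4 B) + fp32 gradients (4 B) + two AdamW moment buffers (8 B) = 16 B/param.
    # Activations are NOT included; they scale with batch size and context length.
    return 16 * n_params


cfg = GPTConfig()
n = count_params(cfg)
print(f"{n:,} parameters")  # 24,086,016 parameters
print(f"{training_state_bytes(n) / 2**30:.2f} GiB for weights, grads, and optimizer state")  # 0.36 GiB ...
```

Under these assumptions the weights, gradients, and optimizer state fit in well under 1 GiB. On a 32 GB card, most of the memory budget therefore goes to activations, which is to say batch size and context length.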
The biggest challenge seems to be data. My understanding is that a common rule of thumb (from the Chinchilla scaling results) is roughly 20 training tokens per parameter. By that measure, even a 25M-parameter model would ideally see around 500M tokens, and probably at least 100M tokens just for experimentation.
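The token-budget arithmetic can be written out as a few helpers. This is a sketch: the 20 tokens/parameter ratio is the approximate Chinchilla compute-optimal point, not a ceiling, and many small models are trained on far more tokens than that. The 6 × N × D figure is the common rough approximation for training FLOPs. The function names are hypothetical.

```python
# Rough compute-optimal ratio from the Chinchilla scaling paper.
TOKENS_PER_PARAM = 20


def token_budget(n_params: int, tokens_per_param: int = TOKENS_PER_PARAM) -> int:
    """Approximate number of training tokens for a model with n_params parameters."""
    return n_params * tokens_per_param


def training_flops(n_params: int, n_tokens: int) -> int:
    """Common ~6 * N * D approximation for total training FLOPs (forward + backward)."""
    return 6 * n_params * n_tokens


def epochs_over(corpus_tokens: int, n_tokens: int) -> float:
    """How many passes over a corpus of corpus_tokens are needed to see n_tokens."""
    return n_tokens / corpus_tokens


n_params = 25_000_000
budget = token_budget(n_params)
print(f"{budget:,} tokens")                             # 500,000,000 tokens
print(f"{training_flops(n_params, budget):.1e} FLOPs")  # 7.5e+16 FLOPs
print(f"{epochs_over(100_000_000, budget):.1f} epochs over a 100M-token corpus")  # 5.0 epochs ...
```

A 100M-token corpus would need about five passes to reach the compute-optimal budget. For a first experiment, a few repeated epochs over a smaller, cleaner dataset is often acceptable.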
For a first run, I was considering something more specialized or entertaining. One idea was a comedy model trained on cleaned YouTube transcripts, so it could learn setup-to-punchline continuation patterns. A more boring alternative would be a technical model focused on Python, Linux, or cybersecurity.
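For the comedy idea, "cleaned transcripts" could mean something as simple as turning caption files into running text. This is a minimal sketch that assumes the transcripts are downloaded as WebVTT (.vtt) files. It drops headers, cue numbers, and timing lines, and strips inline styling and word-timestamp tags. It also removes consecutive duplicate lines, which auto-generated captions tend to produce. The function and regex names are hypothetical.

```python
import re

INLINE_TAG = re.compile(r"<[^>]+>")     # <c>...</c> styling and <00:00:01.500> word timestamps
ANNOTATION = re.compile(r"\[[^\]]*\]")  # [Music], [Laughter], [Applause], ...
HEADER_PREFIXES = ("WEBVTT", "Kind:", "Language:", "NOTE")


def clean_vtt(raw: str, keep_annotations: bool = True) -> str:
    """Turn a WebVTT caption file into a single line of running text."""
    lines: list[str] = []
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines, numeric cue identifiers, cue timing lines, and file headers.
        if not line or line.isdigit() or "-->" in line or line.startswith(HEADER_PREFIXES):
            continue
        line = INLINE_TAG.sub("", line)
        if not keep_annotations:
            line = ANNOTATION.sub("", line)
        line = " ".join(line.split())  # collapse whitespace left behind by removed tags
        # Drop consecutive duplicates, which rolling auto-captions tend to produce.
        if line and (not lines or lines[-1] != line):
            lines.append(line)
    return " ".join(lines)
```

By default the sketch keeps bracketed annotations. Markers like [Laughter] might be a useful signal of where a punchline landed, though that is a guess worth testing rather than a given.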
For those of you who've trained small models before: where are you finding high-quality datasets beyond the obvious choices like Wikipedia, Common Crawl derivatives, or synthetic data generated by frontier models? I'm also curious how people format data for autocomplete-style training versus chat or Q&A datasets.
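One common way to format data for autocomplete-style (plain next-token) training is to concatenate tokenized documents with an end-of-text separator. The stream is then cut into fixed-length blocks, with each target sequence shifted one token left of its input. Chat or Q&A data is often handled differently. There, the prompt stays in the input as context but is masked out of the loss, so the model is trained only on the response. The sketch below assumes a tokenizer that already produces integer IDs. `eos_id`, `block_size`, and the function names are hypothetical, and `-100` is used because it is the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
from collections.abc import Iterable, Iterator

IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss


def pack_for_autocomplete(
    docs: Iterable[list[int]], eos_id: int, block_size: int
) -> Iterator[tuple[list[int], list[int]]]:
    """Concatenate documents (each followed by EOS) and yield (input, target) blocks,
    where target is input shifted left by one token. A leftover tail is dropped."""
    buffer: list[int] = []
    for doc in docs:
        buffer.extend(doc)
        buffer.append(eos_id)
        while len(buffer) >= block_size + 1:
            chunk = buffer[: block_size + 1]
            yield chunk[:-1], chunk[1:]
            # Advance by block_size so every token after the first is a target exactly once.
            buffer = buffer[block_size:]


def mask_prompt(
    prompt_ids: list[int], response_ids: list[int], eos_id: int
) -> tuple[list[int], list[int]]:
    """Chat/Q&A style: same one-token shift, but prompt positions are excluded from the loss."""
    ids = prompt_ids + response_ids + [eos_id]
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids + [eos_id]
    return ids[:-1], labels[1:]
```

Packing wastes no compute on padding, which matters on a single GPU. The trade-off is that one block can span a document boundary, and the EOS separator is what tells the model a new document has started.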