LocalLLaMA crowdsourced coding dataset

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

I feel like many people in this community (myself included) are constantly, eagerly awaiting new small model releases, improvements to existing models, and so on. Sometimes I wish there were more community-released models, the way there are sometimes community-released harnesses, frontends, or quants.

Unfortunately, training a new model from scratch is a monstrous task which we simply don't have the expertise or resources for.

However, there is another alternative: ANYBODY, with ANY hardware, can contribute to a dataset. If we (and maybe another community) collaborate on creating a proper dataset, and the people with beefier hardware are willing to volunteer to finetune and/or quantize the models, then we can make our own "Qwen3.7-27B" at home.

Obviously it isn't that simple, and there are a lot of things to think about here. Submission quality, consistency, and similar issues are going to be hurdles to overcome in order to actually create a good, usable dataset. It'll definitely be a big challenge.
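To make the quality and consistency hurdle a bit more concrete, here is a rough sketch of what a first-pass filter on incoming submissions might look like. Everything in it is an assumption for illustration: the field names (`prompt`, `response`, `language`), the function names, and the idea that each submission arrives as a plain dictionary. It only checks that required fields are present and drops exact duplicates after crude normalization. Judging whether a submission is actually good would need a lot more than this.

```python
import hashlib
import json

# Hypothetical schema: these field names are assumptions, not an agreed format.
REQUIRED_FIELDS = ("prompt", "response", "language")


def validate(record):
    """Return a list of problems found in one submission (empty if it passes)."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {field}")
    return problems


def fingerprint(record):
    """Hash the lowercased, whitespace-collapsed prompt and response.

    This is a crude normalization that only catches near-verbatim duplicates.
    """
    normalized = [" ".join(record[f].lower().split()) for f in ("prompt", "response")]
    return hashlib.sha256(json.dumps(normalized).encode("utf-8")).hexdigest()


def filter_submissions(records):
    """Split submissions into accepted records and (record, reasons) rejections."""
    seen = set()
    accepted, rejected = [], []
    for record in records:
        problems = validate(record)
        if problems:
            rejected.append((record, problems))
            continue
        key = fingerprint(record)
        if key in seen:
            rejected.append((record, ["duplicate submission"]))
            continue
        seen.add(key)
        accepted.append(record)
    return accepted, rejected
```

Returning the rejection reasons alongside each rejected record means contributors could be told why something was dropped, which might help with consistency over time.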

However, I think that given recent events, we should probably start thinking about doing something like this. If one day companies stop releasing open-weight models (which is an ever-growing possibility nowadays), we would be in a much better place if we had more ways to keep progressing local LLMs ourselves, instead of being forced into a standstill.

If anyone has any ideas on how to do this, logistically or otherwise, please let me know. I think this is the kind of thing that can really benefit the community.

submitted by /u/True_Tangerine_4706