I'm brand new to running LLMs and the sheer number of tools is overwhelming
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Hey everyone. I'm brand new to running LLMs in general, even newer to running them locally, and the sheer number of tools available is absolutely overwhelming.
Regarding applications, I look at GitHub and see so many different options that I don't know which to pick. I can't really decipher the differences between the tools either, mostly because their descriptions and taglines are packed with AI buzzwords. What's the go-to GUI for Windows? The built-in Ollama GUI seems pretty barebones.
Regarding model differences, such as Qwen vs. Gemma, is there a resource that shows comprehensive benchmarks?
I currently have Ollama installed on Windows and downloaded gemma4 and qwen3.6 with:

```
ollama pull gemma4
ollama pull qwen3.6
```

I don't understand the small differences between models, for example qwen3.6:27b vs qwen3.6:35b. I see the sizes are 17GB vs 24GB, but does one run faster than the other? If the entire model fits within VRAM, should I always use the larger one? How will I know if a model is too big or will run super slow? Is it purely based on the size listed on https://ollama.com/library/?
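A rough way to reason about the size and speed questions, sketched in Python under loudly stated assumptions: quantized weight size is roughly parameters × bits per weight ÷ 8; the model should fit in VRAM with some headroom left for the context (KV cache) and runtime; and token generation is usually limited by memory bandwidth, so speed depends on how many weights are read per generated token. The 4 GB headroom and the ~4.8 bits per weight for Q4_K_M are assumptions, not measurements; 32 GB VRAM and 1,792 GB/s bandwidth are the RTX 5090's published specs. All function names are hypothetical helpers.

```python
# Back-of-the-envelope estimates only: every constant below is either an
# assumption or a published spec, never a measurement.
RTX_5090_VRAM_GB = 32.0           # published spec
RTX_5090_BANDWIDTH_GB_S = 1792.0  # published spec, GB/s
Q4_K_M_BITS_PER_WEIGHT = 4.8      # approximate average for Q4_K_M (assumption)
DEFAULT_OVERHEAD_GB = 4.0         # hypothetical allowance for KV cache/runtime


def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB."""
    return params_billion * bits_per_weight / 8


def fits_in_vram(model_gb: float,
                 vram_gb: float = RTX_5090_VRAM_GB,
                 overhead_gb: float = DEFAULT_OVERHEAD_GB) -> bool:
    """True if the weights plus the headroom allowance fit entirely in VRAM."""
    return model_gb + overhead_gb <= vram_gb


def max_decode_tokens_per_s(active_params_billion: float,
                            bits_per_weight: float = Q4_K_M_BITS_PER_WEIGHT,
                            bandwidth_gb_s: float = RTX_5090_BANDWIDTH_GB_S) -> float:
    """Bandwidth-bound ceiling: assumes each active weight is read once per token."""
    return bandwidth_gb_s / weights_gb(active_params_billion, bits_per_weight)


if __name__ == "__main__":
    for size_gb in (17.0, 24.0):  # download sizes quoted for the two qwen3.6 tags
        print(f"{size_gb} GB fits in 32 GB with headroom: {fits_in_vram(size_gb)}")
    # Hypothetical comparison: a dense 27B model vs an MoE with ~3B active params.
    print(f"dense 27B ceiling: {max_decode_tokens_per_s(27):.0f} tok/s")
    print(f"MoE, 3B active ceiling: {max_decode_tokens_per_s(3):.0f} tok/s")
```

Under these assumptions both the 17GB and 24GB downloads fit on a 5090, and a mixture-of-experts model that only activates about 3B parameters per token (the "A3B" in names like qwen3.6-35B-A3B) has a theoretical ceiling near 1,000 tok/s versus roughly 110 tok/s for a dense 27B, because far fewer weights are read per token. Real throughput will sit well below these ceilings. After loading a model, `ollama ps` shows in its PROCESSOR column whether it runs 100% on the GPU or is split between CPU and GPU; a CPU/GPU split is the usual reason generation becomes very slow.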
I also found this post: https://old.reddit.com/r/LocalLLaMA/comments/1snxzqi/its_just_me_or_qwen36_feels_kinda_dumb_or_its/
How do I decipher the differences between the three models tested? I see lots of letters and numbers that don't mean much to me:
- gemma4-26B-A4B-it-UD-Q4_K_M
- gemma4-31B-it-Q4_K_M
- qwen3.6-35B-A3B-UD-IQ4_XS
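Names like these follow common GGUF/Hugging Face conventions and can be pulled apart mechanically. As far as I know: `26B`/`31B`/`35B` is the total parameter count; `A4B`/`A3B` marks a mixture-of-experts model with roughly 4B/3B parameters active per token; `it` means instruction-tuned; `UD` is the tag Unsloth uses for its "Dynamic" quantizations; and `Q4_K_M` and `IQ4_XS` are llama.cpp quantization types (a ~4-bit K-quant, medium variant, and a ~4-bit I-quant, extra-small variant). The parser below is a hypothetical sketch written only for names shaped like these three; it is not an official naming spec.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelName:
    family: str                       # e.g. "gemma4", "qwen3.6"
    total_params_b: float             # total parameters, in billions
    active_params_b: Optional[float]  # active params per token (MoE only)
    instruct: bool                    # "-it": instruction-tuned
    unsloth_dynamic: bool             # "-UD": Unsloth Dynamic quant
    quant: Optional[str]              # llama.cpp quant type, e.g. "Q4_K_M"

    @property
    def is_moe(self) -> bool:
        return self.active_params_b is not None


# Hypothetical pattern covering only names shaped like the three above.
_PATTERN = re.compile(
    r"^(?P<family>[a-z]+[\d.]*)"          # model family and version
    r"-(?P<total>\d+(?:\.\d+)?)B"         # total parameter count
    r"(?:-A(?P<active>\d+(?:\.\d+)?)B)?"  # optional MoE active-parameter count
    r"(?P<it>-it)?"                       # optional instruction-tuned marker
    r"(?P<ud>-UD)?"                       # optional Unsloth Dynamic marker
    r"(?:-(?P<quant>I?Q\d\w*))?$",        # optional quantization type
    re.IGNORECASE,
)


def parse_model_name(name: str) -> ModelName:
    """Split a GGUF-style model name into its labelled parts."""
    m = _PATTERN.match(name)
    if m is None:
        raise ValueError(f"unrecognized model name: {name}")
    active = m.group("active")
    return ModelName(
        family=m.group("family"),
        total_params_b=float(m.group("total")),
        active_params_b=float(active) if active else None,
        instruct=m.group("it") is not None,
        unsloth_dynamic=m.group("ud") is not None,
        quant=m.group("quant"),
    )


if __name__ == "__main__":
    for n in ("gemma4-26B-A4B-it-UD-Q4_K_M",
              "gemma4-31B-it-Q4_K_M",
              "qwen3.6-35B-A3B-UD-IQ4_XS"):
        print(n, "->", parse_model_name(n))
```

Read this way, the two Gemma entries differ mainly in being MoE (26B total, ~4B active) versus dense (31B), while the Qwen entry is an MoE with ~3B active parameters using a slightly smaller 4-bit quant.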
My specs:
| Component | Item |
|---|---|
| CPU | 9950X3D |
| RAM | 64GB DDR5 @ 6000MT/s |
| GPU | RTX 5090 |
I'm open to any and all tips you're willing to provide. TIA!