Do not fall into the trap of chasing the next scale or upgrade.
Mirrored from r/LocalLLaMA.
I mean, don't get me wrong, I love me some improvements and enhancements, and the scene keeps on giving... and with MTP making its way to llama.cpp soon, a lot of you who aren't already running custom compiles are about to get a boost in inference speed, and your workflows will feel that extra POWER when running locally. That is insane... but don't fall for the trap.
Productivity is being measured by large context sizes and token consumption, but models in their current form can already do so much, even on 6GB and 12GB GPUs. The reason I say don't fall for the trap is that I was generating content faster than I could do anything useful with it. What good is quantity without quality? Sometimes I feel the need to slow down and be more intentional about what I process. I prioritized compute expansion over deliberateness, when deliberateness is what actually matters for direction.
I remember someone saying "LLMs are mismanaged geniuses," and it clicked.
For example, I used to FOMO over my unused Claude max quota: “I have access to this beefy power; why don’t I use it? lemme just throw a bunch of busy work at it for the sake of being busy”... but that’s like over-consuming coffee just so you can procrastinate faster lol.
I ended up generating trading strategies faster than I could validate them in live markets. Local models are already good enough; they just need quality feedback loops with real results, real-market feedback, or even simulated backtest results, so that they can give you higher-quality guidance with more contextual awareness of how their prior outputs are performing. My Qwen3.6-35B-A3B-UD-Q3_K_XL is doing the lord's work with only a 64k context on my RTX 3060 12GB, finding profitable trading edges and then feeding back the parameters that worked so that it can explore adjacent pathways between what works and what doesn't.
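The "feed back what worked, then probe nearby settings" loop I'm describing can be sketched in a few lines. This is a toy illustration, not my actual pipeline: the `backtest` function here is a stand-in with a made-up objective (in reality it would replay the strategy over market data), and the model in the loop is replaced by a simple neighborhood search over a (fast, slow) moving-average parameter pair.

```python
def backtest(params):
    """Stand-in for a real backtest: scores a (fast, slow) moving-average
    pair. Toy objective with a peak at fast=10, slow=50."""
    fast, slow = params
    if fast >= slow:
        return float("-inf")  # invalid crossover configuration
    return -((fast - 10) ** 2) - ((slow - 50) ** 2) / 4

def neighbors(params):
    """Small perturbations around a parameter set that worked."""
    fast, slow = params
    return [(fast + df, slow + ds)
            for df in (-2, 0, 2) for ds in (-5, 0, 5)
            if (df, ds) != (0, 0)]

def explore(start, rounds=20):
    """Feedback loop: keep whatever the backtest rewards, then probe
    adjacent parameter settings until nothing nearby improves."""
    best, best_score = start, backtest(start)
    for _ in range(rounds):
        improved = False
        for cand in neighbors(best):
            score = backtest(cand)
            if score > best_score:
                best, best_score = cand, score
                improved = True
        if not improved:
            break  # local optimum: hand the results back to the model
    return best, best_score

best, score = explore((20, 80))
print(best, score)
```

In my actual setup the "explore neighbors" step is where the local model comes in: instead of a fixed perturbation grid, it sees the scored history and proposes the next candidates to test.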
We’re there, fam. This is it.