club-5060ti: practical RTX 5060 Ti local LLM notes and configs
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
| I put together a small public repo for RTX 5060 Ti 16GB local LLM setups: I took inspiration from the club-3090 repo, but this one is focused on documenting what we’ve actually tested on 5060 Ti hardware so the setup details are easier to share and reproduce. Current seed setup is 2x RTX 5060 Ti 16GB on Linux, with notes for: - vLLM serving Qwen3.6 27B NVFP4/MTP - llama.cpp MTP GGUF serving for Qwen3.6 27B Q4/Q6 - Q6 long-context fit checks, including a 204800 direct long-context preset - a safer 65536 llama.cpp router preset for extra headroom - initial Qwen3.6 35B A3B checks on llama.cpp and vLLM - sanitized launch examples - model download and llama.cpp update helper scripts - simple OpenAI-compatible smoke/bench scripts - CSV seed results and report templates The aim is to keep it practical: exact configs, versions, context lengths, KV settings, and caveats rather than vague tokens/sec claims. If anyone else is testing similar 5060 Ti setups, feel free to open an issue or PR with enough detail to reproduce the result. [link] [comments] |
More from r/LocalLLaMA
-
Why Dario is on fire: lesson from dotcom bubble.
Jun 30
-
Been running Qwen3.6-27B through a 3-critic harness. The harness matters more than I thought
Jun 30
-
I Hate Dario Amodei, and everything he stands for.
Jun 29
-
Introducing LongCat-2.0 - , a large-scale MoE language model with 1.6 trillion total parameters and ~48 billion activated per token. This was the stealth model that was on Openrouter under the name 'owl-alpha'.
Jun 29
Discussion (0)
Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.
Sign in →No comments yet. Sign in and be the first to say something.