Get you some GPUs, it's not worth the hacks around lack of RAM
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
If you can, get you some GPUs. All the hacks around limited VRAM are not worth the pain and effort, even if it means getting P40s or MI50s. Get you enough GPU to have everything in memory. Qwen3.6-27B, the dense model: Q8, f16 K/V cache, 128k context on 2 used 3090s. 1399 pp, 104 tg (prompt processing and token generation, in tokens/s).
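
For anyone wanting to sanity-check "everything in memory" on their own hardware, here is a rough back-of-envelope sketch, not a measurement. It assumes "Q8" means a llama.cpp-style Q8_0 quant (about 8.5 bits per weight), takes 27e9 as the nominal parameter count, counts 24 GiB per 3090, and ignores compute buffers and other runtime overhead. The helper names are made up for illustration, and the K/V formula only applies to layers that actually keep a standard K/V cache, so plug in the numbers from your model's config.

```python
GIB = 1024 ** 3

def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in bytes."""
    return n_params * bits_per_weight / 8

def kv_cache_bytes(n_kv_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx: int, bytes_per_elem: int = 2) -> int:
    """K and V tensors for every layer that keeps a cache (f16 = 2 bytes)."""
    return 2 * n_kv_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem

def kv_budget_per_token(total_vram: float, weights: float, n_ctx: int,
                        overhead: float = 0.0) -> float:
    """Bytes of K/V cache per token that still fit after loading the weights."""
    return (total_vram - weights - overhead) / n_ctx

total_vram = 2 * 24 * GIB           # two 3090s, 24 GiB each
weights = weight_bytes(27e9, 8.5)   # assumed: nominal 27B params at ~8.5 bpw
n_ctx = 128 * 1024                  # 128k context

budget = kv_budget_per_token(total_vram, weights, n_ctx)
print(f"weights:   {weights / GIB:.1f} GiB")
print(f"left over: {(total_vram - weights) / GIB:.1f} GiB")
print(f"K/V budget at 128k: {budget / 1024:.0f} KiB per token")
```

If `kv_cache_bytes(...)` for your model at your target context comes in under what is left over, with some headroom for buffers, the whole thing should sit in VRAM without offloading tricks.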