llama-launcher v1.3 release -> Bayesian Optimisation
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Hello everyone. Some of you may have seen my post from a few days ago about my app, llama-launcher, a lightweight point-and-click GUI for building llama-server commands without having to type them out every time.

I've just added an optimisation feature that uses the Tree-structured Parzen Estimator (TPE) through the Optuna framework. It runs llama-server to tune a predetermined set of parameters and squeeze the last bit of performance out of your system, completely hands-free. I've been using it to get the most out of my MTP models without having to sit at my desk tuning, loading, prompting, and unloading manually and repeatedly.

So far, during testing with Gemma 12B MTP, I've seen up to a 15% improvement in speed (as seen in the images) versus baseline commands with no tuning, without any human interaction during the optimisation process.

It's still in its early stages, so there are many improvements to be made. If you have any suggestions, please let me know.

You can check the repo out here: https://github.com/SolaryKryptic/llama-launcher
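For anyone curious what a TPE tuning loop over llama-server parameters can look like, here is a minimal sketch. It is not taken from llama-launcher: the post does not say which parameters are tuned, so the search space (`threads`, `batch_size`, `n_gpu_layers`), the ranges, and the helper names are all assumptions for illustration. The `benchmark` callable is also hypothetical; it stands in for whatever starts the server, sends a prompt, and returns tokens per second.

```python
def suggest_params(trial):
    """Draw one candidate configuration from an ASSUMED search space."""
    return {
        "threads": trial.suggest_int("threads", 1, 16),
        "batch_size": trial.suggest_categorical("batch_size", [512, 1024, 2048]),
        "n_gpu_layers": trial.suggest_int("n_gpu_layers", 0, 99),
    }


def build_command(model_path, params, port=8080):
    """Turn a parameter dict into a llama-server argument list."""
    return [
        "llama-server",
        "-m", model_path,
        "--port", str(port),
        "--threads", str(params["threads"]),
        "--batch-size", str(params["batch_size"]),
        "--n-gpu-layers", str(params["n_gpu_layers"]),
    ]


def make_objective(model_path, benchmark):
    """Wrap a benchmark callable (hypothetical: command -> tokens/sec)."""
    def objective(trial):
        params = suggest_params(trial)
        return benchmark(build_command(model_path, params))
    return objective


def tune(model_path, benchmark, n_trials=30, seed=0):
    """Run a TPE search with Optuna and return (best_params, best_score)."""
    import optuna  # third-party dependency: pip install optuna

    study = optuna.create_study(
        direction="maximize",  # higher tokens/sec is better
        sampler=optuna.samplers.TPESampler(seed=seed),
    )
    study.optimize(make_objective(model_path, benchmark), n_trials=n_trials)
    return study.best_params, study.best_value
```

Calling `tune("model.gguf", my_benchmark)` with a real `my_benchmark` would launch one server configuration per trial. Keeping the benchmark as an injected callable makes it easy to dry-run the loop with a fake scoring function before spending time on real model loads.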