r/LocalLLaMA · 2 min read

I made a Windows app for managing llama.cpp in WSL/Ubuntu

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

I’m a Windows user, and I have fairly Windows-y expectations for software: I prefer not having to live in a terminal just to install, build, configure, and run things.

I couldn’t find an app that managed the full llama.cpp-on-WSL workflow the way I wanted, so I made one.

llama.cpp Console is an unofficial Windows desktop app for setting up and running llama.cpp models through Ubuntu/WSL. The Windows app itself is a self-contained WPF app, and it helps manage the WSL side from the UI.

GitHub:

https://github.com/alekk89/llama.cpp-Console

What it can do from the UI:

- Detect/install WSL and guide Ubuntu setup

- Install/update CPU build tools inside Ubuntu

- Install/update CUDA Toolkit support inside WSL

- Install/update Vulkan build dependencies

- Download llama.cpp source from the official repo or a custom repo

- Build CPU, CUDA, or Vulkan llama.cpp runtimes inside WSL

- Search Hugging Face for GGUF models

- Download/register models, including some compatibility hints and companion projector/mmproj handling

- Set launch parameters per model

- Choose which llama.cpp runtime/build each model should use

- Start, stop, and supervise llama-server

- Monitor live tokens, runtime metrics, logs, GPU status, utilization, and temperatures

- Track logs, jobs, downloads, and lifetime metrics

- Manage local OpenCode model/provider/agent config snippets from the app, so a configured model can be added to OpenCode quickly
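
To make the per-model launch settings and runtime selection above more concrete, here is a minimal sketch of how such settings could be turned into a `llama-server` command run through WSL. This is not the app's actual code: the `ModelLaunchConfig` class, its field names, the example paths, and the `Ubuntu` distro name are all assumptions for illustration. The `llama-server` flags shown (`-m`, `--host`, `--port`, `-c`, `-ngl`, `--mmproj`) are standard llama.cpp options.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelLaunchConfig:
    """Hypothetical per-model settings; the names are illustrative only."""
    runtime_dir: str                    # WSL path to the chosen build (CPU, CUDA, or Vulkan)
    model_path: str                     # WSL path to the GGUF file
    host: str = "127.0.0.1"             # local-only serving
    port: int = 8080
    ctx_size: int = 4096
    n_gpu_layers: int = 0
    mmproj_path: Optional[str] = None   # companion projector for multimodal models


def build_wsl_command(cfg: ModelLaunchConfig, distro: str = "Ubuntu") -> List[str]:
    """Build an argument list that starts llama-server inside a WSL distro."""
    server = cfg.runtime_dir.rstrip("/") + "/llama-server"
    cmd = [
        "wsl.exe", "-d", distro, "--",  # run the rest inside the named distro
        server,
        "-m", cfg.model_path,
        "--host", cfg.host,
        "--port", str(cfg.port),
        "-c", str(cfg.ctx_size),
        "-ngl", str(cfg.n_gpu_layers),
    ]
    if cfg.mmproj_path:
        cmd += ["--mmproj", cfg.mmproj_path]
    return cmd
```

On Windows, the returned list could be passed to `subprocess.Popen` so the process can be started, stopped, and supervised. Keeping `runtime_dir` in the per-model config is one simple way to let each model point at a different build.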

The main reason I built it is that I wanted the boring setup work to feel more like normal Windows software: click through the UI, see what is installed, see what is missing, build the runtime, download a model, pick launch settings, and run it, all without losing control of what's going on.

A few notes:

- This is a Windows-first app. The actual llama.cpp runtime runs in Ubuntu/WSL.

- Model serving defaults to local-only.

- Right now the app is centered around one active served model at a time.

- The first public release is unsigned, so Windows SmartScreen may warn. SHA-256 files are included with the release artifacts.

- This is not affiliated with or endorsed by llama.cpp or ggml-org.
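
Since the release is unsigned, checking a download against its published SHA-256 file is worth doing. Here is a minimal sketch in Python. It assumes the `.sha256` file holds the hex digest as its first whitespace-separated token (the common `sha256sum` layout); the actual file format in the release may differ, and the function names are hypothetical.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(artifact_path: str, sha256_file_text: str) -> bool:
    """Compare a file's digest with the first token of a published .sha256 file.

    Assumption: the digest is the first whitespace-separated token in the text.
    """
    tokens = sha256_file_text.split()
    if not tokens:
        return False  # empty or malformed hash file
    return sha256_of_file(artifact_path) == tokens[0].lower()
```

If `matches_published_hash` returns `False`, the download is either corrupted or not the file the release describes, and it is safer to fetch it again than to run it.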

I’ve been using a simpler version of this locally for a while, then polished it up enough to release in case it’s useful to other Windows users. Planned future work includes faster model switching, keeping models warm in RAM where practical, and eventually supporting more than one loaded model at a time.

Please note that I do not own any AMD GPUs, so I have not validated the Vulkan installation/build path on AMD hardware.

submitted by /u/wgaca2