r/LocalLLaMA · July 1, 2026 · 2 min read

LokalBot - fully local macOS app: meetings, autocomplete, and day tracking that all run on your machine with a user-friendly UI


Been lurking here a while, this sub is basically why LokalBot exists. It's a Mac app that records + summarizes your meetings, autocompletes your typing in any app, and tracks where your day went, with every model running on-device. No cloud, no account, no API keys.

Until now I've been covering most of these workflows with separate apps like Granola and Cotypist. Now a single app handles all of them, with no third-party inference cost.

Heads up first: Apple Silicon / macOS 15+ only. It's welded to the Neural Engine, MLX, and Core Audio, so no Linux/NVIDIA.

I'm running it on an M4 Max MacBook with 48GB of RAM, and it runs well with some spikes. If you have 16-24GB of RAM, my default models probably won't run as smoothly for you, but there are some good alternatives in the app's model settings.

The model stack:

Summaries, chat, and cotyping run on a bundled llama.cpp — in-process libllama for cotyping's low latency, llama-server otherwise. Point any of them at your own GGUF, an Ollama or OpenAI-compatible endpoint, or Apple Intelligence.
Transcription: Granite Speech 4.1 / Parakeet / Whisper / Qwen3-ASR via CoreML/MLX on the Neural Engine. Parakeet clocks ~190× realtime.
Semantic search: Qwen3-Embedding 0.6B GGUF on a second llama-server (--embeddings), vectors in SQLite, brute-force cosine. At personal scale "brute force" is just "instant," and it adds zero dependencies.
Diarization: optional pyannote (via FluidAudio) to split "Them" into Them 1 / Them 2.
In-app Hugging Face browser to search + download GGUFs, with a per-model hardware-fit advisory.
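
For anyone curious what "vectors in SQLite, brute-force cosine" can look like, here is a minimal Python sketch. It is not LokalBot's actual code: the `chunks` table, the float32 BLOB layout, and the helper names are all assumptions made for illustration, and toy 3-dimensional vectors stand in for real Qwen3-Embedding output.

```python
import math
import sqlite3
import struct


def pack(vec):
    """Serialize a list of floats as a float32 BLOB (assumed storage layout)."""
    return struct.pack(f"{len(vec)}f", *vec)


def unpack(blob):
    """Inverse of pack(): read float32 values back out of a BLOB."""
    return struct.unpack(f"{len(blob) // 4}f", blob)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(conn, query_vec, k=3):
    """Score every stored vector against the query and return the top k."""
    rows = conn.execute("SELECT text, vec FROM chunks").fetchall()
    scored = [(cosine(query_vec, unpack(blob)), text) for text, blob in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical schema: one row per text chunk, embedding stored as a BLOB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (text TEXT, vec BLOB)")
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?)",
    [
        ("budget review", pack([1.0, 0.0, 0.0])),
        ("hiring plan", pack([0.0, 1.0, 0.0])),
        ("budget forecast", pack([0.9, 0.1, 0.0])),
    ],
)

results = search(conn, [1.0, 0.0, 0.0], k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

Every query scans the whole table, so cost grows linearly with the number of stored chunks. That is the trade-off behind the "zero dependencies" point: no index to build or keep in sync, at the price of a full scan per search.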

My current defaults, which I found worked best in real usage (very open to being told I'm wrong):

Transcription: IBM Granite Speech 4.1 (2B) Q4
Summarization: Qwen 3.6 35B-A3B Q4_K_M
Cotyping: Gemma 4 E4B Q5 XL
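
A rough back-of-envelope sketch of why the summarization default is heavy for smaller machines. The numbers are assumptions, not measurements from the app: roughly 4.85 bits per weight for Q4_K_M is an approximation, the 8GB reserve for macOS, other apps, and KV cache is a guess, and both helper functions are hypothetical. Note that an A3B mixture-of-experts model only activates about 3B parameters per token, but all 35B still have to sit in memory.

```python
def estimate_weights_gb(params_billion, bits_per_weight):
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits_in_ram(weights_gb, ram_gb, reserve_gb=8.0):
    """Crude check: weights plus an assumed reserve must fit in total RAM."""
    return weights_gb + reserve_gb <= ram_gb


# Assumption: Q4_K_M averages roughly 4.85 bits per weight.
weights_gb = estimate_weights_gb(35, 4.85)
print(f"35B at ~4.85 bpw: about {weights_gb:.1f} GB of weights")

for ram_gb in (16, 24, 48):
    verdict = "fits" if fits_in_ram(weights_gb, ram_gb) else "too tight"
    print(f"{ram_gb} GB RAM: {verdict}")
```

This ignores the transcription and cotyping models, which are loaded alongside, so treat it as a lower bound on memory pressure.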

Privacy is the whole point. The only network call is the one-time model download; after that it's fully offline. Point Little Snitch at it during a meeting and enjoy the flattest network graph you've ever seen. Optional screenshots are AES-GCM sealed and auto-delete.

GitHub: https://github.com/stevyhacker/lokalbot
Landing: https://lokalbot.com

Mostly I'd love this crowd's take on the model picks — especially better local ASR and small, fast cotyping models. What would you run?

submitted by /u/stevyhacker