Got local Qwen 3.5/3.6 generating meeting summaries entirely offline on an M4 Max. Demo with Wi-Fi off. This is the future.
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
I'm the founder behind Hedy, an AI meeting app. I'm a huge supporter of local AI, and we've been working on making it consumer-friendly. Speech recognition in Hedy has always run on-device (whisper.cpp, and now also Parakeet). What just shipped is that the rest of the AI pipeline (summaries, detailed notes, chat with the meeting, live coaching) can now run on-device too, using llama.cpp. Wi-Fi off, nothing leaves the laptop. The video above shows the full flow.

A few technical specifics:

- **Models supported out of the box.** Qwen 3.6, Qwen 3.5, and Gemma 4 families. The range goes from 2B at the low end (works on newer iPhones), through 9B Qwen 3.5 as the sweet spot for most laptops, up to the newest Qwen 3.6 at 27B and 35B for users with more VRAM.
- **Multiple quantization levels per model.** On the 9B Qwen, for example, you can pick between Q4 and Q8 depending on memory headroom.
- **Bring your own model.** You can download any compatible GGUF model from Hugging Face and load it into Hedy; you're not restricted to the curated list. This was a deliberate call: the local AI space moves fast, and we don't want users stuck waiting for us to update the bundled options.
- **Acceleration.** Metal on Apple Silicon, Vulkan on Windows GPUs, CPU fallback when needed. Mac unified memory means total system RAM is the constraint; Windows is VRAM-bound, and the picker tells you when layers will spill to CPU.
- **The app surfaces fit.** Before you download a model, the picker tells you whether it'll be a great fit, a tight fit, or won't fit on your hardware. It also shows the current memory footprint so you know what headroom you have. No silent OOMs.

Honest tradeoffs, and on the demo specifically: that was an M4 Max running Qwen 3.5 4B (I needed to prioritize speed for the demo). The summary in the video took about 15 seconds for a ~10-minute meeting transcript. Your mileage will vary by model size and hardware.

Happy to answer questions about model selection, the BYO setup, integration challenges, or anything else technical. I'll be staying in the thread for a few hours.
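For the curious: the fit check boils down to comparing the quantized model's memory footprint (roughly file size, plus headroom for KV cache and activations) against available memory. This is a minimal sketch of that kind of heuristic, not Hedy's actual code; the 25% overhead factor and the 0.7 "comfortable" threshold are assumptions for illustration.

```python
def estimate_fit(model_gb: float, free_gb: float, kv_overhead: float = 1.25) -> str:
    """Rough model-fit heuristic: quantized model size plus ~25% headroom
    for KV cache and activations, compared against free memory."""
    need = model_gb * kv_overhead
    if need <= free_gb * 0.7:   # plenty of slack left for the OS and other apps
        return "great fit"
    if need <= free_gb:         # loads, but little headroom
        return "tight fit"
    return "won't fit"

# Very rough GGUF sizes for a 9B model: Q4 ≈ 0.55 bytes/param, Q8 ≈ 1.06 bytes/param.
q4_gb = 9e9 * 0.55 / 1e9   # ≈ 4.95 GB
q8_gb = 9e9 * 1.06 / 1e9   # ≈ 9.54 GB
print(estimate_fit(q4_gb, 16.0))  # great fit on a 16 GB machine
print(estimate_fit(q8_gb, 16.0))  # tight fit on a 16 GB machine
```

On a Mac the `free_gb` figure would come from unified memory; on Windows it would be GPU VRAM, which is why the same model can be a great fit on one and spill layers to CPU on the other.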
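A back-of-envelope check on the demo numbers, with every figure here an assumption rather than a measurement: a ~10-minute meeting is on the order of 1,300–1,600 spoken words, call it ~2,000 tokens of transcript, and a summary might be ~300 tokens.

```python
# All inputs below are assumed round numbers, not measured values.
transcript_tokens = 2000   # ~10 minutes of speech
summary_tokens = 300       # assumed summary length
total_seconds = 15         # end-to-end time from the demo

# If prompt processing (prefill) runs at ~1,000 tok/s for a 4B Q4 model (assumption),
prefill_s = transcript_tokens / 1000       # ~2 s reading the transcript
decode_s = total_seconds - prefill_s       # ~13 s left for generation
print(f"~{summary_tokens / decode_s:.0f} tok/s decode")  # ~23 tok/s
```

So the 15-second figure is consistent with a few seconds of prefill plus generation in the low tens of tokens per second, which is why a small fast model was the right call for a live demo.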