Llama.cpp VS LiteRT on a custom Xiaomi 12 Pro 24/7 Server (V2 Redesign)
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
| Thanks everyone for the advice on my previous post (24/7 Headless AI Server on Xiaomi 12 Pro (Snapdragon 8 Gen 1 + Ollama/Gemma4). You really inspired me, and I completely redesigned the cooling and power supply for this setup. What's new:
Here is how it looks now: https://reddit.com/link/1tlgxms/video/ul2iivua3w2h1/player https://reddit.com/link/1tlgxms/video/xiuyt9wk3w2h1/player Benchmarks (gemma-4-E4B):
https://reddit.com/link/1tlgxms/video/v0t8t5n54w2h1/player
https://reddit.com/link/1tlgxms/video/1cbz7rk85w2h1/player
GPU Struggles I tried running LiteRT on the GPU, but unfortunately, Google AI Edge hasn't released an APK for my Snapdragon 8 Gen 1. Swapping library files from the Qualcomm site didn't work either. I also tried running a Vulkan build of llama.cpp but ran into issues. I'll post updated benchmarks once I manage to get it working. Conclusion If anyone asks if it was worth it: If you have a powerful spare phone lying around and want a great DIY project, definitely yes. But if you just need an LLM server and don't want the hassle, you're better off just buying a Mini PC. Thanks again to this sub for the inspiration—I wouldn't have committed to such a massive rebuild without your feedback! [link] [comments] |
Discussion (0)
Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.
Sign in →No comments yet. Sign in and be the first to say something.