Tag: Edge
197 articles archived under #edge

r/LocalLLaMA · community · 12d ago
Local models went from mostly useless to actually useful really fast. What changed?
Mitchell Hashimoto had a good point earlier: local models went from basically useless to actually useful in what feels like one year. I think that's pretty…

arXiv (Machine Learning) · research · 13d ago
AnchorKV: Safety-Aware KV Cache Compression via Soft Penalty with a Refusal Anchor
arXiv:2606.17872v1 · Announce Type: new
Abstract: Large language models (LLMs) outperform earlier architectures on generative inference and long-context tasks, but their large size introduces significant challenges in memory usage, energy cost, and on-device deployment. Since…

r/LocalLLaMA · community · 13d ago
HashiCorp founder thinks local models "aren't good ENOUGH yet"
I generally respect him a lot, but this is a wrong take. For more than a year, people have been doing all right using SLMs for coding; only vibe coders might struggle.
Submitted by /u/Orbit652002

NVIDIA Developer Blog · official blog · 13d ago
Build On-Device AI Companions with the NVIDIA ACE Game Agent SDK and Unreal Engine 5 Plugins
NVIDIA RTX technologies are deeply integrated into Unreal Engine 5 through the NVIDIA RTX Branch of Unreal Engine and the NVIDIA DLSS Unreal Engine plugin. This…

Simon Willison · community · 13d ago
Quoting Georgi Gerganov
I can 100% attest to the fact that Qwen3.6-27B is a very capable local model for coding tasks. Over the last month and a half I've been using it almost daily, either on my M2 Ultra or on my RTX 5090 box.
I use it for small mundane tasks at ggml-org, nothing really impressive,…

Hacker News (AI on Front Page) · community · 13d ago
Running local models is good now
Article URL: https://vickiboykis.com/2026/06/15/running-local-models-is-good-now/
Comments URL: https://news.ycombinator.com/item?id=48555993
Points: 299 · Comments: 159

r/LocalLLaMA · community · 13d ago
Are small local models for automation a thing?
I've been following this sub for a while, and it feels like the massive hype is always around having a local vibe coding assistant or trying to run heavy, near-frontier models locally, and that's amazing. But I feel like we are overlooking a massive use case, for me, an…

r/LocalLLaMA · community · 14d ago
I made a game where you convince an AI model that reality is a simulation. Progress update:
Showed you all my demo last week, had some great conversations with some very smart folk, and spent days fixing bugs and trying things out. And now, I humbly present to you: Simulation Simulator! A chat simulator game that bundles a local LLM inside Unity, and…

Hacker News (AI on Front Page) · community · 14d ago
Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?
Has anyone here fully swapped Claude/GPT for a local model as their main coding tool, not just for side experiments? If so, please share your setup and performance (e.g., tok/s).
Comments URL: https://news.ycombinator.com/item?id=48542100
Points: 510 · Comments: 255

r/LocalLLaMA · community · 14d ago
archex: local-first, deterministic code context for AI agents (no API key, no telemetry, Apache 2.0)
archex turns a repo into a ranked, token-budgeted context bundle for coding agents: the symbols, imports, dependency-graph neighbors, and provenance the model needs, assembled before it reasons. It returns context, not an answer; your local model still does the thinking.
The…

arXiv (Machine Learning) · research · 15d ago
Efficient On-Device Diffusion LLM Inference with Mobile NPU
arXiv:2606.13740v1 · Announce Type: new
Abstract: Diffusion large language models (dLLMs) accelerate generation by denoising multiple tokens in parallel, making them attractive for latency-sensitive mobile inference. However, repeated denoising introduces substantial computation…

arXiv (Machine Learning) · research · 15d ago
Federated Learning for Feature Generalization with Convex Constraints
arXiv:2606.14416v1 · Announce Type: new
Abstract: Federated learning (FL) often struggles with generalization due to heterogeneous client data. Local models are prone to overfitting their local data distributions, and even transferable features can be distorted during aggregation.…

r/LocalLLaMA · community · 15d ago
Made a macOS app that creates highly personal macOS apps. Works with models as small as Gemma 4 E2B
Apologies in advance, as the video is demonstrating with GPT 5.4 mini (a local model would take too long for a video); however, I've made the same app with Gemma 4 E4B. I've been working on an open source project for a while called Ironsmith. The gist is you can create highly…

r/LocalLLaMA · community · 15d ago
Help with resources for using LLMs as fictional characters
Hey y'all, I'm an ex-cognitive scientist turned NLP data scientist by day, and science fiction author by night. I want to bring fictional characters in my prose to life with local LLMs, and I'm looking for the best resources out there for doing this kind of work (datasets,…

r/LocalLLaMA · community · 15d ago
Local models in mid-2026
Open weights got close enough to run at home this year, not by needing more RAM but the reverse: sparse attention, MoE, latent KV compression, multi-token prediction and four-bit quantization.
Submitted by /u/mattjcoles

r/LocalLLaMA · community · 15d ago
Build for local LLM with 2 separate GPUs
I want to build a headless compute machine to run an RTX Ada 4000 (20GB) with an RTX Pro 5000 (48GB) or RTX PRO 4500 (32GB) in parallel for inference. The goal is not running one large model across both GPUs, but rather running separate models on each GPU. Why this GPU config?…

r/LocalLLaMA · community · 16d ago
I don't know who needs to hear this, but 128GB BD-R XL M-DISC is SOTA for consumer-available archival optical storage (for backing up your models)
If you're trying to download and preserve your local LLMs in case of future availability issues due to AI-related politics, your best bet is either 128 GB or 100 GB Blu-ray optical discs, more specifically the BD-R XL M-DISC standard format, which is archival-grade and built to last…

r/LocalLLaMA · community · 16d ago
In your opinion, what is the best CLI-based (or other) coding tool for regular software engineering (NOT VIBE CODING)?
This includes, but is not limited to: OpenCode, Command Code, Kilo Code, Cline, Claude Code, etc. Please try to include tools to which I can connect local models, so not stuff like Antigravity.
Submitted by /u/Potential_Top_4669

r/LocalLLaMA · community · 17d ago
We should set up a torrent network for open source models.
I was just thinking about this due to recent events. Hugging Face is a US-based company, legally incorporated as Hugging Face, Inc., with its official headquarters in Brooklyn, New York. It seems like a pretty big single point of failure for local models. Maybe a…

r/LocalLLaMA · community · 17d ago
Anthropic forced to abruptly disable Fable 5 & Mythos 5 globally by US Gov over a jailbreak. This is exactly why we need local models.
I just saw this statement regarding Anthropic being hit with an emergency export control directive from the US government.
They were forced to pull the plug on Fable 5 and Mythos 5 for all customers globally. The tl;dr is that the government got spooked by a narrow jailbreak…

r/LocalLLaMA · community · 17d ago
Local LLMs aren't democratic anymore... the hardware barrier has gotten out of hand.
When we first started experimenting with local LLMs, it was a completely different story! We were using gaming GPUs to tinker around. 8GB or 16GB of VRAM (which wasn't even a given for everyone) was the norm, and so many people could actually get their hands dirty and…

arXiv (NLP / Computation & Language) · research · 18d ago
sebis at CRF Filling 2026: A Two-Stage Local LLM Pipeline for Medical CRF Filling
arXiv:2606.13082v1 · Announce Type: new
Abstract: The extraction of structured clinical information from unstructured EHR notes is a persistent bottleneck in healthcare informatics. While large language models (LLMs) offer high performance, their deployment in clinical settings is…

arXiv (NLP / Computation & Language) · research · 18d ago
TimeLens: On-Device Artifact Recognition with Retrieval-Augmented Question Answering for the Grand Egyptian Museum
arXiv:2606.13267v1 · Announce Type: cross
Abstract: TimeLens is an AI-powered bilingual mobile guide for the Grand Egyptian Museum (GEM). Pointing a phone at an exhibit, a visitor sees the artifact recognized in real time and can ask follow-up questions answered in English or…

r/LocalLLaMA · community · 18d ago
xdna-top: unified NPU+iGPU terminal monitor for Strix Halo (Ryzen AI Max); finally see the NPU work
If you're running local models on a Ryzen AI Max / Strix Halo box, you've probably noticed it's hard to see what the NPU is actually doing.
amd-smi is still broken on gfx1151 (ROCm #6035, https://github.com/ROCm/ROCm/issues/6035), and while GNOME Resources has a GUI view, I…

arXiv (NLP / Computation & Language) · research · 19d ago
Energy-Efficient On-Device RAG on a Mobile NPU: System Design and Benchmark on Snapdragon X Elite
arXiv:2606.11257v1 · Announce Type: new
Abstract: Retrieval-Augmented Generation (RAG) pipelines are compute-intensive, combining embedding, retrieval, reranking, and large language model (LLM) generation. Running them entirely on-device benefits privacy, latency, and offline use,…

r/LocalLLaMA · community · 19d ago
I wired a fully offline voice loop to Ollama + LM Studio: 100% CPU, no GPU, nothing leaves your machine (Silero VAD + Parakeet STT + Supertonic TTS 3)
I kept wanting to talk to my local models instead of typing, but every voice setup wanted a GPU, shipped my audio to the cloud, or was macOS-only. So I built one that's none of those, and I benchmarked it, so these are real measured numbers, not vibes. One command installs the…

r/LocalLLaMA · community · 19d ago
Tried to benchmark Google's new on-device dictation models (Eloquent) and basically couldn't
I tried to benchmark Google's new on-device dictation app (Eloquent) and basically couldn't. It drops about half of my dictations. tl;dr: full results are 👉 here. Background: Google shipped a new fully local dictation app yesterday with proprietary new models, so I was excited…

r/LocalLLaMA · community · 19d ago
Local LLM good for OCR of handwriting?
I am using qwen3-vl:8b and Ollama for doing OCR on scans of handwritten letters, and it is doing a decent job. Are there any other models I should know about for this kind of OCR?
Submitted by /u/SensitiveCranberry00

r/LocalLLaMA · community · 19d ago
Local LLM releases
Here are some graphs of local LLM releases. It's strange: except for the last month, I thought this year was very heavy in terms of releases, but it seems that the peak was last year.
Maybe the hype about the quality improvement this year made it seem that it was…

r/LocalLLaMA · community · 19d ago
Can you really replace paid models with a local model?
Long-time lurker, and I say this as someone who genuinely loves this community and runs many local models myself. I've been using LLMs since the early GPT and LLaMA days. Obviously, models have come an unbelievably long way. Local/open models today are dramatically better than…

r/LocalLLaMA · community · 20d ago
Anthropic is intentionally nerfing Fable when asked to develop other LLMs
Reason 458 why local LLMs are going to be a necessity.
Submitted by /u/onil_gova

arXiv (Machine Learning) · research · 20d ago
Operator Fusion for LLM Inference on the Tensix Architecture
arXiv:2606.09879v1 · Announce Type: new
Abstract: This study addresses on-device inference bottlenecks of Transformer models on Tenstorrent's Tensix architecture and proposes an operator fusion strategy that enhances data locality. RMSNorm is fused with matrix multiplication in…

r/LocalLLaMA · community · 20d ago
Fine-tuned Qwen2.5-7B to 96% of Claude Haiku on a domain-specific task using ~$3 of API calls and zero human labelers
I built a decision-reasoning engine (Orlog) and wanted to fine-tune a local model for it instead of paying per call forever. The method (DV-DPO): run a 3-voice council on each question and produce a synthesis; cross-examine, with the losing voices challenging the synthesis; if the synthesis gets…

r/LocalLLaMA · community · 20d ago
Furiosa AI selling its inference chip to the consumer market will be a game changer for local LLMs
This is a South Korean startup going all-in on inference chips: https://furiosa.ai/renegade-spec
TSMC 5 nm node, Hynix HBM3 at 1.5 TB/s, 48 GB VRAM, 180 W TDP. Already tested on LG's LLM.
If they opened their programming interface the way NVIDIA opens PTX and Intel opens SPIR-V, and teamed up…

r/LocalLLaMA · community · 20d ago
Apple announced a new on-device inference engine for Apple Silicon
This news seems to have flown under the radar. Apple announced CoreAI at WWDC, which is basically a future replacement for CoreML and an alternative to MLX/llama.cpp/torch for optimized on-device inference, especially on phones and tablets. The model weights need to be converted…

r/MachineLearning · community · 20d ago
Are privacy-preserving techniques actually being used in production ML systems? [D]
I've been reading more about privacy-preserving ML approaches such as differential privacy, federated learning, and on-device inference. The research literature is fairly active, but I'm curious about real-world adoption. For those working in industry: are these techniques being…

r/LocalLLaMA · community · 20d ago
Still a VERY lightweight open web-search tool for smaller local LLMs, now with SearXNG support
Hey everyone, TinySearch v0.2.0 (first stable beta) is out. The first version used DuckDuckGo directly, which worked well enough to prove the idea, but yeah... relying on one search source was way too fragile, lol. DDG started throwing limits/CAPTCHAs more often in the last 2…

r/LocalLLaMA · community · 20d ago
Gemma 4 31B's competence surprised me
I'm just getting started using local LLMs for code. I'm not interested in vibe coding, but I am hoping to increase my productivity in the publish-or-perish world of academia. My existing code from past projects is a mess, and LLMs often fail to understand my code because I work with…

arXiv (Machine Learning) · research · 21d ago
HASA: Subnet Allocation for Compute-Constrained Model-Heterogeneous Federated Learning
arXiv:2606.07621v1 · Announce Type: new
Abstract: Edge services increasingly use federated learning to personalize on-device models while keeping sensitive data local.
In practice, deployments must handle heterogeneity in both client resources and local data distributions.…

r/LocalLLaMA · community · 21d ago
LocalLLaMA post tier list
Since there is much (justified) whining about post quality, I thought it would be helpful to get a sense of what people actually DO like. Here's my take. S-tier: GGUFs/MLX or benchmark data for a newly released best-in-class local model; new optimizations that are actually a big…

r/LocalLLaMA · community · 21d ago
Friends from the LocalLLaMA community, if you love local LLMs, don't participate in the IPOs (SpaceX, OpenAI, Anthropic)
I'm not going to, and you shouldn't either. The frontier labs are the ones who are harming our community. They are jacking the hardware prices up. First it was NVIDIA GPUs, then RAM, then SSDs. Now HDD prices are 3x what they were last year. Even NAS prices are…

r/LocalLLaMA · community · 21d ago
I bundled a fully local LLM inside my Unity game. No internet, no cloud, no API key. The conversation is the gameplay.
I am making a game that is bundled with a local LLM, and every conversation is unique. The game, "Simulation Simulator", is a campfire chat sim about DMT, simulation theory, and a friend with a computer monitor for a head. There are 5 endings you can reach, based entirely on how you…

r/LocalLLaMA · community · 22d ago
Galaxy Z Fold6 as a local inference node: llama.cpp/Vulkan, homelab telemetry, SHA-256 model verification
I built a small Android app called Pocket Node that runs llama.cpp inference on-device. Here's what it actually does and what it doesn't. What it does: loads a GGUF model (SmolLM3 Q4_0, ~1.1B params) directly on the Fold6; uses the Vulkan/OpenCL backend via llama.cpp, not…

r/LocalLLaMA · community · 22d ago
Qwen3.6 35B-A3B on a Laptop: My Zero to One Moment
Hi everyone, I'm new here because I only have a laptop, and I only just realized local models are actually good enough now.
So I'd like to share my experience, in case it helps others, and also to learn from the more experienced people here. This is the first model that works…

r/LocalLLaMA · community · 23d ago
Are local models good enough to replace Claude/Codex solely for simple HTML tasks?
I know local models can't compete fully yet, but I'm curious about where the limits are. My use case is generating simple HTML activities for e-learning creation. I know others are creating apps and more advanced software. Where are the limits for where local models can…

r/LocalLLaMA · community · 23d ago
RTX 3090 eBay Pricing is Crazy!!
A couple of years ago, before local LLMs were in vogue, I bought 8 RTX 3090s at $700 each to build an AI rig. It's been working great, and I was looking to build another to increase my capacity, but on eBay those are now selling in the $1,300 to $1,500 range! That price seems totally…

r/LocalLLaMA · community · 23d ago
Best Coding Harness for Qwen3.6 35B?
I've been happily using GitHub Copilot for 7-8 months, primarily in Visual Studio and VS Code, mostly with the built-in flagship models, and have felt like the output is worth the cost. Lately I've been playing with a lot of different local LLMs and decided to try using…

r/LocalLLaMA · community · 23d ago
Experimentation with Qwen 3.6 and Gemma 4: guidance needed
I'm a web developer doing mostly coding, but also project management, requirements analysis, testing, etc. I recently started experimenting with local LLMs, mostly because agentic stuff finally made them feel useful.
Note: This text was fed to ChatGPT to fix my messy repeating…

r/LocalLLaMA · community · 23d ago
AA comparison of the latest local models
I picked models I consider local (usable on 3×3090), so there are no 300B models, and you should probably skip the 200B models too (though MiniMax and Step are pretty fast in Q3). Gemma-4 12B is still missing.
Submitted by /u/jacek2023

r/LocalLLaMA · community · 24d ago
OpenLumara: a different kind of AI agent, written from scratch, not vibecoded. Extremely token-efficient, super small system prompt, made for local models. Everything is modular.
Hi LocalLLaMA community! Yes, I know, yet another AI agent announcement post. They're a dime a dozen out there... most of them, though, are vibecoded, often very sloppy, and eat through context like there's no tomorrow. This is different. This runs beautifully and very fast with local…