Tag

Edge

197 articles archived under #edge · RSS

arXiv — NLP / Computation & Language research 30m ago

MAM-AI: An On-Device Medical Retrieval-Augmented Generation System for Nurses and Midwives in Zanzibar

arXiv:2606.29580v1 Announce Type: new Abstract: Maternal and newborn mortality remain among the highest in sub-Saharan Africa, where midwifery care is often delivered by nurses who lack midwifery training to international standards, and consulting authoritative guidance at the…

7
r/LocalLLaMA community 5h ago

I Hate Dario Amodei, and everything he stands for.

I am so incredibly sick of this guy’s fear mongering about open source while fundamentally misunderstanding how it actually works. He recently dropped some arguments that are so completely detached from reality, it honestly feels like he’s never even touched a local model in his…

31
r/LocalLLaMA community 14h ago

Anyone else end up building a web access layer for local AI agents?

I've been running local models for most of my experiments, and I kept running into the same issue. The model lives locally, but everything it needs to interact with doesn't. Every new agent ended up with another GitHub client, another Reddit integration, another documentation…

10
r/LocalLLaMA community 14h ago

NASA testing local LLM inference for future space missions

Red Hat published a blog post last week about an initiative I supported with NASA researchers at Johnson Space Center building a medical AI assistant. It's called the Crew Medical Officer Digital Assistant (CMO-DA) and the system runs LLMs and other models on local hardware with…

34
r/LocalLLaMA community 1d ago

I built an agent Harness for Small Models. I got Qwen 3.5 4b managing servers.

This is something I've been working on. I like playing around with smaller local models but found most agent harnesses not well suited for them. The failure modes across different model families tend to be the same: Failed tool calls; Poor verification of environment variables; Poor…

12
r/LocalLLaMA community 1d ago

NPC Engine Using Local Models

I’ve been working on a game-agnostic NPC engine/backend based pretty heavily on SillyTavern-style architecture, and with smaller local models getting better and better, I honestly think this kind of thing could be the future of RPGs. Right now I’m using NVIDIA Parakeet 0.6 for…

22
r/LocalLLaMA community 1d ago

Best case for dual RTX 3090 (250W each) on Crosshair VIII Hero?

I'm building a local LLM workstation and would appreciate some advice from people already running 2×3090s. Current hardware: ASUS Crosshair VIII Hero (X570); one Gainward Phoenix RTX 3090. Looking for a second used 3090 (not necessarily the same model). Both GPUs will be…

9
r/LocalLLaMA community 2d ago

I built a tool to turn your Claude Code sessions into fine-tuning data for local models

If you use Claude Code, every session is already sitting on disk as a .jsonl file under ~/.claude/projects/. It has real coding conversations: multi-turn edits, tool calls, reasoning traces. That's training data you already generated for free. The problem is the format is not…

36
r/LocalLLaMA community 2d ago

Mythos was the first, now GPT-5.6

https://techcrunch.com/2026/06/26/openai-limits-gpt-5-6-rollout-after-government-request-says-restrictions-shouldnt-be-the-norm/ Either a hype before IPO, or they have just shot themselves in a foot. This is pretty much it for more advanced online models. Local LLM is one of the…

17
r/LocalLLaMA community 2d ago

What’s the latest on agent browser use?

What is the latest and greatest agent browser use framework? I remember trying browser use a few months back and it was ok but would fall apart after long workflows. Has there been improvements to agents controlling browsers and following a predefined workflow? Can local models…

32
r/LocalLLaMA community 2d ago

Dear poor people of this subreddit

I see people with multi-gpu setups but I'm sure there's a potato LLM runner out there somewhere. I have an old macbook pro (i5 8th gen, 8GB RAM) that I want to turn into a homelab. I want to run a small local model for experimenting and if possible, agentic tasks (like say…

22
r/LocalLLaMA community 3d ago

Local LLM Peeps

I am 80% done with a harness that works for local and API but is local first. The harness has some interesting logic around multiple agents which I’m holding back on until it is open source on GitHub. I have been local for 6 months and built out EVERYTHING I could think of to…

28
r/LocalLLaMA community 3d ago

Streaming medical STT running locally on a MacBook

Quick teaser of what I’ve been working on over the last few weeks: a streaming medical speech-to-text model that runs fully on-device. This demo is running locally on a MacBook through MLX. Still doing more evals, but planning to release the open weights next week.  …

22
r/LocalLLaMA community 3d ago

Getting real work out of a 4B local model: the distill-on-idle pipeline behind an on-device "memory" assistant


7
r/LocalLLaMA community 3d ago

What's one local AI workflow you wish you'd discovered sooner?

There are a lot of posts about the models and benchmarks, but I am more interested in the workflows that people use. What is one workflow that really saved you time or made your local LLM more useful? It could be anything—RAG, MCP, coding agents, organizing prompt, document…

23
r/LocalLLaMA community 3d ago

Help optimizing llama.cpp + Qwen 27B on RTX PRO 6000 Blackwell for coding agents

Our company recently acquired a workstation with an RTX PRO 6000 Blackwell, and we're experimenting with local LLMs to reduce part of our Claude token usage. Right now we’re running Qwen3.6 27B MTP Q8_K_XL with llama.cpp on Windows 11. I've been using both Claude Opus and…

13
arXiv — Machine Learning research 4d ago

Dot-Flik: A Scalable Edge AI Architecture for Distributed Insect Monitoring

arXiv:2606.26121v1 Announce Type: cross Abstract: Global insect population declines necessitate scalable, continuous monitoring systems, yet existing vision-based solutions remain constrained by high hardware costs, energy demands, and reliance on centralized processing or cloud…

11
arXiv — NLP / Computation & Language research 4d ago

AnySimLite: A Lightweight Few-Shot Similarity Encoder for On-Device Speech-Adjacent Classification

arXiv:2606.26452v1 Announce Type: new Abstract: To minimize privacy concerns and inference latency on edge devices like smartphones, lightweight on-device models remain important for end-user applications. Many of these applications involve natural language classification, but…

31
arXiv — NLP / Computation & Language research 4d ago

Cascaded Multi-Granularity Pruning for On-Device LLM Inference in Industrial IoT

arXiv:2606.26861v1 Announce Type: new Abstract: Deploying large language models (LLMs) on Industrial Internet of Things (IIoT) edge devices demands extreme compression, yet existing structured pruning methods collapse at high compression ratios due to one-shot importance…

27
r/LocalLLaMA community 4d ago

Good YouTube channels for local LLM news and development?

Sometimes I'd prefer chilling on the couch and learning instead of reading. I've searched on YouTube and most seem like clickbait and slop. Thanks. Submitted by /u/6jarjar6

5
r/LocalLLaMA community 4d ago

Built an open source local first Kanban workflow for running AI coding agents without babysitting every step

I’ve been building BatonBot, a local first app for running AI coding workflows with less babysitting. The problem I kept running into, especially with local models, is that coding agents can be useful but the workflow gets slow: start task → wait → check output → fix next issue…

10
r/LocalLLaMA community 4d ago

Prices of graphic cards are going crazy, should I buy a second card though?

A few months ago, I bought a RX 7900 XTX 24g to start toying with local LLM, at 900€ new. Little I knew that now I want to add a second card to my rig, but prices have gone insane! Adding a new 7900 XTX would cost me 1200€ as new now, used price is around 900€ now, and the last…

38
r/LocalLLaMA community 4d ago

Fast medical RAG API to give your local LLMs access to facts

I created a simple RAG API using medical Wikipedia articles that you can point your agent to and use freely. It may be useful in allowing your local LLMs access to medical facts they might not be able to recall from their weights. I'm aiming for subsecond responses but cannot…

7
r/LocalLLaMA community 4d ago

It turns out Bash is All You Need to write a language model REPL (and jq and curl)

While working on a self-educational exercise tinkering with local models and trying my hand at setting up agents, I went down a rabbit hole: to see how far I could build a custom agent REPL loop using exclusively command-line building blocks and stripping out dependencies…

20
r/LocalLLaMA community 4d ago

Has anyone tried to hack into their own system using a local model?

With all this talk about Mythos being able to hack into US government systems, I was wondering if anyone has tried to get root on their own system using a local model? Submitted by /u/MrMrsPotts

18
arXiv — Machine Learning research 5d ago

On-Device Neural Architecture Search

arXiv:2606.24900v1 Announce Type: new Abstract: This paper proposes a new approach to near-sensor computing, in which a lightweight Neural Architecture Search (NAS) is performed directly on the deployment device to find the best tiny neural architecture for analyzing the…

26
arXiv — Machine Learning research 5d ago

Forget to Improve: On-Device LLM-Agent Continual Learning via Budget-Curated Memory

arXiv:2606.25115v1 Announce Type: new Abstract: On-device language-model agents improve by accumulating experience in retrieved memory rather than by updating weights. This memory is hard-bounded and exposed: it consumes RAM and energy, reaches peers through a thin uplink, and…

24
r/LocalLLaMA community 5d ago

I did some model hacks, and got GLM5.2 from about 2.5 tok/s to >50 tok/s on my GH200 system.

G'day. This is part 3 on my Local LLM adventures. I have a crazy hacked server-to-desktop system. GPUs: 2x Hopper H100, 96 GB HBM3 each. CPUs: 2x Grace, 72 cores each. Host memory: 480 GB LPDDR5X per Grace, 960 GB total. So I can technically run GLM5.2.…

34
arXiv — Machine Learning research 6d ago

Lightweight Transformer Models for On-Device Fault Detection: A Benchmark Study on Resource-Constrained Deployment

arXiv:2606.24173v1 Announce Type: new Abstract: On-device fault detection enables real-time diagnostics without cloud dependency, but deploying machine learning models on resource-constrained hardware demands careful tradeoffs between accuracy, latency, and model size. We…

14
arXiv — Machine Learning research 6d ago

EnerInfer: Energy-Aware On-Device LLM Inference

arXiv:2606.23001v1 Announce Type: cross Abstract: On-device LLM inference is increasingly attractive for privacy-preserving, reliable, and cost-effective deployment, yet its energy and thermal costs remain a critical bottleneck. Existing systems primarily optimize for decoding…

13
r/LocalLLaMA community 6d ago

650+ Apache-2.0 biomedical NER/de-id models that run on-device in MLX. Same fp32 weights, identical outputs: the clinical NER models run 30-40x faster than PyTorch-CPU on a 3-year-old M3 Max. Repro inside.

Disclosure first: I maintain OpenMed, so read this with that bias. I'm posting the numbers with the full methodology and a runnable script so you can reproduce or tear it apart. I'm here for the next couple of hours to answer methodology questions. What it is: an open-source…

25
r/LocalLLaMA community 6d ago

My local server idling 99% of the time!

Guys what you running to make agents busy? Like some crazy 24/7 tasks, or maybe some useful ideas on how to utilize local llm with some purpose/use? I personally running Qwen3.6-27B with owu and with pi for coding (little-coder) but as in title - it’s idling all the time…  …

33
r/LocalLLaMA community 7d ago

been tracking EU DDR5 data for 25 days: Prices are dropping, and the DE vs. NL gap is wild (good news for local LLM builders in EU)

hey again! been tracking DDR5 prices across 4 EU countries (DE, NL, ES, BE) for the past month. some findings relevant to local LLM builders: prices are falling: G.Skill DDR5 Aegis 2x16GB 6000: -28% in 25 days (€579 → €419); Kingston FURY Beast RGB 2x16GB 6000: -26% (€499 → €369)…

37
r/LocalLLaMA community 7d ago

Do you think dedicated hardware for running local LLMs will become affordable anytime soon?

Models like qwen 27b dense have already proved to be useful coding/general purpose assistants, but issue is still with hardware even the entry level hardware is relatively expensive, would we be getting hardware specifically built for inference for consumers at affordable price…

6
r/LocalLLaMA community 8d ago

For programmers with slow local LLM setup, what's your workflow?

What's your workflow and what's the best way you have found to code with local LLM when your token generation is < 10 tk/sec? Submitted by /u/segmond

14
Hugging Face official-blog 8d ago

We got local models to triage the OpenClaw repo for FREE!*

Published June 22, 2026. By Onur Solmaz (osolmaz), ben burtenshaw (burtenshaw), shaun smith (evalstate), Pedro Cuenca (pcuenq), and Lysandre (lysandre). *Free as in beer, excluding the…

30
r/LocalLLaMA community 8d ago

Local LLM Inference Optimization: The Complete Guide

I compiled a year of local LLM experiments into a practical llama.cpp optimization guide, covering VRAM fitting, KV cache, MoE placement, MTP, CPU tuning, and common OOM traps. Pass this to an LLM of your choice and get on the local model train.…

4
r/LocalLLaMA community 8d ago

Local text to image model comparaison: The ultimate test.

I selected 192 prompts to evaluate text-to-image model various capabilities and generated images for all the local models I was able to make work on my GX10 Spark. For instance: Is the model good at text? At faces? At human anatomy? At respecting spatial composition, etc...? You…

4
r/LocalLLaMA community 8d ago

Best local model for vision - 2nd benchmark update - 21 Jun 2026

I previously posted the first results of my VLM benchmark. There were a few useful comments and observations I took into account, to revise and expand my benchmark: I initially did not take into account the Gemma 4 vision budget which defaults to 280, essentially making it…

9
r/LocalLLaMA community 8d ago

Watch local LLMs escape the rooms you design

Hello! I'd like to share my repo for WATCH MY ESCAPE: https://github.com/cjami/watch-my-escape It's an inverted escape room game where you design the maps and LLMs have to try to escape them. It uses traditional action verbs (e.g. push, pull, pick-up) to interact with the…

34
r/LocalLLaMA community 9d ago

What are people doing with their local models and what tools do you use them with?

I am trying to come up with some more uses for my DGX Sparks. Curious which tools work best for things like coding as well. What do you use instead of things like the claude.ai web interface? I have played with OpenWebUI but it just doesn't seem as capable without a lot of…

31
r/LocalLLaMA community 9d ago

It’s time to decentralize model distribution! Introducing Noema Atlas

TL;DR: Noema Atlas is a peer-to-peer network software using Iroh for local LLM weights, free and open source (Apache-2.0). Models come from whichever peers have them, with Hugging Face and mirrors as fallback (opt-in). Every file is identified by its content hash and a signed…

38
r/LocalLLaMA community 9d ago

You can now convert EXL3 quants on Apple Silicon Mac

Hi, I'm here with an update. But this time it's quite a bigger news on local llm. Normally accessing the high fidelity quant like EXL3 is CUDA gated, and imagine you need 96GB-128GB with RTX cards, they are very specialized and expensive. But now on a more general basis, MacOS…

38
r/LocalLLaMA community 9d ago

Best local LLM for English story summarization

Hello, which local LLM is currently the best at story summarization? The stories can be multiple pages long and are in English. Thanks! Submitted by /u/DesperateGame

24
r/LocalLLaMA community 10d ago

Improving local models with an API based "consultant"?

I'm sure that someone else has come up with this before, but i just wanted to ask: Has it occurred to anyone to improve their local AI workflow by adding a more powerful API based "consultant" agent (GLM 5.2 now springs to mind) to call upon for refining plans, learnings and…

35
r/LocalLLaMA community 10d ago

Is my CPU and RAM too weak/ lees for local LLMs? Both are going 100% for simple test prompts. GPU is not getting used fully. In theory quen3.5:9b should fit and run on RTX3050 8 GB comfortably.

I am very new to this local LLM world and only started exploring three days ago. Share any troubleshooting tips. Submitted by /u/mr_whoisGAMER

12
arXiv — Machine Learning research 11d ago

Low-Energy Reduced RISC-V Instruction Subset Processor for Tsetlin Machine Inference at the Edge

arXiv:2606.19964v1 Announce Type: new Abstract: Tsetlin Machine (TM) is a logic-based machine learning approach that relies on simple bitwise operations and finite-state automata, which makes it attractive for edge AI deployments. Recent work has focused on co-processor and…

23
r/LocalLLaMA community 11d ago

gave my local llm agent mcp tools for local image + video gen, so it just generates when i ask (fully offline+free)

free and open source, runs fully offline. the local llm agent does the image and video gen itself via mcp tools. details and github in the comments. Submitted by /u/GroundbreakingMall54

33
r/LocalLLaMA community 12d ago

Lemonade v10.8: auto memory management, cloud offload, Omni improvements, and call your local models as MCP tools

v10.8 is out, so here's a project update on what landed. This was a 20-contributor release in just 7 days! Smarter memory and context management Dynamic VRAM management now auto-unloads idle models and downsizes their KV-cache to reclaim GPU memory on the fly, plus model pinning…

27
r/LocalLLaMA community 12d ago

I released a local LLM-powered RPG where generated NPCs, locations, items, and quests persist as in-game objects

In this game, NPCs, locations, items, quests, and other elements are generated not as one-off text, but as persistent in-game objects. The LLM handles dialogue, narration, situational interpretation, quest progression, and similar parts of the experience. Meanwhile, the game…

19
