Tag

Edge

197 articles archived under #edge · RSS

TechCrunch — AI news-outlet 1mo ago

Stability AI releases a new audio model that can create six-minute songs

Stability Audio 3.0 small model can run on-device and generate two-minute-long tracks

21
r/LocalLLaMA community 1mo ago

How accurate can “whichllm” be?

Hello people, I think the question is clear, but I wanted to add some context: I work on internal tools at my job, and some of the tools are for us developers (most tools are for marketing and factory production). I am currently working on a small CLI tool that uses a local model…

12
r/LocalLLaMA community 1mo ago

what non-coding tasks have you gotten a local model to do autonomously?

coding agents are everywhere right now but i'm more interested in models that actually take actions autonomously. we built a small vlm for desktop gui automation. i mostly use it for moving data between apps that don't have apis, saves me a lot of copy pasting. still kinda janky…

11
r/LocalLLaMA community 1mo ago

Audio upscaling, cleanup, or improvement models?

I never see this type of model talked about. Are there many open models in this category? I do a lot of audio cleanup and end up using Auphonic, but I would like to use a local model. Edit: e.g., voice recovery, reverb removal, auto-EQ type stuff.

5
arXiv — Machine Learning research 1mo ago

R2V Agent: Teaching SLMs When to Ask for Help

arXiv:2605.16604v1 Announce Type: new Abstract: Efficient agentic systems should incur expensive frontier-model costs only on decisions where a cheaper local model is likely to fail. Existing LLM cascades usually route whole queries before execution, but task difficulty shifts…

18
arXiv — NLP / Computation & Language research 1mo ago

Language Acquisition Device in Large Language Models

arXiv:2605.16758v1 Announce Type: new Abstract: Large Language Models (LLMs) remain substantially less data-efficient than humans. Pre-pretraining (PPT) on synthetic languages has been proposed to close this gap, with prior work emphasizing highly expressive formal languages…

32
arXiv — NLP / Computation & Language research 1mo ago

From Volume to Value: Preference-Aligned Memory Construction for On-Device RAG

arXiv:2605.18271v1 Announce Type: new Abstract: With the rapid emergence of personal AI agents based on Large Language Models (LLMs), implementing them on-device has become essential for privacy and responsiveness. To handle the inherently personal and context-dependent nature…

16
r/LocalLLaMA community 1mo ago

What’s your current local LLM setup in 2026?

Hey all, I’ve been trying to get a better sense of what people are actually running locally these days. Curious about your setup: GPU (or CPU if you’re brave), RAM/VRAM, the models you use most, and your main use case (coding, chat, agents, etc.). Also, what’s the biggest bottleneck…

24
r/LocalLLaMA community 1mo ago

club-5060ti follow-up: cleaner RTX 5060 Ti local LLM recipes, benchmark explorer, and CUDA GPU compatibility notes

I posted earlier about RTX 5060 Ti local LLM testing, and I have cleaned the repo up quite a bit since then. The project is now a more structured benchmark/recipe repo rather than scattered notes. It has a static results explorer, schema-validated benchmark JSON, clearer…

34
Zed Editor dev-tools 1mo ago

Why and How to Run Local Models in Zed

You can run local AI models in Zed to get better performance and control over your data. Here's how.

33
r/LocalLLaMA community 1mo ago

favorite Agentic Coding Harness

So far, I’ve tried Codex CLI, Claude Code, Gemini CLI, OpenCode, and recently Pi with local models. Pi is the leanest of them all, with just four tools: read, write, edit, and bash. Its system prompt is under 2K tokens, which makes it perfect for local models. I've been trying…

29
The Information — AI news-outlet 1mo ago

Edge Inference Chip Startup SiMa.ai Raising at $1.4 Billion Valuation

Nvidia might be on a tear, but some investors are still convinced that there’s demand for another kind of specialized chip. And they’re putting their money where their mouth is. For example: San Jose, Calif.-based SiMa.ai, which develops chips that work on devices such as…

14
r/LocalLLaMA community 1mo ago

What happens to local LLM if/when LLMs are no longer released for free?

I’m thinking about where this might wind up in 3-5+ years. As others have noted there’s no guarantee that Qwen, Google, and others will continue to release models in the future. Suppose the supply of new LLM models dries up overnight. Whatever is available today, May 2026, is…

6
r/LocalLLaMA community 1mo ago

Is anyone prioritizing code quality checks via a small local model?

Sorry if the title is confusing. What I'm trying to say is that coding agents can write a lot of code very quickly, and it can kinda get messy over time if it isn't checked frequently. Shouldn't there be a tiny local model with a TESTING(dot)md or a QUALITY(dot)md which describes…

14
r/LocalLLaMA community 1mo ago

I built a coding agent that gets 87% on benchmarks with a 4B parameter model, here's how

I was frustrated that every coding agent (OpenCode, Cursor, Claude Code) assumes you're running GPT-5.4 or Claude Opus. If you try them with a local model like Gemma or Qwen they fall apart. I find that often tool calls fail, context overflows, multi-step tasks collapse. So I…

12
r/LocalLLaMA community 1mo ago

The power of structured workflows and small local models

A month ago, I experimented with a very basic home-rolled agent loop with a handful of tools and found it worked surprisingly well in spite of how crude it was: https://www.reddit.com/r/LocalLLaMA/comments/1sl7f8e/homerolled_loop_agent_is_surprisingly_effective/ Later, I wrote…

15
r/LocalLLaMA community 1mo ago

Made a simple template manager and GUI for llama.cpp so I don't have to keep memorizing CLI flags.

Introducing Hexllama. Hey, I’ve always found llama-server to be more than enough for testing out local models, mostly because it guarantees you always have the absolute latest llama.cpp features and architecture support. But keeping track of different CLI commands, context sizes,…

19
r/LocalLLaMA community 1mo ago

Using Local LLMs for research

Hey there. I am an undergrad who has been doing mostly SWE, but will be doing ML research under my professor over the summer. So I am new to research - I ask not to be judged too harshly. Generally, we will be working on Physics-Informed Neural Networks. I have seen some…

9
r/LocalLLaMA community 1mo ago

LLM Phone Home: Reliable Apps that can deliver inference from local backend

Hello all, I’m wondering what suggestions there are for an iOS app that can serve an OpenAI-compatible endpoint. I am using 3sparks, which works GREAT for that specific use, BUT there is no MCP, no web search, etc. I want to show people that a local model with web search on your…

25
r/LocalLLaMA community 1mo ago

What are the best abliterated or uncensored local models that allow financial advice-related questions?

Not trying to get rich quick or anything, but I’m just tired of models refusing to answer questions related to their opinions on money matters or having them be wishy-washy about financial decision making advice. Seems like this can be a blocker with both frontier closed source…

32
r/LocalLLaMA community 1mo ago

I built a self-hosted open-source MCP server that gives any local LLM real financial data — SEC filings, 13F, insider & congressional trades, short data, FRED

One thing missing when running local models as agents: real, current data. So I built Equibles — a self-hosted MCP server that scrapes and serves public U.S. financial data and exposes it as MCP tools, so any MCP-capable client (Claude Code/Desktop, Cursor, or your own…

30
r/LocalLLaMA community 1mo ago

how would you set up a local llm server for a business of 7 people?

Okay so i've been stalking this sub for some time and i run the occasional small 2-8b model on my laptop (not the best) for fun but say my role at a company is to set up a local LLM since we obviously don't want confidential data going to other companies etc / main use case…

16
r/LocalLLaMA community 1mo ago

Are the rich-RAM/poor-GPU people wrong here?

Hello guys, I know everyone has their own definition of local models, but I see two "reasonable" types of frontier local models: a dense one that barely fits in 24GB or 32GB of GPU memory for the most "reasonable" GPU-wealthy people, and an MoE around 100B params, the 100ish billion…

21
r/LocalLLaMA community 1mo ago

Gemma 4 + LiteRT-LM on mobile: much better memory/perf than my llama.cpp setup

Hi r/LocalLLaMA - I've been paying close attention to the edge AI ecosystem because it's an area where I see huge potential and where I truly believe AI will become more useful for day-to-day tasks. Around the Gemma 4 release I was already experimenting with local AI but the…

18
Hacker News — AI on Front Page community 1mo ago

Show HN: Find the best local LLM for your hardware, ranked by benchmarks

Article URL: https://github.com/Andyyyy64/whichllm Comments URL: https://news.ycombinator.com/item?id=48146369 Points: 224 Comments: 38

21
r/LocalLLaMA community 1mo ago

What is the most unexpected thing you have gotten a local model to do?

Most local LLM use cases I see are chat, coding, and RAG. But with vision models getting better and faster on consumer hardware, I feel like there is a lot of untapped territory. I got a local VLM to play a board game by just looking at the screen and it worked way better than I…

25
r/LocalLLaMA community 1mo ago

Used over a million tokens in three separate sessions to test Qwen 3.6 35b (new Multi-token Prediction version)

In my opinion, MTP models are a 100% game changer for local LLMs. In terms of speed, I was getting around 1.5x the tok/sec of previous tests. The project was a test: building a full iterative step-by-step pygame, a small mystery dungeon-style game. At first I set 100-200k context…

28
arXiv — Machine Learning research 1mo ago

Turning Stale Gradients into Stable Gradients: Coherent Coordinate Descent with Implicit Landscape Smoothing for Lightweight Zeroth-Order Optimization

arXiv:2605.14373v1 Announce Type: new Abstract: Zeroth-Order (ZO) optimization is pivotal for scenarios where backpropagation is unavailable, such as memory-constrained on-device learning and black-box optimization. However, existing methods face a stark trade-off: they are…

7
r/LocalLLaMA community 1mo ago

club-5060ti: practical RTX 5060 Ti local LLM notes and configs

I put together a small public repo for RTX 5060 Ti 16GB local LLM setups: I took inspiration from the club-3090 repo, but this one is focused on documenting what we’ve actually tested on 5060 Ti hardware so the setup details are easier to share and reproduce. Current seed setup…

6
r/LocalLLaMA community 1mo ago

A VERY lightweight open web-search tool for smaller local LLMs

Hey everyone, Been playing around with local agent setups lately, mostly Cline/Roo with smaller models, and web search kept annoying me. Not because it doesn’t work, but because it usually throws way too much random page text into the context. small models really don’t handle…

29
r/LocalLLaMA community 1mo ago

Got local Qwen 3.5/3.6 generating meeting summaries entirely offline on an M4 Max. Demo with Wi-Fi off. This is the future.

I'm the founder behind Hedy, an AI meeting app. I'm a huge supporter of Local AI, and we've been working on making it "consumer friendly". Speech recognition in Hedy has always run on-device (whisper.cpp and now also parakeet). What just shipped is that the rest of the AI…

22
r/LocalLLaMA community 1mo ago

Anyone actually using a local LLM as their daily knowledge base? Not for coding, for life stuff. What's your setup?

So I've been going down a rabbit hole lately and I can't find many people actually talking about this specific use case. everyone here runs local LLMs for coding, chat, maybe some creative writing. cool. But what about using it as a proper personal knowledge base? like, dump…

24
r/LocalLLaMA community 1mo ago

The "the future is fictional" problem of many local LLMs

Many local models have a problem (which arose from excessive RLHF training): they mostly think that everything beyond their knowledge cutoff date is "fictional" or "satirical". To be fair, even the Gemini API without web access can have this sometimes. But it…

20
r/MachineLearning community 1mo ago

Your AI Use Is Breaking My Brain: Why 10 Minutes of Prompting Fries Us [D]

It’s 2:30 AM. My youngest just woke up crying for water, completely derailing my train of thought while I was trying to debug a weird edge case in a side project. I stared at my IDE, then at my local model running in the terminal, then back at the IDE. My brain felt like…

26
r/LocalLLaMA community 1mo ago

Small local model for questions on German grammar

I'm trying to learn German. I use Qwen3.5/3.6 locally, but it is pretty bad at German grammar. Does anyone have a recommendation for a small-ish local model that knows German grammar well and can answer questions on it? EDIT: I give an example output from unquantized Qwen3.5…

38
arXiv — Machine Learning research 1mo ago

A Comparative Study of Federated Learning Aggregation Strategies under Homogeneous and Heterogeneous Data Distributions

arXiv:2605.11010v1 Announce Type: new Abstract: Federated Learning has emerged as a transformative paradigm for collaborative machine learning across distributed environments. However, its performance is strongly influenced by the aggregation strategy used to combine local model…

17
r/LocalLLaMA community 1mo ago

I've seen a lot of folks ask "can local LLMs actually do anything useful?"

And I'm here to share my experience. The answer is resoundingly 'yes'. Let me start with the local model I use every day in my AI harness: embedding models. I'm using an embedding model to give my AI's persistent memory system a semantic search protocol that makes its memory…

37
r/LocalLLaMA community 1mo ago

Local LLM autocomplete + agentic coding on a single 16GB GPU + 64GB RAM

Today I set up a full coding toolbox on a single RTX 5080 (with RAM offloading) that's actually viable. Autocomplete: bartowski/Qwen2.5-Coder-7B-Instruct-GGUF:Q6_K_L. Agentic: unsloth/Qwen3.6-35B-A3B-GGUF:UD-Q8_K_XL. Why these models: Qwen2.5 is still the best model for infill…

9
Smol AI News news-outlet 2mo ago

not much happened today

Gemma 4 was launched by Google under an Apache 2.0 license, marking a significant open-model release focused on reasoning, agentic workflows, multimodality, and on-device use. It outperforms models 10x larger and has immediate ecosystem support including…

35
NVIDIA Developer Blog official-blog 2mo ago

Bringing AI Closer to the Edge and On-Device with Gemma 4

The Gemmaverse expands with the launch of the latest Gemma 4 multimodal and multilingual models, designed to scale across the full spectrum of deployments, from...

27
Hugging Face official-blog 2mo ago

Welcome Gemma 4: Frontier multimodal intelligence on device

Published April 2, 2026 by merve, Pedro Cuenca, Sergio Paniego, ben burtenshaw, Steven Zheng, Alvaro Bartolome, Nathan…

9
NVIDIA Developer Blog official-blog 3mo ago

NVIDIA IGX Thor Powers Industrial, Medical, and Robotics Edge AI Applications

Industrial and medical systems are rapidly increasing the use of high-performance AI to improve worker productivity, human-machine interaction, and downtime...

13
NVIDIA Developer Blog official-blog 3mo ago

CUDA 13.2 Introduces Enhanced CUDA Tile Support and New Python Features

CUDA 13.2 arrives with a major update: NVIDIA CUDA Tile is now supported on devices of compute capability 8.X architectures (NVIDIA Ampere and NVIDIA Ada), as...

18
Import AI news-outlet 3mo ago

Import AI 448: AI R&D; Bytedance's CUDA-writing agent; on-device satellite AI

If Ukraine is the first major drone war, when will there be the first major AI war?

6
NVIDIA Developer Blog official-blog 3mo ago

How to Minimize Game Runtime Inference Costs with Coding Agents

NVIDIA ACE is a suite of technologies for building AI agents for gaming. ACE provides ready-to-integrate cloud and on-device AI models for every part of in-game...

23
Google DeepMind official-blog 12mo ago

Gemini Robotics On-Device brings AI to local robotic devices

We’re introducing an efficient, on-device robotics model with general-purpose dexterity and fast task adaptation.

33
Google DeepMind official-blog 13mo ago

Announcing Gemma 3n preview: Powerful, efficient, mobile-first AI

Gemma 3n is a cutting-edge open model designed for fast, multimodal AI on devices, featuring optimized performance, unique flexibility with a 2-in-1 model, and expanded multimodal understanding with audio, empowering developers to build live, interactive applications and…

23