r/LocalLLaMA · May 17, 2026 · 1 min read

The power of structured workflows and small local models

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

A month ago, I experimented with a very basic home-rolled agent loop with a handful of tools and found it worked surprisingly well in spite of how crude it was:

https://www.reddit.com/r/LocalLLaMA/comments/1sl7f8e/homerolled_loop_agent_is_surprisingly_effective/

Later, I wrote about how addictive developing your own agent loop is, especially once you reach the point where the agent loop is capable of editing itself:

https://www.reddit.com/r/LocalLLaMA/comments/1sq7cie/warning_do_not_write_your_own_ai_agent_if_you/

Well, 28 days later, it has gotten out of hand. I've been working on it until 5am because it is so addictive.

Once you have a good agentic setup, you quickly realise that you, as the human, are the main bottleneck. You have a massive todo list, but the agent is sitting idle, waiting for your approvals and reviews.

On top of that, I am using Qwen3.5 9B, which has limited intelligence and context. I can't just dump hundreds of data files onto it and expect it to crunch them all at once. So I manage the context limits with a map-reduce pattern: tasks are broken down into smaller chunks that run in parallel, which extracts maximum FLOPs out of the GPU while staying within context limits.
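
For anyone who wants to try the pattern, here is a minimal sketch of a map-reduce step, not the actual code from this agent. The names `chunk`, `map_reduce` and `count_words` are hypothetical, and `count_words` is only a stand-in for whatever per-batch model call you would make against your local server.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunk(items: List[str], size: int) -> List[List[str]]:
    """Split items into consecutive batches of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def map_reduce(
    items: List[str],
    map_fn: Callable[[List[str]], T],
    reduce_fn: Callable[[List[T]], R],
    chunk_size: int = 4,
    workers: int = 4,
) -> R:
    """Run map_fn on each batch in parallel, then combine the partial results."""
    batches = chunk(items, chunk_size)
    # Threads are enough here because a real map_fn would mostly wait on the
    # model server; pool.map keeps results in the same order as the batches.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_fn, batches))
    return reduce_fn(partials)


def count_words(batch: List[str]) -> int:
    """Hypothetical stand-in for an LLM call that processes one small batch."""
    return sum(len(text.split()) for text in batch)


total = map_reduce(["one two", "three", "four five six"], count_words, sum, chunk_size=2)
```

The chunk size is the knob that keeps each request inside the context window, and the worker count controls how many requests hit the GPU at once.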

Enforcing structured outputs also helps to reduce LLM variability and makes for a smooth reduce step.
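
As a rough illustration of what enforcing structure can mean in practice, the sketch below validates each reply against a small schema and retries on failure. The field names in `REQUIRED_FIELDS` and the helpers `parse_structured` and `call_with_retries` are assumptions made up for this example; many local inference servers can also constrain decoding to a schema directly, which this sketch does not cover.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical schema: every map step must return these fields with these types.
REQUIRED_FIELDS = {"file": str, "summary": str, "issues": list}


def parse_structured(raw: str) -> Dict[str, Any]:
    """Parse a model reply as JSON and check it against REQUIRED_FIELDS."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        if not isinstance(data[name], expected):
            raise ValueError(f"field {name} should be {expected.__name__}")
    return data


def call_with_retries(generate: Callable[[], str], attempts: int = 3) -> Dict[str, Any]:
    """Call `generate` until it yields a valid reply, up to `attempts` times."""
    last_error = None
    for _ in range(attempts):
        try:
            return parse_structured(generate())
        except ValueError as err:
            last_error = err
    raise RuntimeError(f"no valid output after {attempts} attempts") from last_error
```

Because every partial result has the same shape, the reduce step can merge them with plain code instead of another model call.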

Lastly, it helps to have a database to monitor and track workflows. I managed to get it up and running today, and I'm happy that small local models can handle this task.
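
The post does not say which database or schema is used, so the following is only a sketch of one way to track workflow tasks, assuming SQLite and a single hypothetical `tasks` table. The function names `open_tracker`, `add_task`, `finish_task` and `progress` are invented for illustration.

```python
import sqlite3
from typing import Dict


def open_tracker(path: str = ":memory:") -> sqlite3.Connection:
    """Open the tracking database and create the tasks table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tasks (
               id INTEGER PRIMARY KEY,
               workflow TEXT NOT NULL,
               payload TEXT NOT NULL,
               status TEXT NOT NULL DEFAULT 'pending',
               result TEXT
           )"""
    )
    return conn


def add_task(conn: sqlite3.Connection, workflow: str, payload: str) -> int:
    """Queue one chunk of work and return its row id."""
    cur = conn.execute(
        "INSERT INTO tasks (workflow, payload) VALUES (?, ?)", (workflow, payload)
    )
    conn.commit()
    return cur.lastrowid


def finish_task(conn: sqlite3.Connection, task_id: int, result: str, ok: bool = True) -> None:
    """Record the outcome of a task as either 'done' or 'failed'."""
    conn.execute(
        "UPDATE tasks SET status = ?, result = ? WHERE id = ?",
        ("done" if ok else "failed", result, task_id),
    )
    conn.commit()


def progress(conn: sqlite3.Connection, workflow: str) -> Dict[str, int]:
    """Return a count of tasks per status for one workflow."""
    rows = conn.execute(
        "SELECT status, COUNT(*) FROM tasks WHERE workflow = ? GROUP BY status",
        (workflow,),
    )
    return dict(rows.fetchall())
```

With something like this in place, a failed chunk can be retried on its own, and a quick query shows how far a long-running workflow has progressed.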

My custom agent has now replaced Claude Code for 99% of tasks.

submitted by /u/DeltaSqueezer