r/LocalLLaMA · · 3 min read

Need Help Choosing a Harness for Qwen 3.6 27B

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

I've burned a week trying to customize my agent manually - building my own front end - but I've gotten to the point where I'm just exhausted and willing to try a harness, but need the right one. I read posts all the time, but I have a specific use case, so I'm reaching out to the best of the best for suggestions.

Here is my stack:

  • Windows 10 | i7 12700K | RTX 3090 TI | 96GB RAM
  • Models: Qwen 3.5|3.6 27B UD K XL (Q4/Q5) - Also will be using 0.8B/4B in CPU parallel
  • Server: LM Studio
  • Apps: (in Docker) N8N, Redis (w/redisstack,redisinsight), Postgres (w/pgadmin,pgvector), Dify (installed, never used), browserless (never used)

Where I am right now:

I'm using LM Studio because it just works. I tried llama.cpp w/openwebui and rage quit, was just slower and not same features I'm used to. Cass - my agent - works fine at Q5, but fills up context fast because o/mcp. (I know, I know) To help out, I switch to Q4 @ Q4 KV to get up to 200K and it works surprisingly well, but I figured if I spawn sub-agents I can pass that mcp context to them and just respawn for new tasks.

I had Cass write an agent spawner and it works fine. The trick works - the mcp context hits the subs and I can chat w/Cass longer - but I can't see what the sub-agent is doing/thinking/etc. I had cass build a dashboard for sub-agents that sorta worked, but there were just...issues. Cass couldn't see the agent's stream until it was finished and sometimes thought it timed out when the sub was still working. I searched and figured I'd have the sub stream its output to cass, but to properly see all this, I figured I'd need a custom front end.

Additionally, I want to run a process in parallel via cpu - a meta analysis agent - and I need a way to monitor its outputs as well. So, we're talking at minimum 2 agent outputs (main, meta) and then a third during spawn.

I watched some vidz last night about pi agent. I'm not sure this is what I need - I want to use mcp tools. But I'm good using other tools as long as I can still read/write to redis and postgres.

Also, I want to add a small agent that intercepts incoming chats and injects memories/context/etc (I'll set this manually) prior to the main agent getting the message. A sort of prefill context packet.

What I need is a harness that enables the following:

  • Super simple gui (heck, even a terminal look like pi agent is fine I guess). I need to see current ctx size, max ctx size, and all tools. Needs to work w/images too.
  • Allows me to spawn sub-agents easily, set their individual system prompts, and choose their mcp tools.
  • Allows me a dashboard or monitor where I can view ALL of their outputs - thinking, tool use, etc.
  • A simple way to wire smaller agents' output to the main agent for "prefill". I read about redis agent memory server, but I want something that allows me to set up what type of data the smaller model transfers downstream.

What's the simplest open source harness that will allow this? I'm not interested in any cloud models, only local and what can fit in my gpu. I'm happy w/my current agent, but I need some minor automation and management tools that I really don't have time to build myself.

Thanks in advance for any suggestions.

submitted by /u/GrungeWerX
[link] [comments]

Discussion (0)

Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.

Sign in →

No comments yet. Sign in and be the first to say something.

More from r/LocalLLaMA