r/LocalLLaMA · May 27, 2026

How Qwen3.6-35B-A3B fails differently as a sub agent compared to solo

Been running Qwen3.6-35B-A3B as a sub agent on a single 4090 for a few weeks. The failure modes are different from solo use and I haven't seen this written up anywhere.

Solo use, you notice drift fast. The model produces something confused, you see it, you can fix it. When it's a sub agent receiving tasks from an orchestrator, the orchestrator treats a confused or partial response the same as a legitimate one unless you've explicitly built a validation layer. Most of us don't. The confident format passes through and the bad output goes downstream.

The specific pattern I keep hitting: the model processes the task in thinking mode, produces something that looks structurally correct, and the orchestrator accepts it. Wrong content, right format, no flag.
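For anyone who wants something concrete to react to, here is a minimal sketch of what a validation layer between sub agent and orchestrator could look like. Everything in it is an assumption made for illustration: the reply shape (`status`, `answer`, `evidence`), the `task["context"]` field, and the name `validate_subagent_reply` are hypothetical and not from any particular harness. The point is only that the check goes past format and asks whether the content is grounded in what the task actually supplied.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Verdict:
    ok: bool
    reasons: list[str] = field(default_factory=list)


def validate_subagent_reply(raw: str, task: dict) -> Verdict:
    """Check a sub agent reply for format AND basic grounding.

    Hypothetical contract: the reply is a JSON object with "status",
    "answer" and "evidence" (a list of verbatim quotes from task["context"]).
    """
    try:
        reply = json.loads(raw)
    except json.JSONDecodeError as exc:
        return Verdict(False, [f"not valid JSON: {exc}"])
    if not isinstance(reply, dict):
        return Verdict(False, ["top-level value is not an object"])

    # Format checks: the part a well-formed but wrong reply still passes.
    missing = [k for k in ("status", "answer", "evidence") if k not in reply]
    if missing:
        return Verdict(False, [f"missing field: {k}" for k in missing])

    reasons = []
    # Content checks: a partial or failed reply must never look like success.
    if reply["status"] != "done":
        reasons.append(f"status is {reply['status']!r}, not 'done'")
    answer = reply["answer"]
    if not isinstance(answer, str) or not answer.strip():
        reasons.append("answer is empty or not a string")

    # Grounding check: every evidence quote must appear in the task context.
    evidence = reply["evidence"]
    if not isinstance(evidence, list) or not evidence:
        reasons.append("no evidence quotes supplied")
    else:
        for quote in evidence:
            if not isinstance(quote, str) or not quote.strip():
                reasons.append("evidence entry is empty or not a string")
            elif quote not in task["context"]:
                reasons.append(f"evidence not found in task context: {quote!r}")

    return Verdict(not reasons, reasons)
```

The exact-substring grounding check is crude and only suits tasks where the answer should be traceable to supplied text. For code or numeric tasks you would swap in whatever is cheap to verify there, such as running the tests or recomputing the number. On a failed verdict the orchestrator can retry or escalate instead of passing the reply downstream.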

MoE architecture makes this harder to predict than with a dense model. Sparsity means certain task types hit cold experts, and performance drops significantly without any signal that it happened. On a single consumer GPU, the variance between task types is real.
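One way to manufacture the missing signal is to count validation outcomes per task type and flag the types that fail more often than the rest. This is a rough sketch under assumptions: the `TaskTypeStats` class, the task-type labels, and the thresholds are all made up for illustration. It also observes nothing about expert routing itself. It only measures which kinds of task fail your own checks more often.

```python
from collections import defaultdict


class TaskTypeStats:
    """Track validation pass/fail counts per task type (hypothetical helper)."""

    def __init__(self, min_samples: int = 20, max_failure_rate: float = 0.2):
        self.min_samples = min_samples
        self.max_failure_rate = max_failure_rate
        # task_type -> [failures, total]
        self._counts = defaultdict(lambda: [0, 0])

    def record(self, task_type: str, passed: bool) -> None:
        counts = self._counts[task_type]
        counts[1] += 1
        if not passed:
            counts[0] += 1

    def failure_rate(self, task_type: str) -> float:
        failures, total = self._counts.get(task_type, (0, 0))
        return failures / total if total else 0.0

    def flagged(self) -> list[str]:
        """Task types with enough samples and a failure rate above the limit."""
        return sorted(
            task_type
            for task_type, (failures, total) in self._counts.items()
            if total >= self.min_samples
            and failures / total > self.max_failure_rate
        )
```

Calling `record()` after every validated reply and checking `flagged()` periodically gives the orchestrator a reason to route those task types elsewhere or tighten their checks. The thresholds are placeholders and need tuning against your own workload.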

What's your harness setup for catching sub agent output degradation at this scale? Not the orchestrator choice, the validation layer specifically.

submitted by /u/Substantial_Step_351