r/LocalLLaMA · May 25, 2026 · 1 min read

Has anyone built a ReAct-style looping agent with small LLMs (Qwen 3.5 9B / Gemma 4) + LangGraph?

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

I'm currently experimenting with building a ReAct-style looping agent system using small LLMs such as Qwen 3.5 9B and Gemma 4 (E2B), and I wanted to ask whether anyone here has worked on something similar.

Current setup:

Using LangGraph
Around 5 tools available to the agent
Input includes both instructions and images
Agent runs in a loop where one tool’s output may become another tool’s input
Planning to later extend this into a multi-agent system with 2 subagents

Right now I’m only testing a single-agent workflow before moving to multi-agent orchestration.
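To make the setup concrete, here is a minimal sketch of that kind of loop in plain Python, without LangGraph. Everything named here is hypothetical: the two tools, the `scripted_model` stand-in for the LLM, and the `max_steps` cap are illustrations, not the actual code. It only shows one tool's output feeding the next tool and a hard limit on iterations.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tools; the real agent has around 5 and also takes images.
def crop_image(arg: str) -> str:
    return f"cropped({arg})"

def describe_image(arg: str) -> str:
    return f"description of {arg}"

TOOLS: Dict[str, Callable[[str], str]] = {
    "crop_image": crop_image,
    "describe_image": describe_image,
}

def scripted_model(history: List[str]) -> Tuple[str, str]:
    """Stand-in for the LLM: returns (action, argument).

    A real model would pick the action from the prompt and history.
    """
    if len(history) == 1:
        return "crop_image", history[0]
    if len(history) == 2:
        # The previous tool's output becomes this tool's input.
        return "describe_image", history[-1]
    return "finish", history[-1]

def run_agent(
    task: str,
    model: Callable[[List[str]], Tuple[str, str]],
    max_steps: int = 6,
) -> str:
    """Run the act/observe loop until the model finishes or the cap is hit."""
    history = [task]
    for _ in range(max_steps):
        action, arg = model(history)
        if action == "finish":
            return arg
        if action not in TOOLS:
            # Feed the mistake back instead of crashing the loop.
            history.append(f"error: unknown tool {action}")
            continue
        history.append(TOOLS[action](arg))
    raise RuntimeError(f"no final answer after {max_steps} steps")
```

Calling `run_agent("photo.png", scripted_model)` walks through both tools and returns the final description. LangGraph has its own `recursion_limit` setting in the run config that serves a similar purpose to `max_steps` here.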

The main issue I’m facing:

Qwen 3.5 9B starts generating huge amounts of thinking/reasoning tokens during loops
Sometimes the output never returns properly or gets truncated
Recursive/ReAct loops become unstable after a few iterations
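One way to at least detect these failures inside the loop is sketched below. It assumes the reasoning arrives inline, wrapped in `<think>...</think>` tags; that is an assumption about the serving stack, and some servers return reasoning in a separate field instead. The `split_reply` helper is hypothetical. It drops completed thinking blocks and flags a reply whose thinking block never closed, which usually means the token limit was hit mid-reasoning.

```python
import re
from typing import Tuple

# Assumed format: reasoning is wrapped in <think>...</think> in the raw text.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)

def split_reply(raw: str) -> Tuple[str, bool]:
    """Return (visible_text, truncated) for one raw model reply."""
    # More opening tags than closing tags means a thinking block was cut off.
    truncated = raw.count("<think>") > raw.count("</think>")
    # Remove every completed thinking block.
    visible = THINK_BLOCK.sub("", raw)
    if truncated:
        # Discard the unfinished reasoning that follows the dangling tag.
        visible = visible.split("<think>", 1)[0]
    return visible.strip(), truncated
```

When `truncated` is true, the loop can retry with a shorter prompt or stop, rather than passing half-finished reasoning on to the next tool.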

I’m trying to understand:

How people usually control tool-calling loops with smaller models
Whether I should limit reasoning depth / iterations
Better patterns for tool dependency handling in LangGraph
Whether planner/executor separation is necessary even for small systems
Whether there are known strategies to reduce unnecessary “thinking token” generation in Qwen

Would really appreciate:

Architecture suggestions
Open-source repos/examples
Best practices for LangGraph recursive agents
Tips for making small models stable in tool loops

submitted by /u/siri_1110