Want Built a React-style looping agent with small LLMs (Qwen 3.5 9B / Gemma4) + LangGraph?
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Currently experimenting with building a React-style looping agent system using small LLMs like Qwen 3.5 9B and Gemma 4 (E2B), and I wanted to ask if anyone here has worked on something similar.
Current setup:
- Using LangGraph
- Around 5 tools available to the agent
- Input includes both instructions and images
- Agent runs in a loop where one tool’s output may become another tool’s input
- Planning to later extend this into a multi-agent system with 2 subagents
Right now I’m only testing a single-agent workflow before moving to multi-agent orchestration.
The main issue I’m facing:
- Qwen 9B starts generating huge amounts of thinking/reasoning tokens during loops
- Sometimes the output never properly returns or gets truncated
- Recursive/react loops become unstable after a few iterations
I’m trying to understand:
- How people usually control tool-calling loops with smaller models
- Whether I should limit reasoning depth / iterations
- Better patterns for tool dependency handling in LangGraph
- Whether planner/executor separation is necessary even for small systems
- If there are known strategies to reduce unnecessary “thinking token” generation in Qwen
Would really appreciate:
- Architecture suggestions
- Open-source repos/examples
- Best practices for LangGraph recursive agents
- Tips for making small models stable in tool loops
[link] [comments]
Discussion (0)
Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.
Sign in →No comments yet. Sign in and be the first to say something.