r/LocalLLaMA · 1 min read

Has anyone built a ReAct-style looping agent with small LLMs (Qwen 3.5 9B / Gemma 4) + LangGraph?

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

I’m currently experimenting with building a ReAct-style looping agent system using small LLMs like Qwen 3.5 9B and Gemma 4 (E2B), and I wanted to ask if anyone here has worked on something similar.

Current setup:

  • Using LangGraph
  • Around 5 tools available to the agent
  • Input includes both instructions and images
  • Agent runs in a loop where one tool’s output may become another tool’s input
  • Planning to later extend this into a multi-agent system with 2 subagents

Right now I’m only testing a single-agent workflow before moving to multi-agent orchestration.
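
For readers who want something concrete, the single-agent loop described above might look roughly like the framework-free sketch below. It is not the actual project code and does not use the LangGraph API: the tools `crop` and `ocr`, the `scripted_model` stand-in for the LLM, and the `max_steps` cap are all hypothetical, chosen only to show one tool's output becoming the next tool's input.

```python
from typing import Callable

# Hypothetical tools; each takes a string and returns a string.
def crop(x: str) -> str:
    return f"cropped({x})"

def ocr(x: str) -> str:
    return f"text_from({x})"

TOOLS: dict[str, Callable[[str], str]] = {"crop": crop, "ocr": ocr}

def scripted_model(history: list[str]) -> tuple[str, str]:
    """Stand-in for the LLM: returns (action, argument).

    A real model would choose the action from the history; this
    scripted version always crops first, then runs OCR, then finishes.
    """
    if len(history) == 1:
        return "crop", history[0]
    if len(history) == 2:
        return "ocr", history[-1]
    return "final", history[-1]

def run_agent(task: str, max_steps: int = 5) -> str:
    """Loop: ask the model for an action, run the tool, feed the result back."""
    history = [task]
    for _ in range(max_steps):
        action, arg = scripted_model(history)
        if action == "final":
            return arg
        # The tool output is appended so the next step can use it as input.
        history.append(TOOLS[action](arg))
    # Hard stop so an unstable loop cannot run forever.
    return "stopped: step limit reached"

print(run_agent("image.png"))
```

The explicit `max_steps` bound is the part that matters for stability. LangGraph offers a comparable bound through the `recursion_limit` key of the config passed when invoking a graph.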

The main issue I’m facing:

  • Qwen 3.5 9B starts generating huge amounts of thinking/reasoning tokens during loops
  • Sometimes the output never properly returns or gets truncated
  • Recursive/ReAct loops become unstable after a few iterations

I’m trying to understand:

  • How people usually control tool-calling loops with smaller models
  • Whether I should limit reasoning depth / iterations
  • Better patterns for tool dependency handling in LangGraph
  • Whether planner/executor separation is necessary even for small systems
  • If there are known strategies to reduce unnecessary “thinking token” generation in Qwen
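
One commonly suggested mitigation for the thinking-token growth, sketched here purely as an illustration, is to drop reasoning text from each model turn before it is appended to the loop history, so earlier reasoning does not pile up in the context. The sketch assumes the model wraps its reasoning in `<think>...</think>` tags, as Qwen3-family chat templates typically do; the helper name `strip_thinking` is hypothetical. It also handles a truncated turn where the closing tag never arrives.

```python
import re

# Assumption: reasoning is delimited by <think> ... </think> tags.
_CLOSED = re.compile(r"<think>.*?</think>", re.DOTALL)
# A truncated generation may open a <think> block and never close it.
_UNCLOSED = re.compile(r"<think>.*\Z", re.DOTALL)

def strip_thinking(text: str) -> str:
    """Remove reasoning blocks so only the visible answer is kept in history."""
    text = _CLOSED.sub("", text)
    text = _UNCLOSED.sub("", text)
    return text.strip()

print(strip_thinking("<think>step 1\nstep 2</think>\ncall ocr"))
```

This only trims what is fed back into later iterations; it does not stop the model from generating the reasoning in the first place, which would need a generation-side setting or a token limit.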

Would really appreciate:

  • Architecture suggestions
  • Open-source repos/examples
  • Best practices for LangGraph recursive agents
  • Tips for making small models stable in tool loops
submitted by /u/siri_1110