r/MachineLearning

CANTANTE: Optimizing Agentic Systems via Contrastive Credit Attribution [R]

Mirrored from r/MachineLearning for archival readability. Support the source by reading on the original site.

LLM-based multi-agent systems have demonstrated strong performance across complex real-world tasks, such as software engineering, predictive modeling, and retrieval-augmented generation. Yet, automating their configuration remains a structural challenge. Researchers are often forced into manual, trial-and-error prompt tuning, where a change to a single agent shifts the global output in ways that are difficult to trace.

The core bottleneck is credit assignment: while the parameters governing agent behavior are local, performance scores are only available at the global system level. This makes optimization fundamentally difficult: from the global score alone, we cannot tell which agents contributed positively or negatively to the outcome.

CANTANTE is an attempt to take a different path: treating agent prompts as parameters learned from task rewards rather than tuned by hand. By solving the credit assignment problem, we can move from brittle, hand-crafted agent demos to trustworthy systems that are actually autonomous and useful in practice.

CANTANTE's algorithm in short (see second image):

  1. Let local optimizers suggest configurations (e.g., prompts).
  2. Evaluate different configurations on the same queries, capturing reasoning traces and system scores.
  3. Let an attributer compare these rollouts and assign each agent a credit, thereby decomposing the global reward into per-agent update signals.
  4. Feed those credits to any local optimizer; for the experiments, we use CAPO, the prompt optimizer from our prior work published at AutoML 2025.
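The four steps above can be sketched as a toy Python loop. This is a hedged illustration under stated assumptions, not the CANTANTE implementation: every name below (`attribute_credits`, `local_update`, `toy_evaluate`) is hypothetical and not taken from the cantante repo. The attributer is swapped for a simple counterfactual ablation that changes one agent's prompt at a time and compares paired global scores on the same queries, and a "keep the best positively credited candidate" rule stands in for CAPO.

```python
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical types: a configuration maps each agent's name to its prompt,
# and an evaluator returns the global system score for one query.
Config = Dict[str, str]
Evaluator = Callable[[Config, str], float]


def attribute_credits(
    base: Config,
    candidates: Dict[str, List[str]],
    queries: List[str],
    evaluate: Evaluator,
) -> Dict[str, Dict[str, float]]:
    """Credit each candidate prompt with its mean paired score gain over `base`.

    Stand-in attributer (an assumption, not CANTANTE's): change one agent's
    prompt at a time and compare global scores on the same (non-empty) queries.
    """
    base_scores = [evaluate(base, q) for q in queries]
    credits: Dict[str, Dict[str, float]] = {}
    for agent, prompts in candidates.items():
        credits[agent] = {}
        for prompt in prompts:
            variant = {**base, agent: prompt}  # only this agent differs from base
            gains = [evaluate(variant, q) - b for q, b in zip(queries, base_scores)]
            credits[agent][prompt] = mean(gains)
    return credits


def local_update(base: Config, credits: Dict[str, Dict[str, float]]) -> Config:
    """Stand-in local optimizer: per agent, keep the best positively credited prompt."""
    updated = dict(base)
    for agent, scored in credits.items():
        if not scored:
            continue
        best_prompt, best_credit = max(scored.items(), key=lambda kv: kv[1])
        if best_credit > 0:
            updated[agent] = best_prompt
    return updated


def toy_evaluate(config: Config, query: str) -> float:
    """Hypothetical global metric that secretly rewards one feature per agent."""
    score = 0.0
    if "step by step" in config["planner"]:
        score += 0.5
    if "write tests" in config["coder"]:
        score += 0.5
    return score


if __name__ == "__main__":
    base = {"planner": "Plan the task.", "coder": "Write code."}
    candidates = {
        "planner": ["Plan the task step by step.", "Plan briefly."],
        "coder": ["Write code and write tests.", "Write short code."],
    }
    credits = attribute_credits(base, candidates, ["q1", "q2", "q3"], toy_evaluate)
    print(credits)
    print(local_update(base, credits))
```

In this toy run, only the global score is observable, yet the paired comparisons recover that the "step by step" planner prompt and the "write tests" coder prompt each add 0.5. One-agent-at-a-time ablation is easy to read but costs one evaluation pass per candidate and ignores interactions between agents; the post's step 3 instead has an attributer compare full rollouts, including reasoning traces.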

Evaluated against the DSPy optimizers GEPA and MIPROv2 on MBPP (programming), GSM8K (mathematical reasoning), and HotpotQA (retrieval), CANTANTE:

• achieves the best average rank,

• beats the strongest baseline by +18.9 points on MBPP and +12.5 on GSM8K, and

• keeps inference-time cost on par with unoptimized prompts.

🔗 Link to the paper: https://arxiv.org/abs/2605.13295

💻 Link to the repo: https://github.com/finitearth/cantante

If you're researching multi-agent architectures or automated prompt engineering, I'd love to hear what's working (and breaking) for you right now.

submitted by /u/finitearth