Pinpointing the Culprit: Automated Failure Attribution in LLM Multi-Agent Systems

Introduction

LLM-powered multi-agent systems are increasingly deployed to tackle complex tasks by distributing work among specialized agents. However, when such systems fail—despite a flurry of activity—developers face a daunting question: which agent caused the failure, and at what point did it happen? Traditionally, diagnosing failures requires painstakingly sifting through extensive interaction logs, a process akin to finding a needle in a haystack. This manual approach is time-consuming and relies heavily on developer expertise, hindering rapid iteration and optimization.

Image source: syncedreview.com

To address this challenge, researchers from Penn State University and Duke University—in collaboration with Google DeepMind, University of Washington, Meta, Nanyang Technological University, and Oregon State University—have introduced the novel problem of Automated Failure Attribution. Their work, accepted as a Spotlight presentation at ICML 2025, provides the first benchmark dataset (Who&When) and develops several automated attribution methods. This article explores the background, methodology, and implications of this groundbreaking research.

The Debugging Bottleneck in Multi-Agent Systems

LLM-based multi-agent systems show immense promise across domains like software development, research, and decision-making. Yet they remain fragile: a single agent's error, a misunderstanding between agents, or a mistake in information transmission can derail the entire task. When failures occur, developers currently fall back on manual techniques, reading the full interaction log line by line, re-running the task to reproduce the error, and relying on intuition and experience to guess where things went wrong.

These inefficiencies create a critical bottleneck. Without automated tools, system improvement slows, and the potential of multi-agent architectures remains untapped.

The Who&When Dataset: A Foundation for Attribution

To enable automated failure attribution, the team constructed the Who&When dataset, the first benchmark specifically designed for this task. The dataset comprises numerous multi-agent interaction traces in which tasks either succeed or fail. Each failure is annotated with the responsible agent (who), the step at which the decisive error occurred (when), and a natural-language explanation of the mistake (why).

This resource provides a standardized testbed for evaluating attribution methods, enabling fair comparisons and accelerating progress.
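To make the annotation schema concrete, here is a minimal sketch of what a Who&When-style failure record could look like. The field and class names are illustrative, not the dataset's actual schema, and the trace content is invented for the example:

```python
from dataclasses import dataclass

@dataclass
class Step:
    index: int    # position in the interaction trace
    agent: str    # which agent produced this message
    content: str  # the message text

@dataclass
class FailureRecord:
    task: str         # the task the system attempted
    trace: list       # ordered list of Step objects
    failed: bool      # whether the overall task failed
    who: str = ""     # annotated culprit agent ("who")
    when: int = -1    # annotated decisive error step ("when")
    why: str = ""     # annotated explanation of the mistake ("why")

# An invented example record in this hypothetical schema.
record = FailureRecord(
    task="Look up the host city of a conference and report its population.",
    trace=[
        Step(0, "Orchestrator", "Delegating the lookup to WebSurfer."),
        Step(1, "WebSurfer", "Found the conference host city."),
        Step(2, "WebSurfer", "Reported a population figure from an outdated source."),
    ],
    failed=True,
    who="WebSurfer",
    when=2,
    why="Cited a stale source instead of current data.",
)
print(record.who, record.when)  # -> WebSurfer 2
```

An attribution method receives `task` and `trace` and must predict `who` and `when`; the annotations serve as ground truth for scoring.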

Automated Attribution Methods: From Baselines to Advanced Reasoning

The team developed and evaluated several automated attribution approaches, ranging from simple heuristics to sophisticated LLM-based reasoning. The methods can be categorized as:

  1. Rule-based baselines – Using predefined patterns (e.g., last agent to act, longest message) as naive predictors.
  2. LLM-based classifiers – Fine-tuning large language models to analyze logs and output the faulty agent and timestep.
  3. Causal chain analysis – Tracing the flow of information and decisions to identify points of failure.
  4. Multi-stage reasoning – Combining LLM outputs with structured reasoning to improve accuracy.

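As a concrete illustration of the first category, here is a minimal sketch of the two naive heuristics named above, "last agent to act" and "longest message". The trace format is an assumption for illustration, not the paper's code:

```python
def last_agent_baseline(trace):
    """Blame the agent that produced the final message before failure."""
    return trace[-1]["agent"]

def longest_message_baseline(trace):
    """Blame the agent whose single longest message appears in the log,
    on the (naive) assumption that verbosity correlates with error."""
    return max(trace, key=lambda step: len(step["content"]))["agent"]

# Invented trace: the Coder introduces the bug, the Verifier acts last.
trace = [
    {"agent": "Planner",  "content": "Break the task into two subtasks."},
    {"agent": "Coder",    "content": "def solve(): return 41  # off-by-one bug slipped in here"},
    {"agent": "Verifier", "content": "Looks fine."},
]
print(last_agent_baseline(trace))       # -> Verifier (wrong culprit)
print(longest_message_baseline(trace))  # -> Coder
```

The example shows why such heuristics are weak predictors: "last agent to act" blames whoever happened to speak last, even when the decisive error occurred earlier in the chain.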
Results showed that LLM-based methods significantly outperform baselines, but the task remains challenging—especially for subtle errors that propagate through long interaction chains.
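A hedged sketch of the LLM-based approach: format the full log into a prompt that asks a model to name the faulty agent and error step, then parse its verdict. The prompt wording is invented, and a real client call is replaced by a mock reply to show the round trip:

```python
import json
import re

def build_attribution_prompt(task, trace):
    """Format the whole interaction log into one attribution prompt."""
    log = "\n".join(f"[step {i}] {s['agent']}: {s['content']}"
                    for i, s in enumerate(trace))
    return (
        f"Task: {task}\n\nInteraction log:\n{log}\n\n"
        "The task failed. Identify the agent responsible and the step "
        "where the decisive error occurred. Answer as JSON: "
        '{"who": "<agent>", "when": <step index>}.'
    )

def parse_attribution(response):
    """Extract the JSON verdict from the model's free-form reply."""
    match = re.search(r"\{.*\}", response, re.DOTALL)
    return json.loads(match.group(0)) if match else None

# With a real client you would send build_attribution_prompt(...) to a
# model; here a mock reply stands in for the model's response.
mock_reply = 'Based on the log: {"who": "Coder", "when": 1}'
print(parse_attribution(mock_reply))  # -> {'who': 'Coder', 'when': 1}
```

The parsed prediction can then be scored directly against a failure record's ground-truth `who` and `when` annotations.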

Key Findings and Insights

The study revealed several important findings. No single strategy dominates: methods that examine the whole log in one pass tend to be better at naming the responsible agent, while step-by-step analysis localizes the decisive error step more accurately, and combining strategies improves accuracy at additional computational cost. Even the strongest configurations identify the culprit agent only about half the time and pinpoint the exact error step far less often, with subtle errors that propagate through long interaction chains proving hardest to attribute.

These insights guide future research toward more robust attribution systems.

Implications for Multi-Agent System Development

Automated failure attribution promises to transform how developers debug and iterate on multi-agent systems. By quickly pointing to the responsible agent and timestep, it enables faster debugging cycles, targeted fixes to individual agents and prompts, and more systematic evaluation of architectural changes.

The open-source release of code and data further democratizes access, allowing the broader AI community to build upon this foundation.

Conclusion

The research from Penn State, Duke, and collaborators marks a significant step toward making LLM multi-agent systems more transparent and easier to debug. By defining the problem of automated failure attribution and providing the Who&When dataset, they have opened a new research direction. As multi-agent architectures grow in complexity, tools like these will be essential for maintaining and improving system performance. The Spotlight acceptance at ICML 2025 underscores the importance of this work—and the community eagerly awaits further advances.

For more details, read the full paper on arXiv and access the dataset on Hugging Face.
