AgentOrchestra: Hierarchical Multi-Agent Orchestration with the TEA Protocol

14 Jun 2025 on Llm agents, Multi-agent systems, Benchmarks

We propose the Tool-Environment-Agent (TEA) protocol and AgentOrchestra, a hierarchical multi-agent framework that achieves 89.04% on GAIA — state-of-the-art on general-purpose agent benchmarks.

We introduce AgentOrchestra, a hierarchical multi-agent framework built on the Tool-Environment-Agent (TEA) protocol, achieving 89.04% on GAIA — establishing state-of-the-art performance on general-purpose agent benchmarks.

The Problem with Existing Multi-Agent Systems

Current LLM-based multi-agent systems suffer from a common structural weakness: environments, agents, and tools are treated as second-class citizens — wired together in ad-hoc, brittle ways with no formal lifecycle management or versioned interfaces. This makes systems fragile, hard to debug, and difficult to scale.

The TEA Protocol

The Tool-Environment-Agent (TEA) protocol addresses this by treating all three components as first-class resources with:

Explicit lifecycles — resources are created, activated, suspended, and terminated through well-defined state machines
Versioned interfaces — every resource exposes a typed, versioned contract, enabling safe composition and substitution
Unified resource substrate — a shared layer that manages resource registration, discovery, and communication

This design directly parallels lessons from operating systems and microservice architectures: stable, composable abstractions are the foundation of reliable complex systems.

AgentOrchestra Architecture

AgentOrchestra builds on TEA with a hierarchical structure:

Central Planner — decomposes complex tasks into sub-goals and dynamically assigns them to specialist sub-agents
Specialist Sub-Agents — independently handle focused domains: web navigation, data analysis, file operations, code execution
Resource Manager — enforces TEA protocol contracts, handles agent spawning/teardown, and monitors resource health

The central planner operates at the task level; sub-agents operate at the action level. This separation of concerns enables parallelism while maintaining coherent task-level reasoning.

GAIA Benchmark Results

GAIA is a challenging benchmark requiring real-world tool use, multi-step reasoning, and web interaction across hundreds of diverse tasks.

System	GAIA Score
AgentOrchestra	89.04%
Previous SOTA	< 89%

AgentOrchestra achieves state-of-the-art performance, demonstrating that principled protocol design — not just model scale — is a key driver of agent capability.

AgentOrchestra: Hierarchical Multi-Agent Orchestration with the TEA Protocol

The Problem with Existing Multi-Agent Systems

The TEA Protocol

AgentOrchestra Architecture

GAIA Benchmark Results

Links

Wentao Zhang

Error

The Problem with Existing Multi-Agent Systems

The TEA Protocol

AgentOrchestra Architecture

GAIA Benchmark Results

Links

Templates:

Error