AgentOrchestra: Hierarchical Multi-Agent Orchestration with the TEA Protocol

We propose the Tool-Environment-Agent (TEA) protocol and AgentOrchestra, a hierarchical multi-agent framework built on it. AgentOrchestra achieves 89.04% on GAIA, a state-of-the-art result on this general-purpose agent benchmark.

The Problem with Existing Multi-Agent Systems

Current LLM-based multi-agent systems share a structural weakness: environments, agents, and tools are treated as second-class citizens, wired together in ad-hoc, brittle ways with no formal lifecycle management or versioned interfaces. As a result, these systems are fragile, hard to debug, and difficult to scale.

The TEA Protocol

The Tool-Environment-Agent (TEA) protocol addresses this by treating all three components as first-class resources with:

  • Explicit lifecycles — resources are created, activated, suspended, and terminated through well-defined state machines
  • Versioned interfaces — every resource exposes a typed, versioned contract, enabling safe composition and substitution
  • Unified resource substrate — a shared layer that manages resource registration, discovery, and communication

This design directly parallels lessons from operating systems and microservice architectures: stable, composable abstractions are the foundation of reliable complex systems.
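
The three properties listed above can be illustrated with a minimal Python sketch. It is not the TEA implementation: the names `State`, `ALLOWED`, `Resource`, `Registry`, and `discover`, the exact set of legal transitions, and the use of dotted `major.minor.patch` version strings are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    CREATED = "created"
    ACTIVE = "active"
    SUSPENDED = "suspended"
    TERMINATED = "terminated"


# Assumed transition table: which lifecycle changes are legal.
ALLOWED = {
    State.CREATED: {State.ACTIVE, State.TERMINATED},
    State.ACTIVE: {State.SUSPENDED, State.TERMINATED},
    State.SUSPENDED: {State.ACTIVE, State.TERMINATED},
    State.TERMINATED: set(),
}


def parse(version):
    """Turn '1.2.0' into (1, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


@dataclass
class Resource:
    kind: str      # "tool", "environment", or "agent"
    name: str
    version: str   # hypothetical dotted version string, e.g. "1.2.0"
    state: State = State.CREATED

    def transition(self, new_state):
        """Move to new_state, rejecting anything the table does not allow."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"illegal transition {self.state.name} -> {new_state.name}"
            )
        self.state = new_state


class Registry:
    """Shared substrate: registration and discovery for all resource kinds."""

    def __init__(self):
        self._resources = {}  # (kind, name, version) -> Resource

    def register(self, resource):
        key = (resource.kind, resource.name, resource.version)
        if key in self._resources:
            raise KeyError(f"already registered: {key}")
        self._resources[key] = resource

    def discover(self, kind, name, major):
        """Return the newest ACTIVE resource with the requested major version."""
        candidates = [
            r for r in self._resources.values()
            if r.kind == kind and r.name == name
            and r.state is State.ACTIVE
            and parse(r.version)[0] == major
        ]
        if not candidates:
            raise LookupError(f"no active {kind} {name!r} with major version {major}")
        return max(candidates, key=lambda r: parse(r.version))
```

In this sketch, a caller that asks for major version 1 of a tool gets the newest active 1.x release, so a suspended or terminated resource can be swapped out without the caller changing its request.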

AgentOrchestra Architecture

AgentOrchestra builds on TEA with a hierarchical structure:

  • Central Planner — decomposes complex tasks into sub-goals and dynamically assigns them to specialist sub-agents
  • Specialist Sub-Agents — independently handle focused domains: web navigation, data analysis, file operations, code execution
  • Resource Manager — enforces TEA protocol contracts, handles agent spawning/teardown, and monitors resource health

The central planner operates at the task level; sub-agents operate at the action level. This separation of concerns enables parallelism while maintaining coherent task-level reasoning.
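
The split between task-level planning and action-level execution might look roughly like the following sketch. It is an illustration under stated assumptions, not AgentOrchestra's code: `SubGoal`, `decompose`, `run_task`, and the two stub specialists are hypothetical names, the planner's decomposition is hard-coded where a real system would presumably call a model, and the sub-goals are assumed to be independent so they can run concurrently.

```python
import asyncio
from dataclasses import dataclass
from typing import List


@dataclass
class SubGoal:
    domain: str        # which specialist should handle this sub-goal
    instruction: str   # what the specialist is asked to do


async def web_agent(instruction: str) -> str:
    await asyncio.sleep(0)  # stand-in for real browsing actions
    return f"[web] {instruction}"


async def analysis_agent(instruction: str) -> str:
    await asyncio.sleep(0)  # stand-in for real data analysis
    return f"[analysis] {instruction}"


# Hypothetical routing table from domain name to specialist sub-agent.
SPECIALISTS = {"web": web_agent, "analysis": analysis_agent}


def decompose(task: str) -> List[SubGoal]:
    """Placeholder for the planner's reasoning: a fixed two-step plan."""
    return [
        SubGoal("web", f"find sources for: {task}"),
        SubGoal("analysis", f"summarise findings for: {task}"),
    ]


async def run_task(task: str) -> List[str]:
    """Task level: plan once, then fan sub-goals out to specialists."""
    sub_goals = decompose(task)
    # Sub-goals are assumed independent here, so they run concurrently;
    # asyncio.gather returns results in the order the sub-goals were given.
    return await asyncio.gather(
        *(SPECIALISTS[g.domain](g.instruction) for g in sub_goals)
    )


if __name__ == "__main__":
    for line in asyncio.run(run_task("example question")):
        print(line)
```

Sub-goals that depend on each other's output would need to be sequenced rather than gathered; the sketch only covers the parallel case.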

GAIA Benchmark Results

GAIA is a challenging benchmark requiring real-world tool use, multi-step reasoning, and web interaction across hundreds of diverse tasks.

System            GAIA Score
AgentOrchestra    89.04%
Previous SOTA     < 89%

AgentOrchestra achieves state-of-the-art performance on GAIA, which suggests that principled protocol design, not just model scale, is a key driver of agent capability.


© 2026. Wentao Zhang. All rights reserved.
