Research

Wentao Zhang’s research on LLM Agents, Self-Evolving Agents, and Financial AI.


My research sits at the intersection of LLM-powered autonomous agents and Financial AI (AI4Finance). A central theme is agent self-evolution — building systems that continuously improve themselves through closed-loop experience, resource versioning, and protocol-level self-modification.

Research Themes

Self-Evolving Agents — Protocols and architectures enabling agents to evolve their own prompts, tools, memory, and sub-agents autonomously
LLM Agents & Multi-Agent Orchestration — Hierarchical frameworks, tool use, long-horizon planning, and standardized agent communication protocols
General Computer Control — Foundation agents that operate arbitrary software using only pixels and natural language
AI4Finance — End-to-end financial platforms, algorithmic trading, portfolio management, and HFT via RL and LLMs
Reinforcement Learning — Deep RL for sequential decision-making in complex, partially observable environments

Featured Projects

Featured · Self-Evolution

Autogenesis: A Self-Evolving Agent Protocol

Autogenesis addresses a fundamental limitation of current LLM agent systems: they are static — prompts, tools, and behaviors fixed at design time cannot improve from experience.

Two tightly coupled layers power the system:

Resource Substrate Protocol Layer — models prompts, agents, tools, and memory as versioned resources with explicit lifecycles, enabling safe mutation and rollback
Self-Evolution Protocol Layer — a closed-loop system where the agent monitors its own performance, identifies failure modes, and autonomously rewrites its own resources to improve

The result is an agent that gets measurably better at complex planning and tool-use tasks through runtime self-modification — without human intervention.
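The two layers can be pictured with a minimal sketch. It assumes a resource is a named list of immutable text versions and that performance is a single scalar score. `VersionedResource`, `evolve_step`, and the toy evaluator are hypothetical illustrations, not the Autogenesis API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class VersionedResource:
    """Hypothetical resource (prompt, tool, memory, sub-agent) with an explicit version history."""
    name: str
    versions: list[str] = field(default_factory=list)
    active: int = -1  # index of the active version; -1 means nothing committed yet

    def commit(self, content: str) -> int:
        # Append a new version (old versions are never overwritten) and activate it.
        self.versions.append(content)
        self.active = len(self.versions) - 1
        return self.active

    def rollback(self, version: int) -> None:
        # Re-activate an earlier version; later versions stay in the history.
        if not 0 <= version < len(self.versions):
            raise IndexError(f"{self.name}: no version {version}")
        self.active = version

    @property
    def current(self) -> str:
        if self.active < 0:
            raise LookupError(f"{self.name}: nothing committed yet")
        return self.versions[self.active]


def evolve_step(resource: VersionedResource,
                evaluate: Callable[[str], float],
                propose: Callable[[str], str]) -> float:
    """One closed-loop iteration: score, propose a rewrite, keep it only if the score improves."""
    baseline_version = resource.active
    baseline_score = evaluate(resource.current)
    resource.commit(propose(resource.current))
    candidate_score = evaluate(resource.current)
    if candidate_score <= baseline_score:
        resource.rollback(baseline_version)  # revert a regression instead of keeping it
    return max(baseline_score, candidate_score)
```

In a real system, `evaluate` would run the agent on a task suite and `propose` would ask an LLM to rewrite the resource based on observed failures. Both are left abstract here, so the sketch shows only the versioning and the keep-or-rollback decision.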

Paper GitHub

Featured · ICML 2025

Cradle: Empowering Foundation Agents towards General Computer Control

How do you build an agent that can use any computer software — without task-specific APIs or hand-coded integrations?

Cradle's answer: treat the screen as the universal interface. Agents receive screenshots as input and produce keyboard and mouse actions as output — exactly how humans interact with computers. This unlocks operation across arbitrary software: games (Red Dead Redemption 2, Stardew Valley, Cities: Skylines), browsers, email clients, and creative tools — all with the same agent.
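The screen-in, keyboard-and-mouse-out interface reduces to a plain observe-act loop. The following is a sketch under stated assumptions. `Screenshot`, `KeyPress`, `MouseClick`, and the `policy` callable are hypothetical stand-ins; in practice the policy would be a multimodal LLM. No real screen capture or input injection is performed.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Screenshot:
    width: int
    height: int
    pixels: bytes  # raw RGB bytes: the only observation the agent receives


@dataclass(frozen=True)
class KeyPress:
    key: str


@dataclass(frozen=True)
class MouseClick:
    x: int
    y: int
    button: str = "left"


Action = Union[KeyPress, MouseClick]


def run_episode(capture: Callable[[], Screenshot],
                policy: Callable[[Screenshot, str], list[Action]],
                execute: Callable[[Action], None],
                task: str,
                max_steps: int = 10) -> int:
    """Observe pixels, ask the policy for low-level actions, execute them; return steps taken."""
    for step in range(max_steps):
        actions = policy(capture(), task)
        if not actions:  # an empty plan signals that the policy considers the task done
            return step
        for action in actions:
            execute(action)
    return max_steps
```

Because the loop never touches an application-specific API, the same `run_episode` would serve a game, a browser, or an email client. Only the pixels and the resulting key and mouse events differ.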

Cradle also incorporates self-improvement: agents curate and refine a skill library from past experience, enabling progressive capability growth on new tasks.
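A skill library of this kind can be sketched as a store of named, reusable routines. The agent adds or refines routines as it gains experience, and retrieves them by relevance to the current task. The `SkillLibrary` class and its word-overlap retrieval are simplifying assumptions for illustration, not Cradle's code.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    description: str
    code: str           # e.g. a generated routine that calls low-level actions
    revisions: int = 0  # how many times the skill has been refined


@dataclass
class SkillLibrary:
    skills: dict[str, Skill] = field(default_factory=dict)

    def upsert(self, name: str, description: str, code: str) -> Skill:
        """Add a new skill, or refine an existing one while counting revisions."""
        if name in self.skills:
            old = self.skills[name]
            self.skills[name] = Skill(name, description, code, old.revisions + 1)
        else:
            self.skills[name] = Skill(name, description, code)
        return self.skills[name]

    def retrieve(self, task: str, k: int = 3) -> list[Skill]:
        """Rank skills by word overlap between the task and each description."""
        task_words = set(task.lower().split())

        def overlap(skill: Skill) -> int:
            return len(task_words & set(skill.description.lower().split()))

        ranked = sorted(self.skills.values(), key=overlap, reverse=True)
        return [s for s in ranked if overlap(s) > 0][:k]
```

A production system would more likely retrieve by embedding similarity. Word overlap keeps the sketch dependency-free.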

⭐ 2.5k GitHub stars

Paper GitHub Project Page

Multi-Agent · GAIA SOTA

AgentOrchestra: Hierarchical Multi-Agent Orchestration with the TEA Protocol

The Tool-Environment-Agent (TEA) protocol treats environments, agents, and tools as first-class resources with explicit lifecycles and versioned interfaces, replacing the fragile, ad-hoc wiring that plagues most multi-agent systems.

AgentOrchestra builds on TEA with a central planner that dynamically spawns and coordinates specialist sub-agents (web navigation, data analysis, file operations). It achieves 89.04% on the GAIA benchmark, a state-of-the-art result for general-purpose agents.
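The planner-plus-specialists pattern can be sketched as a registry of sub-agent factories keyed by capability. The planner spawns each specialist on first use and routes subtasks to it. The class names, capability tags, and fixed plan below are hypothetical. In the real system an LLM planner would produce the plan, and TEA's interfaces are those defined in the paper.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SubAgent:
    capability: str
    handle: Callable[[str], str]
    version: str = "1.0"  # versioned interface, in the spirit of explicit lifecycles


@dataclass
class Planner:
    factories: dict[str, Callable[[], SubAgent]]
    active: dict[str, SubAgent] = field(default_factory=dict)

    def spawn(self, capability: str) -> SubAgent:
        # Create a specialist lazily, the first time a subtask needs it.
        if capability not in self.active:
            if capability not in self.factories:
                raise KeyError(f"no sub-agent registered for {capability!r}")
            self.active[capability] = self.factories[capability]()
        return self.active[capability]

    def run(self, plan: list[tuple[str, str]]) -> list[str]:
        # `plan` is a list of (capability, subtask) pairs, executed in order.
        return [self.spawn(cap).handle(subtask) for cap, subtask in plan]
```

Keeping factories separate from live instances means a sub-agent that is never needed is never created. It also means a new specialist can be registered without touching the planner's routing logic.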

Paper

KDD 2024

FinAgent: A Multimodal Foundation Agent for Financial Trading: Tool-Augmented, Diversified, and Generalist

A tool-augmented, diversified, and generalist multimodal agent for financial trading that integrates heterogeneous data sources (price, news, filings) and diverse trading tools, achieving state-of-the-art results across multiple financial benchmarks.

Paper GitHub

KDD 2026

AlphaForgeBench: Benchmarking LLMs as Quantitative Researchers

Rather than asking LLMs to emit trading actions directly — which suffers from extreme run-to-run variance and irrational reversals — AlphaForgeBench repositions LLMs as quantitative researchers that generate executable alpha factors and strategy code, evaluated via standardized backtesting across 7 assets and 6 frontier models.
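In this setup the model writes a factor function instead of emitting buy or sell decisions, and a harness backtests it deterministically. The 5-period momentum factor, the sign-based long/short rule, and the cost-free backtest below are illustrative assumptions. They are not the benchmark's factor format or evaluation protocol.

```python
from typing import Optional


def momentum_factor(prices: list[float], lookback: int = 5) -> list[Optional[float]]:
    """Factor value at t: trailing `lookback`-period return (None until enough history)."""
    return [None if t < lookback else prices[t] / prices[t - lookback] - 1.0
            for t in range(len(prices))]


def backtest(prices: list[float], factor: list[Optional[float]]) -> float:
    """Go long for one period when the factor is positive, short when negative.

    The signal at t uses only data up to t and is applied to the return from t to t+1.
    Returns the cumulative simple return, ignoring transaction costs.
    """
    equity = 1.0
    for t in range(len(prices) - 1):
        signal = factor[t]
        if signal is None or signal == 0:
            continue  # no position this period
        position = 1.0 if signal > 0 else -1.0
        equity *= 1.0 + position * (prices[t + 1] / prices[t] - 1.0)
    return equity - 1.0
```

Because both functions are deterministic, rerunning the same generated factor always yields the same score. This is the stability that direct action emission lacks.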

A 3×3 level-grade taxonomy of 903 queries (633 real-world + 270 augmented) reveals three persistent model archetypes and systematic difficulty scaling — providing a stable, reproducible foundation for LLM financial capability evaluation.

Paper GitHub Project Page

AI4Finance · Prediction Markets

PolyMonitor: Prediction Market Intelligence Workspace

An open-source live intelligence workspace for Polymarket that consolidates market prices, on-chain flow, oracle activity, order-book depth, and macro context into a unified dashboard. It is paired with the Polymarket Agent, a 10-node multi-agent forecasting pipeline that combines deterministic evidence construction, LLM specialist agents, adversarial critique, and calibration.
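The calibration end of such a pipeline can be sketched in a few lines. One function blends the agents' probability with the market-implied probability. Another computes a Brier score to check calibration once markets resolve. The linear blending rule, the `trust` weight, and both function names are assumptions chosen for illustration. The Polymarket Agent's actual calibration node may work differently.

```python
def calibrate(agent_prob: float, market_prob: float, trust: float = 0.5) -> float:
    """Blend the agents' probability with the market price; trust=1 keeps the agents' view."""
    if not (0.0 <= agent_prob <= 1.0 and 0.0 <= market_prob <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")
    if not 0.0 <= trust <= 1.0:
        raise ValueError("trust must lie in [0, 1]")
    return trust * agent_prob + (1.0 - trust) * market_prob


def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared error between forecast probabilities and resolved 0/1 outcomes (lower is better)."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equally many forecasts and outcomes")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)
```

For example, an agent estimate of 0.9 against a market price of 0.5 with `trust=0.5` yields 0.7. Tracking the Brier score over resolved markets shows whether that blending helps or hurts.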

The platform serves as the operational infrastructure for our Unlocking the Forecasting Economy dataset suite, covering the full prediction market lifecycle from listing through oracle resolution and settlement.

Paper GitHub Project Page

KDD 2026

FinWorld: End-to-End Financial AI Platform

An all-in-one open-source platform for financial AI research and deployment, integrating data pipelines, model training, backtesting, and live deployment over more than 800M multimodal data samples spanning 1995–2025. It lowers the barrier to rigorous, reproducible AI4Finance research.

Paper GitHub Project Page

NeurIPS 2023

TradeMaster: A Holistic Quantitative Trading Platform Empowered by Reinforcement Learning

A holistic platform covering data processing, environment simulation, RL agent training, and performance evaluation across multiple financial markets and trading tasks.
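Simulated trading environments are commonly exposed through the reset/step interface popularized by OpenAI Gym. The self-contained toy environment below illustrates that interface for single-asset trading. The class name, the three-action space, the return-window observation, and the simple-return reward are assumptions; this is not TradeMaster's environment API.

```python
class ToyTradingEnv:
    """Hypothetical single-asset environment with a Gym-style reset/step interface.

    Actions: 0 = short, 1 = flat, 2 = long. Reward: one-step simple return of the position.
    """

    def __init__(self, prices: list[float], window: int = 3):
        if window < 1 or len(prices) < window + 2:
            raise ValueError("need window >= 1 and at least window + 2 prices")
        self.prices = prices
        self.window = window
        self.t = window

    def _obs(self) -> list[float]:
        # Observation: the last `window` one-step returns ending at time t.
        return [self.prices[i] / self.prices[i - 1] - 1.0
                for i in range(self.t - self.window + 1, self.t + 1)]

    def reset(self) -> list[float]:
        self.t = self.window
        return self._obs()

    def step(self, action: int) -> tuple[list[float], float, bool]:
        if action not in (0, 1, 2):
            raise ValueError("action must be 0 (short), 1 (flat) or 2 (long)")
        if self.t >= len(self.prices) - 1:
            raise RuntimeError("episode is over; call reset()")
        position = action - 1  # map {0, 1, 2} to {-1, 0, +1}
        reward = position * (self.prices[self.t + 1] / self.prices[self.t] - 1.0)
        self.t += 1
        done = self.t >= len(self.prices) - 1
        return self._obs(), reward, done
```

An RL agent would loop `obs, reward, done = env.step(agent(obs))` until `done` is true, then call `reset()` for the next episode.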

⭐ 2.7k GitHub stars

Paper GitHub

AAAI 2024

EarnHFT: Efficient Hierarchical Reinforcement Learning for High-Frequency Trading

Decomposes the HFT problem into macro-level strategy selection and micro-level order execution. The hierarchical structure yields significantly improved sample efficiency and live-trading profitability.
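The two-level decomposition can be sketched as a router and a set of low-level agents. At fixed intervals the router picks one agent, a specialist for a particular market regime, and that agent then acts on every tick until the next routing decision. The trend-threshold router, the regime labels, and the interval are illustrative assumptions. Here the router is a hand-coded stand-in for what the description above indicates is a learned policy.

```python
from typing import Callable

# Hypothetical low-level agent: maps a tick price to a target position in {-1, 0, +1}.
LowLevelAgent = Callable[[float], int]


def route(recent_prices: list[float], threshold: float = 0.001) -> str:
    """High level: classify the recent trend into a regime label."""
    change = recent_prices[-1] / recent_prices[0] - 1.0
    if change > threshold:
        return "bull"
    if change < -threshold:
        return "bear"
    return "sideways"


def run_hierarchy(prices: list[float], agents: dict[str, LowLevelAgent],
                  interval: int = 60) -> list[int]:
    """Re-select a low-level agent every `interval` ticks; the selected agent acts on every tick."""
    positions = []
    current = agents["sideways"]  # start neutral until the first routing decision
    for t, price in enumerate(prices):
        if t > 0 and t % interval == 0:
            current = agents[route(prices[t - interval:t + 1])]
        positions.append(current(price))
    return positions
```

Each low-level agent only has to master one regime, and the router only has to choose among a handful of options at a coarse time scale. This split is what makes each sub-problem easier to learn.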

Paper

WWW 2024

EarnMore: Portfolio Management in Customizable Stock Pools

A maskable stock representation framework that enables RL agents to handle arbitrary stock universes with a single trained model — eliminating the need to retrain per pool.
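The idea of one model serving arbitrary pools can be illustrated with masking. The model scores every stock in a fixed global universe, and a pool-specific mask zeroes out stocks outside the pool before scores become portfolio weights. This masked-softmax sketch assumes a fixed global universe and plain per-stock scores. It illustrates the masking idea only, not EarnMore's representation-learning method.

```python
import math


def masked_portfolio_weights(scores: list[float], in_pool: list[bool]) -> list[float]:
    """Softmax over the stocks in the pool; stocks outside the pool get weight 0."""
    if len(scores) != len(in_pool):
        raise ValueError("scores and mask must cover the same universe")
    if not any(in_pool):
        raise ValueError("the pool must contain at least one stock")
    # Subtract the max over pooled scores for numerical stability.
    top = max(s for s, keep in zip(scores, in_pool) if keep)
    exps = [math.exp(s - top) if keep else 0.0 for s, keep in zip(scores, in_pool)]
    total = sum(exps)
    return [e / total for e in exps]
```

Switching to a different pool changes only the mask, not the model that produced `scores`. That is why no per-pool retraining is needed in this illustration.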

Paper GitHub

TWOSOME: True Knowledge Comes from Practice: Aligning LLMs with Embodied Environments via Reinforcement Learning

Aligns LLMs with interactive environments through reinforcement learning, enabling agents to acquire genuine knowledge through embodied practice rather than passive pretraining.
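An LLM can be turned into an RL policy over an environment's valid actions in three steps. Score each action by the log-likelihood the LLM assigns to its tokens, normalize for action length, and apply a softmax. A policy-gradient method can then optimize the resulting distribution. The sketch below shows only the normalization step and takes per-token log-probabilities as given. The function name and toy numbers are hypothetical, and it is a simplification rather than the paper's full training setup.

```python
import math


def action_policy(token_logprobs: dict[str, list[float]]) -> dict[str, float]:
    """Map each valid action to a probability from its length-normalized log-likelihood."""
    if any(len(lp) == 0 for lp in token_logprobs.values()):
        raise ValueError("every action needs at least one token")
    # Average per-token log-probability, so longer action phrases are not penalized for length.
    scores = {a: sum(lp) / len(lp) for a, lp in token_logprobs.items()}
    top = max(scores.values())  # shift for numerical stability before exponentiating
    exps = {a: math.exp(s - top) for a, s in scores.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}
```

For example, the actions "pick up the tomato" (four tokens at -0.5 each) and "open fridge" (two tokens at -1.0 each) both have a joint log-likelihood of -2.0. Without normalization the policy could not tell them apart. With it, the first action receives the higher probability.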

Paper

Research

Research Themes

Featured Projects

Autogenesis: A Self-Evolving Agent Protocol

Cradle: Empowering Foundation Agents towards General Computer Control

AgentOrchestra: Hierarchical Multi-Agent Orchestration with the TEA Protocol

FinAgent: A Multimodal Foundation Agent for Financial Trading: Tool-Augmented, Diversified, and Generalist

AlphaForgeBench: Benchmarking LLMs as Quantitative Researchers

PolyMonitor: Prediction Market Intelligence Workspace

FinWorld: End-to-End Financial AI Platform

TradeMaster: A Holistic Quantitative Trading Platform Empowered by Reinforcement Learning

EarnHFT: Efficient Hierarchical Reinforcement Learning for High-Frequency Trading

EarnMore: Portfolio Management in Customizable Stock Pools

TWOSOME: True Knowledge Comes from Practice: Aligning LLMs with Embodied Environments via Reinforcement Learning

Templates:

Error