Cradle: Empowering Foundation Agents towards General Computer Control

We introduce Cradle, a framework for General Computer Control (GCC) that enables foundation agents to operate arbitrary software using only screenshots as input and keyboard/mouse actions as output, with no task-specific APIs required. This is the same interface humans use to interact with computers.

Motivation

Most existing agent frameworks are brittle: they depend on task-specific APIs, hand-crafted interfaces, or structured environment observations. This severely limits their applicability to the real, messy world of commercial software.

Cradle’s core insight: the screen is the universal interface. Every piece of software exposes its state through pixels, and every user action reduces to keyboard and mouse inputs. By treating screenshots as observations and keyboard/mouse commands as actions, a single framework can operate any software without per-application engineering.
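To make the screen-as-interface idea concrete, here is a minimal sketch of what such an observation/action contract could look like. It is not Cradle's actual API: `Screenshot`, `KeyPress`, `MouseClick`, `Environment`, and `FakeApp` are hypothetical names chosen for illustration, and `FakeApp` stands in for real software by returning blank frames and logging the actions it receives.

```python
from dataclasses import dataclass
from typing import Callable, Protocol, Union


@dataclass(frozen=True)
class Screenshot:
    """Observation: raw pixels only (hypothetical container)."""
    width: int
    height: int
    pixels: bytes  # RGB, row-major, 3 bytes per pixel


@dataclass(frozen=True)
class KeyPress:
    key: str


@dataclass(frozen=True)
class MouseClick:
    x: int
    y: int
    button: str = "left"


# Every action is either a keyboard or a mouse event.
Action = Union[KeyPress, MouseClick]


class Environment(Protocol):
    """Anything that can be observed as pixels and driven by input events."""
    def capture(self) -> Screenshot: ...
    def execute(self, action: Action) -> None: ...


class FakeApp:
    """Stand-in for real software: blank frames in, action log out."""
    def __init__(self, width: int = 4, height: int = 2) -> None:
        self.width, self.height = width, height
        self.log: list = []

    def capture(self) -> Screenshot:
        return Screenshot(self.width, self.height,
                          bytes(self.width * self.height * 3))

    def execute(self, action: Action) -> None:
        self.log.append(action)


def step(env: Environment, policy: Callable[[Screenshot], Action]) -> Action:
    """One interaction: observe the screen, decide, act."""
    obs = env.capture()
    action = policy(obs)
    env.execute(action)
    return action


def click_center(obs: Screenshot) -> Action:
    # Trivial placeholder policy; a real agent would query a foundation model.
    return MouseClick(obs.width // 2, obs.height // 2)


app = FakeApp()
step(app, click_center)
```

Because `step` only depends on `capture` and `execute`, the same loop would apply to a game, a browser, or an email client, which is the point of the design: nothing in the contract is application-specific.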

Framework

Cradle is built around six tightly integrated modules:

  • Information Gathering — processes raw screenshots into structured observations
  • Self-Reflection — evaluates the outcome of past actions and identifies errors
  • Task Inference — infers the current sub-goal from context and memory
  • Skill Curation — builds and refines a library of reusable skills from experience
  • Action Planning — generates executable action sequences toward the current goal
  • Memory — maintains episodic and semantic memory across long interaction horizons
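One way the six modules might fit together in a single decision step is sketched below. This is an illustrative assumption, not the framework's real code: the function names, the call order (taken from the list order above), and the placeholder heuristics inside each module are all hypothetical, and screenshots are represented as plain strings to keep the example self-contained.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Hypothetical store: episodic history plus a skill library."""
    episodes: list = field(default_factory=list)
    skills: dict = field(default_factory=dict)


def gather_information(screenshot: str) -> dict:
    # Placeholder: a real agent would query a multimodal model on pixels.
    return {"screen": screenshot}


def self_reflect(memory: Memory, obs: dict) -> bool:
    # Placeholder heuristic: the last action "worked" if the screen changed.
    if not memory.episodes:
        return True
    return memory.episodes[-1]["obs"]["screen"] != obs["screen"]


def infer_task(goal: str, obs: dict, last_ok: bool) -> str:
    # Placeholder: switch to a retry sub-goal if the previous action failed.
    return goal if last_ok else f"retry: {goal}"


def curate_skills(memory: Memory, last_ok: bool) -> None:
    # Placeholder: promote the last successful action sequence to a skill.
    if last_ok and memory.episodes:
        last = memory.episodes[-1]
        memory.skills[last["task"]] = last["actions"]


def plan_actions(task: str, memory: Memory) -> list:
    # Reuse a stored skill when one matches, else fall back to a default.
    return memory.skills.get(task, [("key", "enter")])


def run_step(goal: str, screenshot: str, memory: Memory) -> list:
    """Run the modules in list order and record the step in memory."""
    obs = gather_information(screenshot)
    last_ok = self_reflect(memory, obs)
    task = infer_task(goal, obs, last_ok)
    curate_skills(memory, last_ok)
    actions = plan_actions(task, memory)
    memory.episodes.append({"obs": obs, "task": task, "actions": actions})
    return actions


memory = Memory()
first = run_step("open menu", "frame-0", memory)
second = run_step("open menu", "frame-1", memory)
```

In this toy run the screen changes between the two steps, so the first step's actions are judged successful and stored as a skill that the second step then reuses.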

The skill curation module is key to self-improvement: as the agent operates, it distills successful interaction patterns into reusable skills, progressively expanding its capabilities on new tasks without retraining.
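The idea of distilling successful interactions into reusable skills can be illustrated with a small sketch. Everything here is an assumption made for the example: the `Skill` and `SkillLibrary` classes, the rule of keeping the shorter action sequence when refining, the word-overlap retrieval, and the skill names and key bindings are hypothetical and are not taken from Cradle's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    description: str
    actions: list
    uses: int = 0


@dataclass
class SkillLibrary:
    skills: dict = field(default_factory=dict)

    def curate(self, name, description, actions, succeeded):
        """Store or refine a skill, but only from successful episodes."""
        if not succeeded:
            return False
        known = self.skills.get(name)
        # Refinement rule (an assumption): prefer the shorter action sequence.
        if known is None or len(actions) < len(known.actions):
            uses = known.uses if known else 0
            self.skills[name] = Skill(name, description, list(actions), uses)
        return True

    def retrieve(self, task):
        """Return the skill whose description shares the most words with the task."""
        words = set(task.lower().split())
        best, best_score = None, 0
        for skill in self.skills.values():
            score = len(words & set(skill.description.lower().split()))
            if score > best_score:
                best, best_score = skill, score
        if best is not None:
            best.uses += 1  # track reuse so popular skills can be identified
        return best


library = SkillLibrary()
library.curate("mount_horse", "mount the nearest horse",
               ["walk", "walk", "press E"], True)
library.curate("mount_horse", "mount the nearest horse",
               ["walk", "press E"], True)   # shorter: replaces the first version
library.curate("open_map", "open the world map",
               ["press M"], False)          # failed episode: ignored
match = library.retrieve("mount a horse")
```

No model weights change anywhere in this sketch; the agent's capabilities grow only because the library does, which is what "without retraining" means in practice.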

Results

Cradle was evaluated across both commercial games and real productivity software:

Domain     | Application           | Highlight
-----------|-----------------------|----------------------------------------------------------------
AAA Game   | Red Dead Redemption 2 | First agent to follow storylines and complete 40-minute missions
Simulation | Cities: Skylines      | City planning and management tasks
Life Sim   | Stardew Valley        | Multi-day farming and harvest sequences
Trading    | Dealer’s Life 2       | 93.6% transaction completion rate
Browser    | Chrome                | Web navigation and form filling
Email      | Outlook               | Email composition and management
Video      | CapCut                | Video editing workflows

On the OSWorld benchmark, Cradle achieves a 7.81% success rate without relying on any internal APIs, which is evidence that the screen-only approach generalizes.

⭐ 2.5k GitHub stars

Demo Videos

  • Red Dead Redemption 2: Story Mode
  • Red Dead Redemption 2: Open-ended
  • Cities: Skylines
  • Stardew Valley
  • Dealer's Life 2
  • Software Applications Demo


© 2026. Wentao Zhang. All rights reserved.
