Financial AI Tasks
Comprehensive overview of the four core financial AI tasks supported by FinWorld
Task Overview
FinWorld supports four core financial AI tasks that cover the entire spectrum of financial AI applications. Each task is designed with specific mathematical formulations and can be approached using multiple AI paradigms including traditional ML, deep learning, reinforcement learning, and large language models.
Notation
Throughout this documentation, we use the following notation:
- Single Asset: $\mathbf{x}_{1:T} = \{x_1, ..., x_T\} \in \mathbb{R}^{T \times D}$ - Historical endogenous time series
- Multi-Asset: $\mathbf{x}_{1:T} \in \mathbb{R}^{N \times T \times D}$ - Historical price sequences for $N$ assets
- Exogenous Variables: $\mathbf{z}_{1:T_{\mathrm{ex}}} = \{\mathbf{z}_1, ..., \mathbf{z}_{T_{\mathrm{ex}}}\} \in \mathbb{R}^{T_{\mathrm{ex}} \times C}$ - Technical indicators, financial factors, news sentiment
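As a concrete illustration of these shapes (the dimension values here are arbitrary examples, not FinWorld defaults), using numpy:

```python
import numpy as np

N, T, D = 5, 64, 4   # assets, time steps, endogenous features (e.g. OHLC)
T_ex, C = 64, 10     # exogenous sequence length and channel count

# Multi-asset endogenous series: one (T, D) panel per asset
x = np.random.rand(N, T, D)   # x_{1:T} in R^{N x T x D}

# Exogenous variables: technical indicators, factors, sentiment scores
z = np.random.rand(T_ex, C)   # z_{1:T_ex} in R^{T_ex x C}

# Single-asset view of the same notation
x_single = x[0]               # x_{1:T} in R^{T x D}
```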
Time Series Forecasting
Financial time series forecasting differs from traditional forecasting in two key ways: it predicts returns rather than prices (due to non-stationarity), and it typically involves multiple assets simultaneously to reflect real-world portfolio management requirements.
Mathematical Formulation
Given: Historical prices of $N$ stocks $\mathbf{x}_{1:T} \in \mathbb{R}^{N \times T \times D}$ and exogenous variables $\mathbf{z}_{1:T_{\mathrm{ex}}}$
Goal: Predict future $S$-day relative returns for all assets:

$$\hat{\mathbf{R}}_{T+1:T+S} = \mathcal{F}_\theta(\mathbf{x}_{1:T}, \mathbf{z}_{1:T_{\mathrm{ex}}})$$

Where:
- $\hat{\mathbf{R}}_{T+1:T+S} \in \mathbb{R}^{N \times S}$ are predicted relative returns
- $\mathbf{R}_t = \frac{\mathbf{x}_t}{\mathbf{x}_T} - 1$ for $t = T+1, \dots, T+S$ (relative return against the last observed price $\mathbf{x}_T$)
- $\mathcal{F}_\theta$ is the forecasting model parameterized by $\theta$
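The relative-return target can be sketched as follows, assuming the future prices are known (i.e., when constructing training labels); the array names are illustrative, not FinWorld API:

```python
import numpy as np

def relative_returns(x_future: np.ndarray, x_T: np.ndarray) -> np.ndarray:
    """R_t = x_t / x_T - 1, computed element-wise per asset.

    x_future: (N, S) future prices for N assets over an S-day horizon
    x_T:      (N,)   last observed price per asset (the anchor)
    returns:  (N, S) relative returns against the anchor price
    """
    return x_future / x_T[:, None] - 1.0

x_T = np.array([100.0, 50.0])
x_future = np.array([[101.0, 102.0],
                     [49.0, 50.0]])
R = relative_returns(x_future, x_T)
```

Predicting returns rather than prices keeps targets roughly stationary and comparable across assets with very different price levels.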
*Workflow diagram: multi-asset time series forecasting.*
Key Features
- Multi-Asset Prediction - Simultaneous forecasting for multiple assets
- Return-Based - Predicts returns rather than absolute prices
- Exogenous Integration - Incorporates technical indicators and external factors
- Flexible Horizons - Supports various prediction horizons (S days)
Algorithmic Trading
Algorithmic trading involves the design, simulation, and evaluation of systematic trading strategies with emphasis on real-time decision making and risk management. FinWorld supports both ML/DL-based and RL-based approaches.
ML/DL-based Approach
Goal: Predict a trading signal $\hat{y}_t$ (future price return or movement direction) from $\mathbf{x}_{1:t}$ and $\mathbf{z}_{1:t}$
Trading Actions:
- Buy if $\hat{y}_t > \tau$
- Sell if $\hat{y}_t < -\tau$
- Hold otherwise
Where:
- $\hat{y}_t$ represents predicted return, probability, or trading signals
- $\tau$ is a threshold parameter
- Actions are determined by pre-defined decision rules
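The threshold rule above can be written directly as a small decision function (the default value of $\tau$ is an arbitrary example):

```python
def trading_action(y_hat: float, tau: float = 0.01) -> str:
    """Map a predicted return y_hat to an action via the threshold tau."""
    if y_hat > tau:
        return "buy"
    if y_hat < -tau:
        return "sell"
    return "hold"
```

For example, with $\tau = 0.01$, a predicted return of 0.02 triggers a buy and -0.03 triggers a sell; anything in between holds.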
RL-based Approach
Modeled as Markov Decision Process (MDP):
At each time step $t$:
- Observe state $s_t$ (constructed from $\mathbf{x}_{1:t}$ and $\mathbf{z}_{1:t}$)
- Choose action $a_t \in \mathcal{A}$
- Earn reward $r_t$
- Transition to next state $s_{t+1}$
Objective: Learn policy $\pi_\theta(a_t \mid s_t)$ that maximizes the expected discounted cumulative reward:

$$J(\theta) = \mathbb{E}_{\pi_\theta}\left[\sum_{t=1}^{T} \gamma^{t-1} r_t\right]$$

where $\gamma \in (0, 1]$ is the discount factor.
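A sketch of the cumulative discounted reward for one trajectory (the discount factor `gamma` is a standard RL assumption, not a value specified by the text):

```python
def discounted_return(rewards, gamma: float = 0.99) -> float:
    """Sum of gamma^t * r_t over one trajectory of per-step rewards r_t."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total
```

In training, the policy parameters $\theta$ are updated to increase the expectation of this quantity over trajectories sampled from $\pi_\theta$.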
*Workflow diagram: algorithmic trading (ML/DL-based and RL-based approaches).*
Key Features
- Real-time Decision Making - Immediate action selection based on market conditions
- Risk Management - Built-in risk controls and position limits
- Multiple Paradigms - Support for both predictive and RL approaches
- Transaction Costs - Realistic simulation of trading costs and slippage
Portfolio Management
Portfolio management focuses on the construction, optimization, and dynamic rebalancing of investment portfolios subject to real-world operational constraints. It supports various objective functions including return maximization, volatility minimization, and Sharpe ratio optimization.
ML/DL-based Approach
Goal: Predict future asset returns or risk estimates $\hat{\mathbf{y}}_t$ from $\mathbf{x}_{1:t}$ and $\mathbf{z}_{1:t}$
Portfolio Weights:
- $\mathbf{w}_{T+1:T+S}$ where $\sum_{i=1}^N w_{t,i} = 1$ and $w_{t,i} \geq 0$
- Determined by optimization procedures (mean-variance, risk parity, etc.)
Where:
- $\hat{\mathbf{y}}_t$ may represent predicted returns, risk scores, or signals
- Weights are optimized based on predictions
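One simple (hypothetical) way to map return predictions onto long-only weights satisfying $\sum_i w_{t,i} = 1$, $w_{t,i} \geq 0$; real pipelines would use mean-variance or risk-parity optimization as noted above:

```python
import numpy as np

def weights_from_predictions(y_hat: np.ndarray) -> np.ndarray:
    """Long-only weights: clip negative predictions, normalize to sum to 1.

    Falls back to equal weights when no prediction is positive.
    """
    pos = np.clip(y_hat, 0.0, None)
    s = pos.sum()
    if s == 0.0:
        return np.full_like(y_hat, 1.0 / len(y_hat))
    return pos / s
```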
RL-based Approach
Sequential Decision Process:
At each time $t$:
- Observe state $s_t$ (constructed from $\mathbf{x}_{1:t}$ and $\mathbf{z}_{1:t}$)
- Select allocation weights $\mathbf{w}_t \in \Delta^{N+1}$ (the probability simplex over $N$ assets plus a cash position)
- Receive reward $r_t$ (portfolio return or risk-adjusted reward)
- Transition to next state $s_{t+1}$
Objective: Learn policy $\pi_\theta(\mathbf{w}_t \mid s_t)$ that maximizes the expected cumulative portfolio return:

$$J(\theta) = \mathbb{E}_{\pi_\theta}\left[\, U(r_1, \dots, r_T) \,\right]$$
Where:
- $w_{t,0}$ represents cash position
- $U(\cdot)$ represents cumulative portfolio return
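An RL policy head can guarantee $\mathbf{w}_t \in \Delta^{N+1}$ by applying a softmax over $N$ asset scores plus one cash score; a minimal sketch (the score vector is a hypothetical policy output):

```python
import numpy as np

def allocation_weights(scores: np.ndarray) -> np.ndarray:
    """Map N+1 unconstrained scores (cash first, then N assets) onto the
    probability simplex via a numerically stable softmax."""
    z = scores - scores.max()   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

w = allocation_weights(np.array([0.0, 1.0, 2.0]))  # w[0] is the cash weight
```

Because the softmax output is non-negative and sums to one, no separate projection step is needed after each policy update.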
*Workflow diagram: portfolio management (ML/DL-based and RL-based approaches).*
Key Features
- Multi-Asset Allocation - Simultaneous optimization across multiple assets
- Risk Constraints - Position limits, transaction costs, and regulatory constraints
- Dynamic Rebalancing - Adaptive portfolio adjustments over time
- Objective Flexibility - Support for various risk-return objectives
LLM Applications
LLM applications in finance fall into two main categories: general language understanding tasks and sequential decision-making tasks. Both use large language models for financial reasoning, analysis, and decision-making.
Mathematical Formulation
Given: Multi-modal financial inputs
- Unstructured text (news, reports)
- Structured time series (OHLC prices)
- Images (K-line charts)
- Audio/Video (financial broadcasts)
Goal: Train and deploy large language models $\mathcal{M}_\phi$ that map multi-modal inputs to task outputs:

$$\mathbf{y} = \mathcal{M}_\phi(\mathbf{D}_1, \dots, \mathbf{D}_K)$$
Where:
- Each $\mathbf{D}_k$ represents an input modality
- $\mathbf{y}$ is the task-specific output
- $\mathcal{M}_\phi$ is the LLM parameterized by $\phi$
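A hypothetical sketch of bundling heterogeneous modalities $\mathbf{D}_1, \dots, \mathbf{D}_K$ into a single model input; real multi-modal LLMs tokenize or encode each modality separately, whereas here each is reduced to a text summary purely for illustration:

```python
def build_model_input(modalities: dict) -> str:
    """Concatenate named modality summaries into one tagged prompt string."""
    parts = []
    for name, content in modalities.items():
        parts.append(f"[{name}]\n{content}")
    return "\n\n".join(parts)

prompt = build_model_input({
    "news": "Fed holds rates steady.",
    "ohlc": "open=100 high=102 low=99 close=101",
})
```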
Application Categories
- General Language Understanding - Financial text analysis, QA, reasoning
- Sequential Decision-Making - RL training through environment interaction
- Multi-Modal Processing - Integration of text, time series, and visual data
- Tool Utilization - Advanced tool use for financial analysis
*Workflow diagram: LLM applications and application types.*
GRPO Training
Group Relative Policy Optimization (GRPO) is used for efficient LLM training in financial contexts:
- Stage 1: Fine-tuning on financial reasoning datasets to develop financial reasoning abilities
- Stage 2: Interaction with real or simulated market environments to acquire practical skills
- Group-level Normalization: Rewards are normalized within each sampled group, providing stable learning signals for fine-tuning
- Environment Interaction: Direct interaction with trading environments for skill acquisition
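The group-level normalization step scores each sampled response relative to its group's mean reward, scaled by the group's standard deviation; a minimal sketch of this advantage computation (not the full GRPO loss, and the epsilon term is a common stabilizing assumption):

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """A_i = (r_i - mean(group)) / (std(group) + eps) for one group of
    responses sampled from the same prompt."""
    mu = rewards.mean()
    sigma = rewards.std()
    return (rewards - mu) / (sigma + eps)
```

Responses that beat their group average get positive advantages and are reinforced; below-average responses get negative advantages, without requiring a separate learned value function.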
*Diagram: two-stage GRPO training (Stage I: financial reasoning fine-tuning; Stage II: market environment learning).*
Task Integration
FinWorld's modular architecture enables seamless integration across all four task types, allowing researchers and practitioners to:
Cross-Task Learning
- Shared Representations - Common feature extraction across tasks
- Transfer Learning - Knowledge transfer between related tasks
- Multi-Task Training - Joint optimization of multiple objectives
Unified Evaluation
- Standardized Metrics - Consistent evaluation across all tasks
- Comparative Analysis - Direct comparison of different approaches
- Reproducible Results - Standardized protocols for fair comparison
Flexible Deployment
- Pipeline Composition - Easy combination of different task components
- Real-time Processing - Support for live market applications
- Scalable Architecture - From research prototypes to production systems