
Financial AI Tasks

Comprehensive overview of the four core financial AI tasks supported by FinWorld

Task Overview

FinWorld supports four core financial AI tasks that cover the entire spectrum of financial AI applications. Each task is designed with specific mathematical formulations and can be approached using multiple AI paradigms including traditional ML, deep learning, reinforcement learning, and large language models.

Notation

Throughout this documentation, we use the following notation:

  • Single Asset: $\mathbf{x}_{1:T} = \{x_1, ..., x_T\} \in \mathbb{R}^{T \times D}$ - Historical endogenous time series
  • Multi-Asset: $\mathbf{x}_{1:T} \in \mathbb{R}^{N \times T \times D}$ - Historical price sequences for $N$ assets
  • Exogenous Variables: $\mathbf{z}_{1:T_{\mathrm{ex}}} = \{\mathbf{z}_1, ..., \mathbf{z}_{T_{\mathrm{ex}}}\} \in \mathbb{R}^{T_{\mathrm{ex}} \times C}$ - Technical indicators, financial factors, news sentiment
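
As a concrete illustration of these shapes, the following minimal NumPy sketch constructs dummy tensors; the sizes for $N$, $T$, $D$, $T_{\mathrm{ex}}$, and $C$ are hypothetical examples, not FinWorld defaults:

```python
import numpy as np

N, T, D = 5, 30, 6   # assets, lookback window, features per step (hypothetical sizes)
T_ex, C = 30, 15     # exogenous window and channel count (e.g., technical indicators)

x_single = np.random.randn(T, D)     # single-asset endogenous series, R^{T x D}
x_multi  = np.random.randn(N, T, D)  # multi-asset price tensor, R^{N x T x D}
z_exog   = np.random.randn(T_ex, C)  # exogenous variables, R^{T_ex x C}
```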

Time Series Forecasting

Financial time series forecasting differs from traditional forecasting in two key ways: it predicts returns rather than prices (due to non-stationarity), and it typically involves multiple assets simultaneously to reflect real-world portfolio management requirements.

Mathematical Formulation

Given: Historical prices of $N$ stocks $\mathbf{x}_{1:T} \in \mathbb{R}^{N \times T \times D}$ and exogenous variables $\mathbf{z}_{1:T_{\mathrm{ex}}}$

Goal: Predict future $S$-day relative returns for all assets

$$\hat{\mathbf{R}}_{T+1:T+S} = \mathcal{F}_\theta(\mathbf{x}_{1:T}, \mathbf{z}_{1:T_{\mathrm{ex}}})$$

Where:

  • $\hat{\mathbf{R}}_{T+1:T+S} \in \mathbb{R}^{N \times S}$ are predicted relative returns
  • $\mathbf{R}_t = \frac{\mathbf{x}_t}{\mathbf{x}_T} - 1$ is the relative return at future time $t$ with respect to the last observed price $\mathbf{x}_T$ (see the sketch below)
  • $\mathcal{F}_\theta$ is the forecasting model parameterized by $\theta$

🔮 Multi-Asset Time Series Forecasting Workflow

  1. 📊 Historical Price Data - Input $\mathbf{x}_{1:T} \in \mathbb{R}^{N \times T \times D}$, e.g., 30-day histories for AAPL, MSFT, GOOGL, TSLA, and JPM
  2. ⚙️ Feature Engineering - OHLCV price features and returns, 15 technical indicators, exogenous variables $\mathbf{z}_{1:T_{\mathrm{ex}}}$, and a time window of $T = 30$ days
  3. 🤖 Multi-Asset Forecasting Model - Model $\mathcal{F}_\theta$ (e.g., an LSTM) with forecast horizon $S = 5$ days and output in $\mathbb{R}^{N \times S}$
  4. 🔮 Predicted Relative Returns - $\hat{\mathbf{R}}_{T+1:T+S}$ with $\mathbf{R}_t = \frac{\mathbf{x}_t}{\mathbf{x}_T} - 1$, e.g., AAPL +2.3%, MSFT +1.8%, GOOGL +1.5%, TSLA +3.2%, JPM +1.9%
  5. 📈 Portfolio Performance - e.g., expected return +2.2%, Sharpe ratio 1.8, volatility 12-15%, confidence 78%

Key Features

  • Multi-Asset Prediction - Simultaneous forecasting for multiple assets
  • Return-Based - Predicts returns rather than absolute prices
  • Exogenous Integration - Incorporates technical indicators and external factors
  • Flexible Horizons - Supports various prediction horizons (S days)

Algorithmic Trading

Algorithmic trading involves the design, simulation, and evaluation of systematic trading strategies with emphasis on real-time decision making and risk management. FinWorld supports both ML/DL-based and RL-based approaches.

ML/DL-based Approach

Goal: Predict future price returns or movement directions

$$\hat{y}_{T+1:T+S} = \mathcal{F}_\theta(\mathbf{x}_{1:T}, \mathbf{z}_{1:T_{\mathrm{ex}}})$$

Trading Actions:

  • Buy if $\hat{y}_t > \tau$
  • Sell if $\hat{y}_t < -\tau$
  • Hold otherwise

Where:

  • $\hat{y}_t$ represents a predicted return, probability, or trading signal
  • $\tau$ is a threshold parameter
  • Actions are determined by pre-defined decision rules
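
The decision rule above is a simple thresholding of the model output; a minimal sketch (the function name and the value of $\tau$ are illustrative):

```python
import numpy as np

def signals_from_predictions(y_hat: np.ndarray, tau: float = 0.005) -> np.ndarray:
    """Map predicted returns to actions: +1 = buy, -1 = sell, 0 = hold."""
    return np.where(y_hat > tau, 1, np.where(y_hat < -tau, -1, 0))

y_hat = np.array([0.012, -0.001, -0.009, 0.003])
print(signals_from_predictions(y_hat))  # [ 1  0 -1  0]
```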

RL-based Approach

Modeled as Markov Decision Process (MDP):

At each time step $t$:

  • Observe state $s_t$ (constructed from $\mathbf{x}_{1:t}$ and $\mathbf{z}_{1:t}$)
  • Choose action $a_t \in \mathcal{A}$
  • Earn reward $r_t$
  • Transition to next state $s_{t+1}$

Objective: Learn policy $\pi_\theta(a_t \mid s_t)$ that maximizes:

$$\max_\theta~ \mathbb{E}_{\pi_\theta} \left[\sum_{t=T+1}^{T+S} r_t\right]$$
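
This objective corresponds to the standard agent-environment rollout loop; the schematic sketch below assumes a generic environment with `reset`/`step` methods and a policy with an `act` method, none of which are FinWorld classes:

```python
def run_episode(env, policy, horizon: int) -> float:
    """Roll out pi_theta(a_t | s_t) for up to `horizon` steps and sum rewards."""
    s_t = env.reset()                   # initial state built from x_{1:t}, z_{1:t}
    total_reward = 0.0
    for _ in range(horizon):
        a_t = policy.act(s_t)           # sample a_t ~ pi_theta(. | s_t)
        s_t, r_t, done = env.step(a_t)  # earn r_t, transition to s_{t+1}
        total_reward += r_t
        if done:                        # e.g., end of trading data
            break
    return total_reward                 # Monte Carlo sample of E[sum r_t]
```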

🤖 Algorithmic Trading Workflow

📊 ML/DL-based Approach

  1. 📊 Market Data - Input $\mathbf{x}_{1:T} \in \mathbb{R}^{T \times D}$, e.g., AAPL at $150.25, volume 2.1M, news ("iPhone sales up"), timestamp 09:30:15 EST
  2. ⚙️ Feature Engineering - OHLCV price features and returns, technical indicators (RSI, MACD, moving averages), news sentiment, and a time window of $T = 30$ days
  3. 🤖 ML/DL Model - e.g., an LSTM mapping $\mathbf{x}_{1:T}$ to predictions $\hat{y}_{T+1:T+S}$ with an associated confidence (e.g., 78%)
  4. 📈 Trading Signal - Buy/Sell/Hold decision, e.g., BUY with target $155.50, stop-loss $148.00, confidence 85%
  5. 💼 Trade Execution - e.g., BUY 100 AAPL at $150.30, cost $15,030, status: filled

🧠 RL-based Approach

  1. 👁️ State Observation - State $s_t$: market prices and volumes, portfolio positions and cash, news sentiment, market hours
  2. 🎯 Action Space - $\mathcal{A}$: buy (long), sell (short), hold, plus position size
  3. 🤖 RL Agent - Policy $\pi_\theta(a_t \mid s_t)$, e.g., trained with PPO: observe $s_t$, take $a_t$, receive $r_t$
  4. 💰 Reward Function - $r_t$ built from returns, Sharpe ratio, drawdown, and risk, e.g., return +2.3%, Sharpe 1.8, drawdown -5.2%, risk 15%
  5. 🔄 Policy Learning - Maximize $\mathbb{E}[\sum r_t]$ via policy gradients, e.g., over 10,000 episodes

Key Features

  • Real-time Decision Making - Immediate action selection based on market conditions
  • Risk Management - Built-in risk controls and position limits
  • Multiple Paradigms - Support for both predictive and RL approaches
  • Transaction Costs - Realistic simulation of trading costs and slippage

Portfolio Management

Portfolio management focuses on the construction, optimization, and dynamic rebalancing of investment portfolios subject to real-world operational constraints. It supports various objective functions including return maximization, volatility minimization, and Sharpe ratio optimization.

ML/DL-based Approach

Goal: Predict future asset returns or risk estimates

$$\hat{\mathbf{y}}_{T+1:T+S} = \mathcal{F}_\theta(\mathbf{x}_{1:T}, \mathbf{z}_{1:T_{\mathrm{ex}}})$$

Portfolio Weights:

  • $\mathbf{w}_{T+1:T+S}$ where $\sum_{i=1}^N w_{t,i} = 1$ and $w_{t,i} \geq 0$ (fully invested, long-only)
  • Determined by optimization procedures (mean-variance, risk parity, etc.)

Where:

  • $\hat{\mathbf{y}}_t$ may represent predicted returns, risk scores, or signals
  • Weights are optimized based on predictions
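
As one concrete instance of such an optimization procedure, here is a long-only mean-variance sketch using SciPy; the risk-aversion parameter `gamma` and the toy inputs are illustrative assumptions, not FinWorld defaults:

```python
import numpy as np
from scipy.optimize import minimize

def mean_variance_weights(mu: np.ndarray, sigma: np.ndarray, gamma: float = 5.0) -> np.ndarray:
    """Maximize w'mu - (gamma/2) w'Sigma w  s.t.  sum(w) = 1, w >= 0 (long-only)."""
    n = len(mu)
    objective = lambda w: -(w @ mu - 0.5 * gamma * w @ sigma @ w)
    constraints = [{"type": "eq", "fun": lambda w: w.sum() - 1.0}]
    bounds = [(0.0, 1.0)] * n
    result = minimize(objective, np.full(n, 1.0 / n), bounds=bounds, constraints=constraints)
    return result.x

mu = np.array([0.08, 0.06, 0.05])                # predicted returns (toy values)
sigma = np.diag([0.04, 0.02, 0.01])              # toy covariance matrix
print(mean_variance_weights(mu, sigma).round(3)) # non-negative weights summing to 1
```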

RL-based Approach

Sequential Decision Process:

At each time $t$:

  • Observe state $s_t$ (constructed from $\mathbf{x}_{1:t}$ and $\mathbf{z}_{1:t}$)
  • Select allocation weights $\mathbf{w}_t \in \Delta^{N+1}$
  • Receive reward $r_t$ (portfolio return or risk-adjusted reward)
  • Transition to next state $s_{t+1}$

Objective: Learn policy $\pi_\theta(\mathbf{w}_t \mid s_t)$ that maximizes:

$$\max_\theta~ \mathbb{E}_{\pi_\theta} \left[\sum_{t=T+1}^{T+S} U(\mathbf{w}_t, \mathbf{x}_t)\right]$$

Where:

  • $w_{t,0}$ represents the cash position
  • $U(\cdot)$ is the per-period utility (e.g., portfolio return or a risk-adjusted variant), so the sum over the horizon measures cumulative performance
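
A common way for a policy network to produce valid allocations on the simplex $\Delta^{N+1}$ is a softmax over $N+1$ raw scores (index 0 for cash); a minimal sketch, with the score vector standing in for the network's output:

```python
import numpy as np

def softmax_weights(scores: np.ndarray) -> np.ndarray:
    """Map raw policy scores to w with w_i >= 0 and sum(w) = 1."""
    e = np.exp(scores - scores.max())  # shift for numerical stability
    return e / e.sum()

scores = np.array([0.2, 1.1, 0.7, -0.3, 0.5])  # [cash, asset_1, ..., asset_4]
w = softmax_weights(scores)
print(w.round(3), w.sum())  # non-negative weights summing to 1.0
```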

💼 Portfolio Management Workflow

📊 ML/DL-based Approach

  1. 📊 Multi-Asset Data - Input $\mathbf{x}_{1:T} \in \mathbb{R}^{N \times T \times D}$, e.g., AAPL $150.25, MSFT $380.50, GOOGL $2,850, VIX 18.5
  2. ⚙️ Feature Engineering - Historical returns, risk measures (volatility, VaR), asset correlations, and a time window of $T = 30$ days
  3. 🤖 ML/DL Model - e.g., an LSTM predicting $\hat{\mathbf{y}}_{T+1:T+S}$ for $N = 4$ assets
  4. ⚖️ Portfolio Weights - $\mathbf{w}_{T+1:T+S}$, e.g., AAPL 35%, MSFT 30%, GOOGL 25%, cash 10%
  5. 💼 Portfolio Management - e.g., expected return 8.5%, risk 12.3%, Sharpe ratio 1.85, weekly rebalancing

🧠 RL-based Approach

  1. 👁️ State Observation - State $s_t$: market prices and volumes, current portfolio weights, market sentiment, market hours
  2. 🎯 Action Space - Allocation weights $\mathbf{w}_t \in \Delta^{N+1}$: cash $w_{t,0}$ and asset weights $w_{t,1:N}$, applied at a chosen rebalancing frequency
  3. 🤖 RL Agent - Policy $\pi_\theta(\mathbf{w}_t \mid s_t)$, e.g., trained with PPO: observe $s_t$, output $\mathbf{w}_t$, receive $r_t$
  4. 💰 Reward Function - $r_t$ built from returns, Sharpe ratio, drawdown, and risk, e.g., return +8.5%, Sharpe 1.85, drawdown -5.2%, risk 12.3%
  5. 🔄 Policy Learning - Maximize $\mathbb{E}[\sum U(\mathbf{w}_t, \mathbf{x}_t)]$ via policy gradients, e.g., over 10,000 episodes

Key Features

  • Multi-Asset Allocation - Simultaneous optimization across multiple assets
  • Risk Constraints - Position limits, transaction costs, and regulatory constraints
  • Dynamic Rebalancing - Adaptive portfolio adjustments over time
  • Objective Flexibility - Support for various risk-return objectives

LLM Applications

LLM applications in finance encompass two main categories: general language understanding tasks and sequential decision-making tasks. These applications leverage large language models for financial reasoning, analysis, and decision-making.

Mathematical Formulation

Given: Multi-modal financial inputs

  • Unstructured text (news, reports)
  • Structured time series (OHLC prices)
  • Images (K-line charts)
  • Audio/Video (financial broadcasts)

Goal: Train and deploy large language models $\mathcal{M}_\phi$

$$\mathbf{y} = \mathcal{M}_\phi(\mathbf{D}_1, \mathbf{D}_2, ..., \mathbf{D}_K)$$

Where:

  • Each $\mathbf{D}_k$ represents an input modality
  • $\mathbf{y}$ is the task-specific output
  • $\mathcal{M}_\phi$ is the LLM parameterized by $\phi$
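
Schematically, a task-specific call packs the $K$ modalities together with an instruction into a single request; the sketch below uses a hypothetical wrapper, not a real FinWorld or model-vendor API:

```python
from dataclasses import dataclass

@dataclass
class ModalityInput:
    kind: str        # "text", "timeseries", "image", or "audio"
    payload: object  # raw document, numeric array, or file path

def build_request(task: str, inputs: list[ModalityInput]) -> dict:
    """Pack D_1..D_K plus a task instruction into one model request."""
    return {
        "instruction": f"Task: {task}. Analyze the attached financial inputs.",
        "attachments": [{"kind": d.kind, "payload": d.payload} for d in inputs],
    }

request = build_request(
    task="trading_decision",
    inputs=[
        ModalityInput("text", "Apple reports record iPhone sales ..."),
        ModalityInput("timeseries", [[150.1, 151.2, 149.8, 150.9]]),  # one OHLC row
    ],
)
# y = M_phi(request)  # actual invocation depends on the serving stack
```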

Application Categories

  • General Language Understanding - Financial text analysis, QA, reasoning
  • Sequential Decision-Making - RL training through environment interaction
  • Multi-Modal Processing - Integration of text, time series, and visual data
  • Tool Utilization - Advanced tool use for financial analysis

🤖 LLM Applications Workflow

  1. 📊 Multi-Modal Data Sources - Text (news, reports), time series (OHLC prices), images (K-line charts), audio/video (financial broadcasts)
  2. 🤖 LLM Processing - Transformer-based model $\mathcal{M}_\phi$ trained with SFT + RL for reasoning, producing $\mathbf{y} = \mathcal{M}_\phi(\mathbf{D}_1, ..., \mathbf{D}_K)$
  3. 🎯 Task Categories - General QA (financial questions), time series forecasting, trading (BUY/SELL/HOLD), portfolio asset selection

GRPO Training

Group Relative Policy Optimization (GRPO) is used for efficient LLM training in financial contexts:

  • Stage 1 - Fine-tuning on financial reasoning datasets to build financial reasoning ability
  • Stage 2 - Interaction with real or simulated market environments to acquire practical trading skills
  • Group-level Normalization - Rewards are normalized within each group of sampled responses, providing stable learning signals during fine-tuning (sketched below)
  • Environment Interaction - Direct interaction with trading environments for skill acquisition
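
The group-level normalization at the heart of GRPO is straightforward: sample a group of $G$ responses per prompt, score them, and use within-group standardized rewards as advantages. A minimal sketch following the published GRPO formulation (not FinWorld's exact implementation):

```python
import numpy as np

def grpo_advantages(group_rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Standardize rewards within each group: A_i = (r_i - mean) / std.

    group_rewards: shape (num_prompts, G), with G sampled responses per prompt.
    """
    mean = group_rewards.mean(axis=1, keepdims=True)
    std = group_rewards.std(axis=1, keepdims=True)
    return (group_rewards - mean) / (std + eps)

rewards = np.array([[0.9, 0.4, 0.1, 0.6]])  # one prompt, G = 4 responses
print(grpo_advantages(rewards).round(2))    # centered, unit-scale advantages
```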

🔄 Two-Stage GRPO Training

Stage I: Financial Reasoning Fine-tuning

  1. 📚 Financial Datasets - Reasoning tasks: financial QA, quantitative math, market-scenario analysis; goal: build reasoning skills
  2. 🤖 RL Training - SFT followed by RL with rewards for reasoning quality, yielding financial reasoning ability
  3. 🧠 Reasoning Capability - A foundation model that understands financial concepts, handles quantitative calculations, and interprets markets, but still lacks market experience

Stage II: Market Environment Learning

  4. 📊 Market Environment - Real market data: historical prices, news events, volatility dynamics; goal: decision-making skills
  5. 🔄 RL Training - Interaction with real or simulated markets: trading actions, financial-performance rewards, trial-and-error learning
  6. 🚀 Deployment Ready - A production model for live trading decisions, market adaptation, and continuous learning and updates

Task Integration

FinWorld's modular architecture enables seamless integration across all four task types, giving researchers and practitioners the following capabilities:

Cross-Task Learning

  • Shared Representations - Common feature extraction across tasks
  • Transfer Learning - Knowledge transfer between related tasks
  • Multi-Task Training - Joint optimization of multiple objectives

Unified Evaluation

  • Standardized Metrics - Consistent evaluation across all tasks
  • Comparative Analysis - Direct comparison of different approaches
  • Reproducible Results - Standardized protocols for fair comparison

Flexible Deployment

  • Pipeline Composition - Easy combination of different task components
  • Real-time Processing - Support for live market applications
  • Scalable Architecture - From research prototypes to production systems