
Financial AI Tasks

Comprehensive overview of the four core financial AI tasks supported by FinWorld

Task Overview

FinWorld supports four core financial AI tasks that cover the entire spectrum of financial AI applications. Each task is designed with specific mathematical formulations and can be approached using multiple AI paradigms including traditional ML, deep learning, reinforcement learning, and large language models.

Notation

Throughout this documentation, we use the following notation:

  • Single Asset: $\mathbf{x}_{1:T} = \{x_1, ..., x_T\} \in \mathbb{R}^{T \times D}$ - Historical endogenous time series
  • Multi-Asset: $\mathbf{x}_{1:T} \in \mathbb{R}^{N \times T \times D}$ - Historical price sequences for $N$ assets
  • Exogenous Variables: $\mathbf{z}_{1:T_{\mathrm{ex}}} = \{\mathbf{z}_1, ..., \mathbf{z}_{T_{\mathrm{ex}}}\} \in \mathbb{R}^{T_{\mathrm{ex}} \times C}$ - Technical indicators, financial factors, news sentiment
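
As a concrete illustration of these shapes, the following minimal NumPy sketch constructs dummy tensors; the sizes for $N$, $T$, $D$, $T_{\mathrm{ex}}$, and $C$ are hypothetical examples, not FinWorld defaults:

```python
import numpy as np

N, T, D = 5, 30, 6   # assets, lookback window, features per step (hypothetical sizes)
T_ex, C = 30, 15     # exogenous window and channel count (e.g., technical indicators)

x_single = np.random.randn(T, D)     # single-asset endogenous series, R^{T x D}
x_multi  = np.random.randn(N, T, D)  # multi-asset price tensor, R^{N x T x D}
z_exog   = np.random.randn(T_ex, C)  # exogenous variables, R^{T_ex x C}
```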

Time Series Forecasting

Financial time series forecasting differs from traditional forecasting in two key ways: it predicts returns rather than prices (due to non-stationarity), and it typically involves multiple assets simultaneously to reflect real-world portfolio management requirements.

Mathematical Formulation

Given: Historical prices of $N$ stocks $\mathbf{x}_{1:T} \in \mathbb{R}^{N \times T \times D}$ and exogenous variables $\mathbf{z}_{1:T_{\mathrm{ex}}}$

Goal: Predict future $S$-day relative returns for all assets

$$\hat{\mathbf{R}}_{T+1:T+S} = \mathcal{F}_\theta(\mathbf{x}_{1:T}, \mathbf{z}_{1:T_{\mathrm{ex}}})$$

Where:

  • $\hat{\mathbf{R}}_{T+1:T+S} \in \mathbb{R}^{N \times S}$ are predicted relative returns
  • $\mathbf{R}_t = \frac{\mathbf{x}_t}{\mathbf{x}_T} - 1$ is the relative return at future time $t$ with respect to the last observed price $\mathbf{x}_T$ (see the sketch below)
  • $\mathcal{F}_\theta$ is the forecasting model parameterized by $\theta$

🔮 Multi-Asset Time Series Forecasting Workflow

  1. 📊 Historical Price Data - Input $\mathbf{x}_{1:T} \in \mathbb{R}^{N \times T \times D}$, e.g., 30-day histories for AAPL, MSFT, GOOGL, TSLA, and JPM
  2. ⚙️ Feature Engineering - OHLCV price features and returns, 15 technical indicators, exogenous variables $\mathbf{z}_{1:T_{\mathrm{ex}}}$, and a time window of $T = 30$ days
  3. 🤖 Multi-Asset Forecasting Model - Model $\mathcal{F}_\theta$ (e.g., an LSTM) with forecast horizon $S = 5$ days and output in $\mathbb{R}^{N \times S}$
  4. 🔮 Predicted Relative Returns - $\hat{\mathbf{R}}_{T+1:T+S}$ with $\mathbf{R}_t = \frac{\mathbf{x}_t}{\mathbf{x}_T} - 1$, e.g., AAPL +2.3%, MSFT +1.8%, GOOGL +1.5%, TSLA +3.2%, JPM +1.9%
  5. 📈 Portfolio Performance - e.g., expected return +2.2%, Sharpe ratio 1.8, volatility 12-15%, confidence 78%

Key Features

  • Multi-Asset Prediction - Simultaneous forecasting for multiple assets
  • Return-Based - Predicts returns rather than absolute prices
  • Exogenous Integration - Incorporates technical indicators and external factors
  • Flexible Horizons - Supports various prediction horizons (S days)

Algorithmic Trading

Algorithmic trading involves the design, simulation, and evaluation of systematic trading strategies with emphasis on real-time decision making and risk management. FinWorld supports both ML/DL-based and RL-based approaches.

ML/DL-based Approach

Goal: Predict future price returns or movement directions

$$\hat{y}_{T+1:T+S} = \mathcal{F}_\theta(\mathbf{x}_{1:T}, \mathbf{z}_{1:T_{\mathrm{ex}}})$$

Trading Actions:

  • Buy if $\hat{y}_t > \tau$
  • Sell if $\hat{y}_t < -\tau$
  • Hold otherwise

Where:

  • $\hat{y}_t$ represents a predicted return, probability, or trading signal
  • $\tau$ is a threshold parameter
  • Actions are determined by pre-defined decision rules
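
The decision rule above is a simple thresholding of the model output; a minimal sketch (the function name and the value of $\tau$ are illustrative):

```python
import numpy as np

def signals_from_predictions(y_hat: np.ndarray, tau: float = 0.005) -> np.ndarray:
    """Map predicted returns to actions: +1 = buy, -1 = sell, 0 = hold."""
    return np.where(y_hat > tau, 1, np.where(y_hat < -tau, -1, 0))

y_hat = np.array([0.012, -0.001, -0.009, 0.003])
print(signals_from_predictions(y_hat))  # [ 1  0 -1  0]
```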

RL-based Approach

Modeled as Markov Decision Process (MDP):

At each time step $t$:

  • Observe state $s_t$ (constructed from $\mathbf{x}_{1:t}$ and $\mathbf{z}_{1:t}$)
  • Choose action $a_t \in \mathcal{A}$
  • Earn reward $r_t$
  • Transition to next state $s_{t+1}$

Objective: Learn policy $\pi_\theta(a_t \mid s_t)$ that maximizes:

$$\max_\theta~ \mathbb{E}_{\pi_\theta} \left[\sum_{t=T+1}^{T+S} r_t\right]$$
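
This objective corresponds to the standard agent-environment rollout loop; the schematic sketch below assumes a generic environment with `reset`/`step` methods and a policy with an `act` method, none of which are FinWorld classes:

```python
def run_episode(env, policy, horizon: int) -> float:
    """Roll out pi_theta(a_t | s_t) for up to `horizon` steps and sum rewards."""
    s_t = env.reset()                   # initial state built from x_{1:t}, z_{1:t}
    total_reward = 0.0
    for _ in range(horizon):
        a_t = policy.act(s_t)           # sample a_t ~ pi_theta(. | s_t)
        s_t, r_t, done = env.step(a_t)  # earn r_t, transition to s_{t+1}
        total_reward += r_t
        if done:                        # e.g., end of trading data
            break
    return total_reward                 # Monte Carlo sample of E[sum r_t]
```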

🤖 Algorithmic Trading Workflow

📊 ML/DL-based Approach

  1. 📊 Market Data - Input $\mathbf{x}_{1:T} \in \mathbb{R}^{T \times D}$, e.g., AAPL at $150.25, volume 2.1M, news ("iPhone sales up"), timestamp 09:30:15 EST
  2. ⚙️ Feature Engineering - OHLCV price features and returns, technical indicators (RSI, MACD, moving averages), news sentiment, and a time window of $T = 30$ days
  3. 🤖 ML/DL Model - e.g., an LSTM mapping $\mathbf{x}_{1:T}$ to predictions $\hat{y}_{T+1:T+S}$ with an associated confidence (e.g., 78%)
  4. 📈 Trading Signal - Buy/Sell/Hold decision, e.g., BUY with target $155.50, stop-loss $148.00, confidence 85%
  5. 💼 Trade Execution - e.g., BUY 100 AAPL at $150.30, cost $15,030, status: filled

🧠 RL-based Approach

  1. 👁️ State Observation - State $s_t$: market prices and volumes, portfolio positions and cash, news sentiment, market hours
  2. 🎯 Action Space - $\mathcal{A}$: buy (long), sell (short), hold, plus position size
  3. 🤖 RL Agent - Policy $\pi_\theta(a_t \mid s_t)$, e.g., trained with PPO: observe $s_t$, take $a_t$, receive $r_t$
  4. 💰 Reward Function - $r_t$ built from returns, Sharpe ratio, drawdown, and risk, e.g., return +2.3%, Sharpe 1.8, drawdown -5.2%, risk 15%
  5. 🔄 Policy Learning - Maximize $\mathbb{E}[\sum r_t]$ via policy gradients, e.g., over 10,000 episodes

Key Features

  • Real-time Decision Making - Immediate action selection based on market conditions
  • Risk Management - Built-in risk controls and position limits
  • Multiple Paradigms - Support for both predictive and RL approaches
  • Transaction Costs - Realistic simulation of trading costs and slippage

Portfolio Management

Portfolio management focuses on the construction, optimization, and dynamic rebalancing of investment portfolios subject to real-world operational constraints. It supports various objective functions including return maximization, volatility minimization, and Sharpe ratio optimization.

ML/DL-based Approach

Goal: Predict future asset returns or risk estimates

$$\hat{\mathbf{y}}_{T+1:T+S} = \mathcal{F}_\theta(\mathbf{x}_{1:T}, \mathbf{z}_{1:T_{\mathrm{ex}}})$$

Portfolio Weights:

  • $\mathbf{w}_{T+1:T+S}$ where $\sum_{i=1}^N w_{t,i} = 1$ and $w_{t,i} \geq 0$ (fully invested, long-only)
  • Determined by optimization procedures (mean-variance, risk parity, etc.)

Where:

  • $\hat{\mathbf{y}}_t$ may represent predicted returns, risk scores, or signals
  • Weights are optimized based on predictions
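
As one concrete instance of such an optimization procedure, here is a long-only mean-variance sketch using SciPy; the risk-aversion parameter `gamma` and the toy inputs are illustrative assumptions, not FinWorld defaults:

```python
import numpy as np
from scipy.optimize import minimize

def mean_variance_weights(mu: np.ndarray, sigma: np.ndarray, gamma: float = 5.0) -> np.ndarray:
    """Maximize w'mu - (gamma/2) w'Sigma w  s.t.  sum(w) = 1, w >= 0 (long-only)."""
    n = len(mu)
    objective = lambda w: -(w @ mu - 0.5 * gamma * w @ sigma @ w)
    constraints = [{"type": "eq", "fun": lambda w: w.sum() - 1.0}]
    bounds = [(0.0, 1.0)] * n
    result = minimize(objective, np.full(n, 1.0 / n), bounds=bounds, constraints=constraints)
    return result.x

mu = np.array([0.08, 0.06, 0.05])                # predicted returns (toy values)
sigma = np.diag([0.04, 0.02, 0.01])              # toy covariance matrix
print(mean_variance_weights(mu, sigma).round(3)) # non-negative weights summing to 1
```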

RL-based Approach

Sequential Decision Process:

At each time $t$:

  • Observe state $s_t$ (constructed from $\mathbf{x}_{1:t}$ and $\mathbf{z}_{1:t}$)
  • Select allocation weights $\mathbf{w}_t \in \Delta^{N+1}$
  • Receive reward $r_t$ (portfolio return or risk-adjusted reward)
  • Transition to next state $s_{t+1}$

Objective: Learn policy $\pi_\theta(\mathbf{w}_t \mid s_t)$ that maximizes:

$$\max_\theta~ \mathbb{E}_{\pi_\theta} \left[\sum_{t=T+1}^{T+S} U(\mathbf{w}_t, \mathbf{x}_t)\right]$$

Where:

  • $w_{t,0}$ represents the cash position
  • $U(\cdot)$ is the per-period utility (e.g., portfolio return or a risk-adjusted variant), so the sum over the horizon measures cumulative performance
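
A common way for a policy network to produce valid allocations on the simplex $\Delta^{N+1}$ is a softmax over $N+1$ raw scores (index 0 for cash); a minimal sketch, with the score vector standing in for the network's output:

```python
import numpy as np

def softmax_weights(scores: np.ndarray) -> np.ndarray:
    """Map raw policy scores to w with w_i >= 0 and sum(w) = 1."""
    e = np.exp(scores - scores.max())  # shift for numerical stability
    return e / e.sum()

scores = np.array([0.2, 1.1, 0.7, -0.3, 0.5])  # [cash, asset_1, ..., asset_4]
w = softmax_weights(scores)
print(w.round(3), w.sum())  # non-negative weights summing to 1.0
```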

💼 Portfolio Management Workflow

📊 ML/DL-based Approach

  1. 📊 Multi-Asset Data - Input $\mathbf{x}_{1:T} \in \mathbb{R}^{N \times T \times D}$, e.g., AAPL $150.25, MSFT $380.50, GOOGL $2,850, VIX 18.5
  2. ⚙️ Feature Engineering - Historical returns, risk measures (volatility, VaR), asset correlations, and a time window of $T = 30$ days
  3. 🤖 ML/DL Model - e.g., an LSTM predicting $\hat{\mathbf{y}}_{T+1:T+S}$ for $N = 4$ assets
  4. ⚖️ Portfolio Weights - $\mathbf{w}_{T+1:T+S}$, e.g., AAPL 35%, MSFT 30%, GOOGL 25%, cash 10%
  5. 💼 Portfolio Management - e.g., expected return 8.5%, risk 12.3%, Sharpe ratio 1.85, weekly rebalancing

🧠 RL-based Approach

  1. 👁️ State Observation - State $s_t$: market prices and volumes, current portfolio weights, market sentiment, market hours
  2. 🎯 Action Space - Allocation weights $\mathbf{w}_t \in \Delta^{N+1}$: cash $w_{t,0}$ and asset weights $w_{t,1:N}$, applied at a chosen rebalancing frequency
  3. 🤖 RL Agent - Policy $\pi_\theta(\mathbf{w}_t \mid s_t)$, e.g., trained with PPO: observe $s_t$, output $\mathbf{w}_t$, receive $r_t$
  4. 💰 Reward Function - $r_t$ built from returns, Sharpe ratio, drawdown, and risk, e.g., return +8.5%, Sharpe 1.85, drawdown -5.2%, risk 12.3%
  5. 🔄 Policy Learning - Maximize $\mathbb{E}[\sum U(\mathbf{w}_t, \mathbf{x}_t)]$ via policy gradients, e.g., over 10,000 episodes

Key Features

  • Multi-Asset Allocation - Simultaneous optimization across multiple assets
  • Risk Constraints - Position limits, transaction costs, and regulatory constraints
  • Dynamic Rebalancing - Adaptive portfolio adjustments over time
  • Objective Flexibility - Support for various risk-return objectives

LLM Applications

LLM applications in finance encompass two main categories: general language understanding tasks and sequential decision-making tasks. These applications leverage large language models for financial reasoning, analysis, and decision-making.

Mathematical Formulation

Given: Multi-modal financial inputs

  • Unstructured text (news, reports)
  • Structured time series (OHLC prices)
  • Images (K-line charts)
  • Audio/Video (financial broadcasts)

Goal: Train and deploy large language models $\mathcal{M}_\phi$

$$\mathbf{y} = \mathcal{M}_\phi(\mathbf{D}_1, \mathbf{D}_2, ..., \mathbf{D}_K)$$

Where:

  • Each $\mathbf{D}_k$ represents an input modality
  • $\mathbf{y}$ is the task-specific output
  • $\mathcal{M}_\phi$ is the LLM parameterized by $\phi$
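
Schematically, a task-specific call packs the $K$ modalities together with an instruction into a single request; the sketch below uses a hypothetical wrapper, not a real FinWorld or model-vendor API:

```python
from dataclasses import dataclass

@dataclass
class ModalityInput:
    kind: str        # "text", "timeseries", "image", or "audio"
    payload: object  # raw document, numeric array, or file path

def build_request(task: str, inputs: list[ModalityInput]) -> dict:
    """Pack D_1..D_K plus a task instruction into one model request."""
    return {
        "instruction": f"Task: {task}. Analyze the attached financial inputs.",
        "attachments": [{"kind": d.kind, "payload": d.payload} for d in inputs],
    }

request = build_request(
    task="trading_decision",
    inputs=[
        ModalityInput("text", "Apple reports record iPhone sales ..."),
        ModalityInput("timeseries", [[150.1, 151.2, 149.8, 150.9]]),  # one OHLC row
    ],
)
# y = M_phi(request)  # actual invocation depends on the serving stack
```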

Application Categories

  • General Language Understanding - Financial text analysis, QA, reasoning
  • Sequential Decision-Making - RL training through environment interaction
  • Multi-Modal Processing - Integration of text, time series, and visual data
  • Tool Utilization - Advanced tool use for financial analysis

🤖 LLM Applications Workflow

  1. 📊 Multi-Modal Data Sources - Text (news, reports), time series (OHLC prices), images (K-line charts), audio/video (financial broadcasts)
  2. 🤖 LLM Processing - Transformer-based model $\mathcal{M}_\phi$ trained with SFT + RL for reasoning, producing $\mathbf{y} = \mathcal{M}_\phi(\mathbf{D}_1, ..., \mathbf{D}_K)$
  3. 🎯 Task Categories - General QA (financial questions), time series forecasting, trading (BUY/SELL/HOLD), portfolio asset selection

GRPO Training

Group Relative Policy Optimization (GRPO) is used for efficient LLM training in financial contexts:

  • Stage 1 - Fine-tuning on financial reasoning datasets to build financial reasoning ability
  • Stage 2 - Interaction with real or simulated market environments to acquire practical trading skills
  • Group-level Normalization - Rewards are normalized within each group of sampled responses, providing stable learning signals during fine-tuning (sketched below)
  • Environment Interaction - Direct interaction with trading environments for skill acquisition
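
The group-level normalization at the heart of GRPO is straightforward: sample a group of $G$ responses per prompt, score them, and use within-group standardized rewards as advantages. A minimal sketch following the published GRPO formulation (not FinWorld's exact implementation):

```python
import numpy as np

def grpo_advantages(group_rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Standardize rewards within each group: A_i = (r_i - mean) / std.

    group_rewards: shape (num_prompts, G), with G sampled responses per prompt.
    """
    mean = group_rewards.mean(axis=1, keepdims=True)
    std = group_rewards.std(axis=1, keepdims=True)
    return (group_rewards - mean) / (std + eps)

rewards = np.array([[0.9, 0.4, 0.1, 0.6]])  # one prompt, G = 4 responses
print(grpo_advantages(rewards).round(2))    # centered, unit-scale advantages
```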

🔄 Two-Stage GRPO Training

Stage I: Financial Reasoning Fine-tuning

  1. 📚 Financial Datasets - Reasoning tasks: financial QA, quantitative math, market-scenario analysis; goal: build reasoning skills
  2. 🤖 RL Training - SFT followed by RL with rewards for reasoning quality, yielding financial reasoning ability
  3. 🧠 Reasoning Capability - A foundation model that understands financial concepts, handles quantitative calculations, and interprets markets, but still lacks market experience

Stage II: Market Environment Learning

  4. 📊 Market Environment - Real market data: historical prices, news events, volatility dynamics; goal: decision-making skills
  5. 🔄 RL Training - Interaction with real or simulated markets: trading actions, financial-performance rewards, trial-and-error learning
  6. 🚀 Deployment Ready - A production model for live trading decisions, market adaptation, and continuous learning and updates

Task Integration

FinWorld's modular architecture enables seamless integration across all four task types, giving researchers and practitioners the following capabilities:

Cross-Task Learning

  • Shared Representations - Common feature extraction across tasks
  • Transfer Learning - Knowledge transfer between related tasks
  • Multi-Task Training - Joint optimization of multiple objectives

Unified Evaluation

  • Standardized Metrics - Consistent evaluation across all tasks
  • Comparative Analysis - Direct comparison of different approaches
  • Reproducible Results - Standardized protocols for fair comparison

Flexible Deployment

  • Pipeline Composition - Easy combination of different task components
  • Real-time Processing - Support for live market applications
  • Scalable Architecture - From research prototypes to production systems