
Schemas

OCF follows a strict schema-first design. Schemas define the semantic contracts between raw inputs, canonical data, feature blocks, Greeks computation, and pipeline outputs.


Schema Layers in OCF

OCF defines schemas at three logical layers:

  1. Raw input schemas (vendor-style data)
  2. Canonical schemas (normalized internal representations)
  3. Validation rules (hard constraints and safety checks)

Each layer serves a distinct role in the pipeline.


Raw Input Schemas

Raw schemas define the minimum set of columns that must be present before normalization.


Underlying OHLCV Data

Description:
Daily underlying price and volume data.

Required columns:

Column   Meaning
Date     Trading date
Open     Opening price
High     High price
Low      Low price
Close    Closing price
Volume   Trading volume
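
A minimal sketch of how the required-column contract above might be checked before normalization. The constant and function names are hypothetical and are not part of OCF's API; only the column names come from the table.

```python
# Hypothetical helper for illustration; not OCF's actual API.
# Column names are taken from the "Underlying OHLCV Data" table.
UNDERLYING_OHLCV_REQUIRED = ("Date", "Open", "High", "Low", "Close", "Volume")


def missing_required_columns(columns, required=UNDERLYING_OHLCV_REQUIRED):
    """Return the required column names absent from `columns`, in schema order."""
    present = set(columns)
    return [name for name in required if name not in present]


# Example: a header that lacks the closing price.
header = ["Date", "Open", "High", "Low", "Volume"]
missing = missing_required_columns(header)
if missing:
    print(f"raw underlying data is missing required columns: {missing}")
```

A non-empty result would correspond to the hard-error case described under Validation Guarantees.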

Option Chain Snapshot (Metadata)

Description:
Static metadata describing option contracts at a snapshot date.

Required columns:

Column          Meaning
Call Ticker     Identifier for call contract
Put Ticker      Identifier for put contract
Expiration      Option expiration date
Strike          Strike price
Series          Option series identifier
Exercise Type   American / European
Contract Size   Units per contract
Periodicity     Contract periodicity

Implied Volatility by Moneyness (Smile)

These inputs describe ATM-relative implied volatility smiles.

30-Day Tenor

Required columns:

Column        Meaning
Date          Trading date
IVOL_30_90    30-day IV at 90% moneyness
IVOL_30_95    30-day IV at 95% moneyness
IVOL_30_100   30-day IV at ATM (100%)
IVOL_30_105   30-day IV at 105% moneyness
IVOL_30_110   30-day IV at 110% moneyness

60-Day Tenor

Required columns:

Column        Meaning
Date          Trading date
IVOL_60_90    60-day IV at 90% moneyness
IVOL_60_95    60-day IV at 95% moneyness
IVOL_60_100   60-day IV at ATM (100%)
IVOL_60_105   60-day IV at 105% moneyness
IVOL_60_110   60-day IV at 110% moneyness

Naming convention:
IVOL_<TENOR_DAYS>_<MONEYNESS_PERCENT>

Example:
IVOL_30_90 → 30-day implied volatility at 90% of spot
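
The naming convention can be parsed mechanically. The sketch below splits a raw column name into tenor and moneyness. It also maps the name to the canonical `iv_<tenor>_m<moneyness>` form listed under Canonical Underlying Daily. That raw-to-canonical pairing is inferred from the two naming patterns rather than stated by OCF, and the function names are hypothetical.

```python
import re

# Raw naming convention: IVOL_<TENOR_DAYS>_<MONEYNESS_PERCENT>
_IVOL_NAME = re.compile(r"IVOL_(\d+)_(\d+)")


def parse_ivol_column(name):
    """Split a raw smile column name into (tenor_days, moneyness_percent)."""
    match = _IVOL_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not an IVOL smile column: {name!r}")
    return int(match.group(1)), int(match.group(2))


def canonical_smile_name(name):
    """Map a raw name such as IVOL_30_90 to iv_30_m90.

    Assumption: the canonical smile columns follow iv_<tenor>_m<moneyness>.
    """
    tenor_days, moneyness_pct = parse_ivol_column(name)
    return f"iv_{tenor_days}_m{moneyness_pct}"


tenor, moneyness = parse_ivol_column("IVOL_30_90")  # tenor=30, moneyness=90
renamed = canonical_smile_name("IVOL_60_105")       # "iv_60_m105"
```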


Delta-Based IV Surface (Optional)

Description:
Single-date implied volatility surface indexed by delta.

Required columns:

Column               Meaning
Expiration           Option expiration
Delta Level          Absolute delta (0–100)
Strike               Strike price
Implied Volatility   IV at given delta
Underlying Price     Spot price
Dividend             Dividend yield

Open Interest & Historical ATM IV

Description:
Liquidity, positioning, and historical ATM volatility inputs.

Required columns:

Column                               Meaning
Date                                 Trading date
Close                                Underlying close
Bid                                  Best bid
Ask                                  Best ask
Total Call Open Interest             Aggregate call OI
Total Put Open Interest              Aggregate put OI
Volume                               Trading volume
Historical Call Implied Volatility   ATM call IV
Historical Put Implied Volatility    ATM put IV

Rates & Volatility Index

Risk-Free Rates

Required columns:

Column           Meaning
Date             Trading date
Risk Free SOFR   SOFR 90-day rate (percent)
Risk Free USGG   3-month Treasury yield (percent)

Rates are converted internally to decimal form.
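
As an illustration of the percent-to-decimal conversion, the sketch below divides each raw rate by 100. The mapping from raw column names to `rate_sofr_90d` and `rate_usgg3m` is an assumption based on the canonical column list, and the helper names are hypothetical.

```python
# Assumed raw -> canonical mapping; the canonical names appear in the
# Canonical Underlying Daily schema, but the pairing is inferred.
RATE_COLUMNS = {
    "Risk Free SOFR": "rate_sofr_90d",
    "Risk Free USGG": "rate_usgg3m",
}


def percent_to_decimal(rate_percent):
    """Convert a rate quoted in percent (e.g. 5.31) to decimal form (0.0531)."""
    if rate_percent is None:
        return None  # keep missing values missing
    return rate_percent / 100.0


def normalize_rates(raw_row):
    """Return canonical decimal rates for one raw row (a dict)."""
    return {
        canonical: percent_to_decimal(raw_row.get(raw))
        for raw, canonical in RATE_COLUMNS.items()
    }


rates = normalize_rates({"Risk Free SOFR": 5.31, "Risk Free USGG": 5.25})
```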


Volatility Index

Required columns:

Column   Meaning
Date     Trading date
Close    VIX index level

Canonical Schemas

Canonical schemas define the internal normalized representations used by all feature blocks and Greeks.

These schemas are semantic, not vendor-specific.


Canonical Underlying Daily

Granularity: one row per (symbol, date)

Required Columns

Column
symbol
date
px_open
px_high
px_low
px_close
volume

Optional Columns

Column
bid
ask
mid
call_open_int_tot
put_open_int_tot
vix
rate_sofr_90d
rate_usgg3m

Derived Columns

ATM IV:

  • iv_atm_hist

30-Day IV Smile:

  • iv_30_m90
  • iv_30_m95
  • iv_30_m100
  • iv_30_m105
  • iv_30_m110

60-Day IV Smile:

  • iv_60_m90
  • iv_60_m95
  • iv_60_m100
  • iv_60_m105
  • iv_60_m110

Derived columns may be missing if corresponding inputs are unavailable.


Canonical Option Chain Snapshot

Granularity: one row per option contract

Required Columns

Column
symbol
date
expiration
strike
call_ticker
put_ticker
exercise_type
contract_size
periodicity

Derived Columns

Column           Meaning
time_to_expiry   Time to expiration (years)
moneyness        Spot-relative moneyness
log_moneyness    Log-moneyness
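
One way the derived columns could be computed is sketched below. The conventions are assumptions, since this page does not specify them: time to expiry uses an ACT/365 year fraction, and moneyness is taken as strike divided by spot. The function name is hypothetical.

```python
import math
from datetime import date


def derive_chain_columns(snapshot_date, expiration, strike, spot):
    """Compute the derived columns for one option contract.

    Assumptions (not confirmed by OCF):
      - time_to_expiry = calendar days / 365 (ACT/365)
      - moneyness      = strike / spot
      - log_moneyness  = ln(strike / spot)
    """
    if strike <= 0 or spot <= 0:
        raise ValueError("strike and spot must be positive")
    days = (expiration - snapshot_date).days
    moneyness = strike / spot
    return {
        "time_to_expiry": days / 365.0,
        "moneyness": moneyness,
        "log_moneyness": math.log(moneyness),
    }


derived = derive_chain_columns(
    snapshot_date=date(2024, 1, 1),
    expiration=date(2024, 12, 31),
    strike=110.0,
    spot=100.0,
)
```

If OCF uses a different day-count or defines moneyness as spot over strike, only the two marked expressions change.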

Canonical IV Surface Snapshot

Required Columns

Column
symbol
date
expiration
delta
strike
iv

Optional Columns

Column
spot
dividend

Validation Guarantees

OCF enforces:

  • Missing required columns → hard error
  • Missing derived columns → warning
  • Unexpected columns → warning
  • Domain violations → error

Examples:

  • Prices must be positive
  • IV must be strictly positive
  • Canonical rates must be decimals (not percentages)
  • Delta must lie in (0, 100)
  • Dates must not be null
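
The rules above can be sketched as a row-level check that separates errors from warnings. The example uses the Canonical IV Surface Snapshot columns. The function name and message strings are hypothetical, and treating strike as a price that must be positive is an assumption.

```python
IV_SURFACE_REQUIRED = ("symbol", "date", "expiration", "delta", "strike", "iv")
IV_SURFACE_OPTIONAL = ("spot", "dividend")


def validate_iv_surface_row(row):
    """Return (errors, warnings) for one canonical IV surface row (a dict)."""
    errors, warnings = [], []

    # Missing required columns are errors.
    for name in IV_SURFACE_REQUIRED:
        if name not in row:
            errors.append(f"missing required column: {name}")

    # Columns outside the schema only produce warnings.
    known = set(IV_SURFACE_REQUIRED) | set(IV_SURFACE_OPTIONAL)
    for name in row:
        if name not in known:
            warnings.append(f"unexpected column: {name}")

    # Domain checks; each is skipped when the value is absent.
    if "date" in row and row["date"] is None:
        errors.append("date must not be null")
    strike = row.get("strike")
    if strike is not None and not strike > 0:
        errors.append("strike must be positive")  # assumption: strike is a price
    iv = row.get("iv")
    if iv is not None and not iv > 0:
        errors.append("iv must be strictly positive")
    delta = row.get("delta")
    if delta is not None and not 0 < delta < 100:
        errors.append("delta must lie in (0, 100)")

    return errors, warnings


errors, warnings = validate_iv_surface_row(
    {"symbol": "XYZ", "date": "2024-06-03", "expiration": "2024-07-19",
     "delta": 25, "strike": 105.0, "iv": 0.22}
)
```

A caller that wants fail-fast behaviour could raise as soon as `errors` is non-empty and log `warnings` otherwise.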

Schema Stability

OCF guarantees:

  • Canonical schemas are stable within a major version
  • Breaking schema changes only occur in major releases
  • Feature blocks gracefully handle missing derived data
  • Pipelines fail fast on schema violations

Schemas are the foundation of reproducibility and correctness in OCF.