Schemas
OCF follows a strict schema-first design. Schemas define the semantic contracts between raw inputs, canonical data, feature blocks, Greeks computation, and pipeline outputs.
Schema Layers in OCF
OCF defines schemas at three logical layers:
- Raw input schemas (vendor-style data)
- Canonical schemas (normalized internal representations)
- Validation rules (hard constraints and safety checks)
Each layer serves a distinct role in the pipeline.
Raw Input Schemas
Raw schemas define the minimum required column names that must be present before normalization.
Underlying OHLCV Data
Description:
Daily underlying price and volume data.
Required columns:
| Column | Meaning |
|---|---|
| Date | Trading date |
| Open | Opening price |
| High | High price |
| Low | Low price |
| Close | Closing price |
| Volume | Trading volume |
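A raw schema of this kind can be checked before normalization by comparing the incoming header against the required column names. The following is a minimal sketch only: `REQUIRED_OHLCV` and `missing_columns` are hypothetical names chosen for illustration, not part of OCF's API.

```python
# Hypothetical helper, not OCF's API: check a raw header against the
# required OHLCV column names listed in the table above.
REQUIRED_OHLCV = ("Date", "Open", "High", "Low", "Close", "Volume")


def missing_columns(columns, required=REQUIRED_OHLCV):
    """Return the required column names absent from `columns`, in schema order."""
    present = set(columns)
    return [name for name in required if name not in present]


header = ["Date", "Open", "High", "Low", "Close"]  # "Volume" is absent
missing = missing_columns(header)
if missing:
    print(f"missing required columns: {missing}")
```

Returning the full list of missing names, rather than stopping at the first one, lets a caller report every problem in a single message.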
Option Chain Snapshot (Metadata)
Description:
Static metadata describing option contracts at a snapshot date.
Required columns:
| Column | Meaning |
|---|---|
| Call Ticker | Identifier for call contract |
| Put Ticker | Identifier for put contract |
| Expiration | Option expiration date |
| Strike | Strike price |
| Series | Option series identifier |
| Exercise Type | American / European |
| Contract Size | Units per contract |
| Periodicity | Contract periodicity |
Implied Volatility by Moneyness (Smile)
These inputs describe ATM-relative implied volatility smiles.
30-Day Tenor
Required columns:
| Column | Meaning |
|---|---|
| Date | Trading date |
| IVOL_30_90 | 30-day IV at 90% moneyness |
| IVOL_30_95 | 30-day IV at 95% moneyness |
| IVOL_30_100 | 30-day IV at ATM (100%) |
| IVOL_30_105 | 30-day IV at 105% moneyness |
| IVOL_30_110 | 30-day IV at 110% moneyness |
60-Day Tenor
Required columns:
| Column | Meaning |
|---|---|
| Date | Trading date |
| IVOL_60_90 | 60-day IV at 90% moneyness |
| IVOL_60_95 | 60-day IV at 95% moneyness |
| IVOL_60_100 | 60-day IV at ATM (100%) |
| IVOL_60_105 | 60-day IV at 105% moneyness |
| IVOL_60_110 | 60-day IV at 110% moneyness |
Naming convention:
IVOL_<TENOR_DAYS>_<MONEYNESS_PERCENT>
Example:
IVOL_30_90 → 30-day implied volatility at 90% of spot
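The naming convention is regular enough to parse mechanically. The sketch below splits a raw column name into tenor and moneyness and rebuilds it in the `iv_<tenor>_m<moneyness>` form used by the derived smile columns of the canonical underlying schema. The function names are hypothetical, and the raw-to-canonical mapping is inferred from the column names rather than stated by OCF.

```python
import re

# Matches names of the form IVOL_<TENOR_DAYS>_<MONEYNESS_PERCENT>.
_IVOL_PATTERN = re.compile(r"^IVOL_(\d+)_(\d+)$")


def parse_ivol_column(name):
    """Split a raw smile column name into (tenor_days, moneyness_percent)."""
    match = _IVOL_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not an IVOL_<TENOR>_<MONEYNESS> column: {name!r}")
    tenor_days, moneyness_percent = (int(group) for group in match.groups())
    return tenor_days, moneyness_percent


def canonical_iv_name(name):
    """Assumed mapping: IVOL_30_90 -> iv_30_m90 (inferred, not confirmed)."""
    tenor_days, moneyness_percent = parse_ivol_column(name)
    return f"iv_{tenor_days}_m{moneyness_percent}"


print(parse_ivol_column("IVOL_30_90"))
print(canonical_iv_name("IVOL_60_105"))
```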
Delta-Based IV Surface (Optional)
Description:
Single-date implied volatility surface indexed by delta.
Required columns:
| Column | Meaning |
|---|---|
| Expiration | Option expiration |
| Delta Level | Absolute delta (0–100) |
| Strike | Strike price |
| Implied Volatility | IV at given delta |
| Underlying Price | Spot price |
| Dividend | Dividend yield |
Open Interest & Historical ATM IV
Description:
Liquidity, positioning, and historical ATM volatility inputs.
Required columns:
| Column | Meaning |
|---|---|
| Date | Trading date |
| Close | Underlying close |
| Bid | Best bid |
| Ask | Best ask |
| Total Call Open Interest | Aggregate call OI |
| Total Put Open Interest | Aggregate put OI |
| Volume | Trading volume |
| Historical Call Implied Volatility | ATM call IV |
| Historical Put Implied Volatility | ATM put IV |
Rates & Volatility Index
Risk-Free Rates
Required columns:
| Column | Meaning |
|---|---|
| Date | Trading date |
| Risk Free SOFR | SOFR 90-day rate (percent) |
| Risk Free USGG | 3-month Treasury yield (percent) |
Rates are converted internally to decimal form.
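As an illustration of that conversion, the sketch below divides percent-quoted rates by 100 and stores them under the canonical names `rate_sofr_90d` and `rate_usgg3m`. The helper name and the sample values are made up for the example, and the raw-to-canonical pairing is an assumption based on the column names.

```python
def percent_to_decimal(rate_percent):
    """Convert a rate quoted in percent (e.g. 5.25) to decimal form (0.0525)."""
    return rate_percent / 100.0


# Illustrative raw row; the numbers are placeholders, not market data.
raw_row = {"Date": "2024-01-02", "Risk Free SOFR": 5.25, "Risk Free USGG": 5.5}

canonical_rates = {
    "rate_sofr_90d": percent_to_decimal(raw_row["Risk Free SOFR"]),
    "rate_usgg3m": percent_to_decimal(raw_row["Risk Free USGG"]),
}
print(canonical_rates)
```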
Volatility Index
Required columns:
| Column | Meaning |
|---|---|
| Date | Trading date |
| Close | VIX index level |
Canonical Schemas
Canonical schemas define the normalized internal representations used by all feature blocks and by the Greeks computation.
These schemas are semantic, not vendor-specific.
Canonical Underlying Daily
Granularity: one row per (symbol, date)
Required Columns
| Column |
|---|
| symbol |
| date |
| px_open |
| px_high |
| px_low |
| px_close |
| volume |
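The required canonical columns line up one-to-one with the raw OHLCV columns plus a symbol. A minimal sketch of that normalization step follows. The mapping `RAW_TO_CANONICAL` and the function `to_canonical_row` are hypothetical, inferred from the column names rather than taken from OCF, and the sample values are placeholders.

```python
# Assumed raw -> canonical column mapping (inferred from the two schemas).
RAW_TO_CANONICAL = {
    "Date": "date",
    "Open": "px_open",
    "High": "px_high",
    "Low": "px_low",
    "Close": "px_close",
    "Volume": "volume",
}


def to_canonical_row(raw_row, symbol):
    """Rename raw OHLCV fields and attach the symbol, one row at a time."""
    row = {"symbol": symbol}
    for raw_name, canonical_name in RAW_TO_CANONICAL.items():
        row[canonical_name] = raw_row[raw_name]  # KeyError if a field is missing
    return row


raw = {"Date": "2024-01-02", "Open": 100.0, "High": 102.0,
       "Low": 99.5, "Close": 101.0, "Volume": 1_000_000}
print(to_canonical_row(raw, symbol="XYZ"))
```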
Optional Columns
| Column |
|---|
| bid |
| ask |
| mid |
| call_open_int_tot |
| put_open_int_tot |
| vix |
| rate_sofr_90d |
| rate_usgg3m |
Derived Columns
ATM IV
- iv_atm_hist
30-Day IV Smile
- iv_30_m90
- iv_30_m95
- iv_30_m100
- iv_30_m105
- iv_30_m110
60-Day IV Smile
- iv_60_m90
- iv_60_m95
- iv_60_m100
- iv_60_m105
- iv_60_m110
Derived columns may be missing if corresponding inputs are unavailable.
Canonical Option Chain Snapshot
Granularity: one row per option contract
Required Columns
| Column |
|---|
| symbol |
| date |
| expiration |
| strike |
| call_ticker |
| put_ticker |
| exercise_type |
| contract_size |
| periodicity |
Derived Columns
| Column | Meaning |
|---|---|
| time_to_expiry | Time to expiration (years) |
| moneyness | Spot-relative moneyness |
| log_moneyness | Log-moneyness |
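The source does not state the exact formulas behind these derived columns. The sketch below shows one common set of conventions, each of which is an assumption: calendar days divided by 365 for `time_to_expiry`, strike divided by spot for `moneyness`, and the natural log of that ratio for `log_moneyness`. OCF may use a different day count or the inverse ratio.

```python
import math
from datetime import date


def time_to_expiry(as_of, expiration, days_per_year=365.0):
    """Year fraction to expiration. ACT/365 is an assumed day-count convention."""
    return (expiration - as_of).days / days_per_year


def moneyness(strike, spot):
    """Spot-relative moneyness, assumed here to be strike / spot."""
    return strike / spot


def log_moneyness(strike, spot):
    """Natural log of the assumed strike / spot ratio."""
    return math.log(strike / spot)


# Illustrative inputs only.
t = time_to_expiry(date(2024, 1, 2), date(2024, 4, 2))
m = moneyness(strike=105.0, spot=100.0)
k = log_moneyness(strike=105.0, spot=100.0)
print(t, m, k)
```

Under these conventions an at-the-money contract has moneyness 1 and log-moneyness 0, which makes the log form convenient as a symmetric axis for smile features.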
Canonical IV Surface Snapshot
Required Columns
| Column |
|---|
| symbol |
| date |
| expiration |
| delta |
| strike |
| iv |
Optional Columns
| Column |
|---|
| spot |
| dividend |
Validation Guarantees
OCF enforces:
- Missing required columns → hard error
- Missing derived columns → warning
- Unexpected columns → warning
- Domain violations → error
Examples of domain constraints:
- Prices must be positive
- IV must be strictly positive
- Rates must be decimals (not percentages)
- Delta must lie in (0, 100)
- Dates must not be null
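The split between hard errors and warnings can be sketched as a function that returns two separate lists. The example below applies the rules to a single row of the canonical IV surface schema. The names `REQUIRED`, `OPTIONAL`, and `validate_surface_row` are hypothetical, and the sketch is not OCF's actual validator.

```python
# Column names from the canonical IV surface schema; the validator itself
# is a hypothetical illustration.
REQUIRED = ("symbol", "date", "expiration", "delta", "strike", "iv")
OPTIONAL = ("spot", "dividend")


def validate_surface_row(row):
    """Return (errors, warnings) for one canonical IV surface row (a dict)."""
    errors, warnings = [], []

    # Missing required columns are hard errors.
    for name in REQUIRED:
        if name not in row:
            errors.append(f"missing required column: {name}")

    # Unexpected columns only produce warnings.
    for name in row:
        if name not in REQUIRED and name not in OPTIONAL:
            warnings.append(f"unexpected column: {name}")

    # Domain checks run only on values that are present.
    if "date" in row and row["date"] is None:
        errors.append("date must not be null")
    for name in ("strike", "iv"):
        value = row.get(name)
        if value is not None and not value > 0:
            errors.append(f"{name} must be strictly positive, got {value}")
    delta = row.get("delta")
    if delta is not None and not 0 < delta < 100:
        errors.append(f"delta must lie in (0, 100), got {delta}")

    return errors, warnings


good = {"symbol": "XYZ", "date": "2024-01-02", "expiration": "2024-04-02",
        "delta": 25, "strike": 105.0, "iv": 0.22}
print(validate_surface_row(good))
```

A caller that wants fail-fast behaviour can raise as soon as the error list is non-empty and merely log the warnings.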
Schema Stability
OCF guarantees:
- Canonical schemas are stable within a major version
- Breaking schema changes only occur in major releases
- Feature blocks gracefully handle missing derived data
- Pipelines fail fast on schema violations
These guarantees matter because schemas are the foundation of reproducibility and correctness in OCF.