Pipelines¶
The OCF pipeline provides a deterministic, schema-driven orchestration layer that transforms raw vendor inputs into model-ready feature tables and Greeks outputs.
Pipeline Overview¶
At a high level, the pipeline performs the following stages:
- Normalization of raw vendor inputs
- Alignment of daily underlying data
- Snapshot enrichment of option chains
- Feature construction on underlying data
- Per-option Greeks computation (optional)
- Validation of final canonical outputs
Each stage is implemented as a pure transformation.
Canonical Pipeline Flow¶
Raw Inputs
│
▼
Normalization
│
▼
Canonical Tables
│
▼
Underlying Alignment
│
▼
Canonical Underlying Daily
│
┌───────────────┴────────────────┐
│ │
▼ ▼
Feature Blocks Option Chain Enrichment
(OHLCV, IV, Liquidity, etc.) │
▼
Per-Option Greeks
Pipeline Configuration¶
The pipeline is configured using a strongly-typed configuration object.
from ocf.pipelines.pipeline import PipelineConfig
Configuration Fields¶
| Field | Description |
|---|---|
symbol |
Underlying symbol identifier |
snapshot_date |
Valuation date for the option chain snapshot |
include_features |
Feature blocks to include |
exclude_features |
Feature blocks to exclude |
rate_field |
Risk-free rate column used in Greeks |
build_per_option |
Whether to compute per-option Greeks |
require_iv_surface |
Enforce presence of full IV surface |
The configuration object is immutable once passed to the pipeline.
Public Pipeline APIs¶
OCF exposes two public pipeline entry points.
Run Pipeline from Raw Inputs¶
from ocf.pipelines.pipeline import run_pipeline
run_pipeline(
*,
config: PipelineConfig,
raw_inputs: dict[str, str | polars.DataFrame],
) -> dict[str, polars.DataFrame]
Runs the full pipeline starting from raw vendor-style inputs. This includes:
- Normalization
- Alignment
- Feature construction
- Greeks computation
- Validation
Run Pipeline from Normalized Inputs¶
from ocf.pipelines.pipeline import run_pipeline_normalized
run_pipeline_normalized(
*,
config: PipelineConfig,
normalized_inputs: dict[str, polars.DataFrame],
) -> dict[str, polars.DataFrame]
Runs the pipeline starting from already-normalized canonical tables.
This entry point is intended for:
- Advanced users
- Custom ingestion pipelines
- External normalization systems
The caller is responsible for ensuring schema correctness.
Pipeline Outputs¶
The pipeline returns a dictionary of canonical outputs:
| Key | Description |
|---|---|
underlying_daily |
Canonical aligned daily underlying table |
features |
Model-ready underlying feature table |
option_chain_snapshot |
Enriched option chain snapshot |
per_option_greeks |
Per-option Greeks (if enabled) |
All outputs are validated before returning.
Validation Guarantees¶
Before returning, the pipeline validates:
- Presence of required tables
- Schema correctness
- Column consistency
- Optional IV surface completeness
Validation rules are enforced by the schemas layer.
Design Principles¶
The pipeline layer follows strict design rules:
- No file IO
- No hidden joins
- No implicit defaults
- Deterministic execution
- Composable steps
Each step can be unit tested in isolation.
For step-level implementation details, see:
- Schemas
- Features
- Greeks