Pipelines¶

The OCF pipeline provides a deterministic, schema-driven orchestration layer that transforms raw vendor inputs into model-ready feature tables and Greeks outputs.

Pipeline Overview¶

At a high level, the pipeline performs the following stages:

Normalization of raw vendor inputs
Alignment of daily underlying data
Snapshot enrichment of option chains
Feature construction on underlying data
Per-option Greeks computation (optional)
Validation of final canonical outputs

Each stage is implemented as a pure transformation.

Canonical Pipeline Flow¶

                         Raw Inputs
                              │
                              ▼
                        Normalization
                              │
                              ▼
                        Canonical Tables
                              │
                              ▼
                     Underlying Alignment
                              │
                              ▼
                 Canonical Underlying Daily
                              │
              ┌───────────────┴────────────────┐
              │                                  │
              ▼                                  ▼
       Feature Blocks                     Option Chain Enrichment
   (OHLCV, IV, Liquidity, etc.)                    │
                                                  ▼
                                         Per-Option Greeks

All outputs are returned as in-memory tables.

Pipeline Configuration¶

The pipeline is configured using a strongly-typed configuration object.

from ocf.pipelines.pipeline import PipelineConfig

Configuration Fields¶

Field	Description
`symbol`	Underlying symbol identifier
`snapshot_date`	Valuation date for the option chain snapshot
`include_features`	Feature blocks to include
`exclude_features`	Feature blocks to exclude
`rate_field`	Risk-free rate column used in Greeks
`build_per_option`	Whether to compute per-option Greeks
`require_iv_surface`	Enforce presence of full IV surface

The configuration object is immutable once passed to the pipeline.

Public Pipeline APIs¶

OCF exposes two public pipeline entry points.

Run Pipeline from Raw Inputs¶

from ocf.pipelines.pipeline import run_pipeline

run_pipeline(
    *,
    config: PipelineConfig,
    raw_inputs: dict[str, str | polars.DataFrame],
) -> dict[str, polars.DataFrame]

Runs the full pipeline starting from raw vendor-style inputs. This includes:

Normalization
Alignment
Feature construction
Greeks computation
Validation

Run Pipeline from Normalized Inputs¶

from ocf.pipelines.pipeline import run_pipeline_normalized

run_pipeline_normalized(
    *,
    config: PipelineConfig,
    normalized_inputs: dict[str, polars.DataFrame],
) -> dict[str, polars.DataFrame]

Runs the pipeline starting from already-normalized canonical tables.

This entry point is intended for:

Advanced users
Custom ingestion pipelines
External normalization systems

The caller is responsible for ensuring schema correctness.

Pipeline Outputs¶

The pipeline returns a dictionary of canonical outputs:

Key	Description
`underlying_daily`	Canonical aligned daily underlying table
`features`	Model-ready underlying feature table
`option_chain_snapshot`	Enriched option chain snapshot
`per_option_greeks`	Per-option Greeks (if enabled)

All outputs are validated before returning.

Validation Guarantees¶

Before returning, the pipeline validates:

Presence of required tables
Schema correctness
Column consistency
Optional IV surface completeness

Validation rules are enforced by the schemas layer.

Design Principles¶

The pipeline layer follows strict design rules:

No file IO
No hidden joins
No implicit defaults
Deterministic execution
Composable steps

Each step can be unit tested in isolation.

For step-level implementation details, see:

Schemas
Features
Greeks