Skip to content

Pipelines

The OCF pipeline provides a deterministic, schema-driven orchestration layer that transforms raw vendor inputs into model-ready feature tables and Greeks outputs.


Pipeline Overview

At a high level, the pipeline performs the following stages:

  1. Normalization of raw vendor inputs
  2. Alignment of daily underlying data
  3. Snapshot enrichment of option chains
  4. Feature construction on underlying data
  5. Per-option Greeks computation (optional)
  6. Validation of final canonical outputs

Each stage is implemented as a pure transformation.


Canonical Pipeline Flow

                         Raw Inputs
                              │
                              ▼
                        Normalization
                              │
                              ▼
                        Canonical Tables
                              │
                              ▼
                     Underlying Alignment
                              │
                              ▼
                 Canonical Underlying Daily
                              │
              ┌───────────────┴────────────────┐
              │                                  │
              ▼                                  ▼
       Feature Blocks                     Option Chain Enrichment
   (OHLCV, IV, Liquidity, etc.)                    │
                                                  ▼
                                         Per-Option Greeks
All outputs are returned as in-memory tables.


Pipeline Configuration

The pipeline is configured using a strongly-typed configuration object.

from ocf.pipelines.pipeline import PipelineConfig

Configuration Fields

Field Description
symbol Underlying symbol identifier
snapshot_date Valuation date for the option chain snapshot
include_features Feature blocks to include
exclude_features Feature blocks to exclude
rate_field Risk-free rate column used in Greeks
build_per_option Whether to compute per-option Greeks
require_iv_surface Enforce presence of full IV surface

The configuration object is immutable once passed to the pipeline.


Public Pipeline APIs

OCF exposes two public pipeline entry points.


Run Pipeline from Raw Inputs

from ocf.pipelines.pipeline import run_pipeline
run_pipeline(
    *,
    config: PipelineConfig,
    raw_inputs: dict[str, str | polars.DataFrame],
) -> dict[str, polars.DataFrame]

Runs the full pipeline starting from raw vendor-style inputs. This includes:

  • Normalization
  • Alignment
  • Feature construction
  • Greeks computation
  • Validation

Run Pipeline from Normalized Inputs

from ocf.pipelines.pipeline import run_pipeline_normalized
run_pipeline_normalized(
    *,
    config: PipelineConfig,
    normalized_inputs: dict[str, polars.DataFrame],
) -> dict[str, polars.DataFrame]

Runs the pipeline starting from already-normalized canonical tables.

This entry point is intended for:

  • Advanced users
  • Custom ingestion pipelines
  • External normalization systems

The caller is responsible for ensuring schema correctness.


Pipeline Outputs

The pipeline returns a dictionary of canonical outputs:

Key Description
underlying_daily Canonical aligned daily underlying table
features Model-ready underlying feature table
option_chain_snapshot Enriched option chain snapshot
per_option_greeks Per-option Greeks (if enabled)

All outputs are validated before returning.


Validation Guarantees

Before returning, the pipeline validates:

  • Presence of required tables
  • Schema correctness
  • Column consistency
  • Optional IV surface completeness

Validation rules are enforced by the schemas layer.


Design Principles

The pipeline layer follows strict design rules:

  • No file IO
  • No hidden joins
  • No implicit defaults
  • Deterministic execution
  • Composable steps

Each step can be unit tested in isolation.


For step-level implementation details, see:

  • Schemas
  • Features
  • Greeks