IO Layer

The IO layer in Options Chain Features (OCF) provides optional, lightweight utilities for reading and writing tabular data.

IO is intentionally decoupled from normalization, feature construction, Greeks, and pipelines. All core computation in OCF operates purely on in-memory polars.DataFrame objects and does not require any IO utilities.


Design Philosophy

The IO layer follows these principles:

  • Explicit – all reads and writes are user-initiated
  • Non-invasive – no schema modification or inference
  • Stateless – no hidden caches or global state
  • Optional – OCF can be used without importing ocf.io at all
  • Backend-agnostic – IO does not affect computation logic

IO exists purely as a convenience layer.


Supported Formats

OCF currently provides helpers for:

  • CSV
  • Parquet
  • DuckDB

All IO functions return or consume polars.DataFrame objects.


Readers

Readers load external data into memory without validation or normalization.

CSV Reader

from ocf.io.readers import read_csv
read_csv(
    obj: str | pathlib.Path | polars.DataFrame,
    *,
    try_parse_dates: bool = False,
) -> polars.DataFrame

Behavior

  • Accepts a file path or an existing DataFrame
  • Returns a cloned polars.DataFrame
  • Performs no schema checks
  • Performs no column renaming

This allows seamless passthrough in pipelines and tests.


Parquet Reader

from ocf.io.readers import read_parquet
read_parquet(path: str | pathlib.Path) -> polars.DataFrame

Reads a Parquet file into a polars.DataFrame.


DuckDB Table Reader

from ocf.io.readers import read_duckdb_table
read_duckdb_table(
    conn: duckdb.DuckDBPyConnection,
    table: str,
) -> polars.DataFrame

Reads a DuckDB table into Polars. The caller is responsible for managing the DuckDB connection.


Writers

Writers persist DataFrames as-is, without schema enforcement.

CSV Writer

from ocf.io.writers import write_csv
write_csv(
    df: polars.DataFrame,
    path: str | pathlib.Path,
    *,
    overwrite: bool = True,
) -> None

  • Creates parent directories if needed
  • Overwriting is controlled explicitly by the overwrite flag (default: True)
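The two writer behaviors can be sketched with the standard library alone. `prepare_target` is a hypothetical helper, not part of OCF, and the choice to raise `FileExistsError` when `overwrite=False` meets an existing file is an assumption; the source does not say how the real writers react in that case.

```python
from pathlib import Path
import tempfile


def prepare_target(path, *, overwrite=True):
    """Hypothetical helper: validate and prepare a write target."""
    target = Path(path)
    # Assumption: overwrite=False refuses to replace an existing file.
    if target.exists() and not overwrite:
        raise FileExistsError(f"{target} exists and overwrite=False")
    # Create any missing parent directories before writing.
    target.parent.mkdir(parents=True, exist_ok=True)
    return target


with tempfile.TemporaryDirectory() as tmp:
    out = prepare_target(Path(tmp) / "features" / "2024" / "chain.csv")
    out.write_text("strike,bid\n100.0,1.2\n")  # stand-in for the real CSV write
    parent_created = out.parent.is_dir()
    try:
        prepare_target(out, overwrite=False)
        refused = False
    except FileExistsError:
        refused = True
```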

Parquet Writer

from ocf.io.writers import write_parquet
write_parquet(
    df: polars.DataFrame,
    path: str | pathlib.Path,
    *,
    overwrite: bool = True,
) -> None

Used for efficient storage of feature tables and intermediate outputs.


DuckDB Store

OCF provides a lightweight DuckDB wrapper for structured persistence.

from ocf.io.duckdb_store import DuckDBStore

Initialization

store = DuckDBStore("path/to/db.duckdb")

  • Automatically creates parent directories
  • Opens a persistent DuckDB connection

Writing Tables

store.write_table(
    name="features",
    df=features_df,
    mode="append",  # or "replace"
)

  • append: inserts rows into an existing table
  • replace: drops and recreates the table
  • Empty DataFrames are silently ignored
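The mode semantics above can be sketched as follows. To keep the snippet dependency-free it uses the standard-library `sqlite3` module as a stand-in for DuckDB and plain tuples as a stand-in for DataFrame rows, so it illustrates the semantics only. `write_rows` is hypothetical. Two details are assumptions rather than documented behavior: the table is created on first append, and empty input is ignored before any drop happens.

```python
import sqlite3


def write_rows(conn, name, rows, mode="append"):
    """Hypothetical sketch of append/replace semantics for a two-column table."""
    if not rows:
        # Mirrors "empty DataFrames are silently ignored": nothing is touched.
        return
    if mode not in ("append", "replace"):
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "replace":
        # replace: drop the old table so it is recreated from scratch.
        conn.execute(f"DROP TABLE IF EXISTS {name}")
    # The table name is interpolated directly, so use trusted names only.
    conn.execute(f"CREATE TABLE IF NOT EXISTS {name} (symbol TEXT, iv REAL)")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?)", rows)


conn = sqlite3.connect(":memory:")
write_rows(conn, "features", [("SPY", 0.18), ("QQQ", 0.22)])
write_rows(conn, "features", [("IWM", 0.25)], mode="append")
count_after_append = conn.execute("SELECT COUNT(*) FROM features").fetchone()[0]

write_rows(conn, "features", [], mode="replace")  # empty input: no-op
count_after_empty = conn.execute("SELECT COUNT(*) FROM features").fetchone()[0]

write_rows(conn, "features", [("SPY", 0.19)], mode="replace")
count_after_replace = conn.execute("SELECT COUNT(*) FROM features").fetchone()[0]
conn.close()
```

In this sketch, appending grows the table to three rows, the empty replace leaves it unchanged, and the final replace leaves a single row.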

Reading Tables

df = store.read_table("features")

Querying

df = store.query("SELECT * FROM features WHERE date >= '2024-01-01'")

Returns results as a polars.DataFrame.


Utilities

store.list_tables()
store.close()

What the IO Layer Does Not Do

The IO layer does not:

  • Validate schemas
  • Normalize vendor data
  • Join datasets
  • Run pipelines
  • Infer column meanings

All such logic lives in:

  • ocf.data.normalize
  • ocf.data.align
  • ocf.features
  • ocf.greeks
  • ocf.pipelines

When to Use the IO Layer

Use Case                      Recommendation
Interactive research          Optional
Batch feature generation      Helpful
Persistent feature storage    Recommended
Live / streaming systems      External tooling

Summary

The IO layer in OCF is intentionally minimal: it moves data and stores results, and never alters computation. Users are free to replace it entirely with their own ingestion or persistence systems.