IO Layer¶

The IO layer in Options Chain Features (OCF) provides optional, lightweight utilities for reading and writing tabular data.

IO is intentionally decoupled from normalization, feature construction, Greeks, and pipelines. All core computation in OCF operates purely on in-memory polars.DataFrame objects and does not require any IO utilities.

Design Philosophy¶

The IO layer follows these principles:

Explicit – all reads and writes are user-initiated
Non-invasive – no schema modification or inference
Stateless – no hidden caches or global state
Optional – OCF can be used without importing ocf.io at all
Backend-agnostic – IO does not affect computation logic

IO exists purely as a convenience layer.

Supported Formats¶

OCF currently provides helpers for:

CSV
Parquet
DuckDB

All IO functions return or consume polars.DataFrame objects.

Readers¶

Readers load external data into memory without validation or normalization.

CSV Reader¶

from ocf.io.readers import read_csv

read_csv(
    obj: str | pathlib.Path | polars.DataFrame,
    *,
    try_parse_dates: bool = False,
) -> polars.DataFrame

Behavior

Accepts a file path or an existing DataFrame
Returns a polars.DataFrame (a clone of the input when a DataFrame is passed)
Performs no schema checks
Performs no column renaming

Accepting a DataFrame lets the same call site work with either a file path or in-memory data, which keeps pipelines and tests simple.

Parquet Reader¶

from ocf.io.readers import read_parquet

read_parquet(path: str | pathlib.Path) -> polars.DataFrame

Reads a Parquet file into a polars.DataFrame.

DuckDB Table Reader¶

from ocf.io.readers import read_duckdb_table

read_duckdb_table(
    conn: duckdb.DuckDBPyConnection,
    table: str,
) -> polars.DataFrame

Reads a DuckDB table into Polars. The caller is responsible for managing the DuckDB connection.

Writers¶

Writers persist DataFrames as-is, without schema enforcement.

CSV Writer¶

from ocf.io.writers import write_csv

write_csv(
    df: polars.DataFrame,
    path: str | pathlib.Path,
    *,
    overwrite: bool = True,
) -> None

Creates parent directories if needed
Overwriting is controlled explicitly through the overwrite flag (default: True)
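One way to picture the two behaviors above is a small path-preparation step that runs before any bytes are written. The helper below is hypothetical and uses only the standard library. In particular, raising FileExistsError when overwrite=False is an assumption, since the page does not say how OCF reacts to an existing file.

```python
import tempfile
from pathlib import Path


def prepare_output_path(path, *, overwrite=True):
    """Hypothetical helper: validate and prepare a target path for writing."""
    target = Path(path)
    if target.exists() and not overwrite:
        # Assumption: refuse loudly rather than silently skip the write.
        raise FileExistsError(f"{target} exists and overwrite=False")
    # Create any missing parent directories.
    target.parent.mkdir(parents=True, exist_ok=True)
    return target


with tempfile.TemporaryDirectory() as tmp:
    out = prepare_output_path(Path(tmp) / "features" / "chain.csv")
    out.write_text("strike,iv\n100,0.2\n")
    parent_created = out.parent.is_dir()
    try:
        prepare_output_path(out, overwrite=False)
        refused = False
    except FileExistsError:
        refused = True
```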

Parquet Writer¶

from ocf.io.writers import write_parquet

write_parquet(
    df: polars.DataFrame,
    path: str | pathlib.Path,
    *,
    overwrite: bool = True,
) -> None

Parquet is the efficient choice for storing feature tables and intermediate outputs.

DuckDB Store¶

OCF provides a lightweight DuckDB wrapper for structured persistence.

from ocf.io.duckdb_store import DuckDBStore

Initialization¶

store = DuckDBStore("path/to/db.duckdb")

Automatically creates parent directories
Opens a persistent DuckDB connection

Writing Tables¶

store.write_table(
    name="features",
    df=features_df,
    mode="append",  # or "replace"
)

append: inserts rows into an existing table
replace: drops and recreates the table
Empty DataFrames are silently ignored

Reading Tables¶

df = store.read_table("features")

Querying¶

df = store.query("SELECT * FROM features WHERE date >= '2024-01-01'")

Returns results as a polars.DataFrame.

Utilities¶

store.list_tables()  # list the tables in the database
store.close()        # close the DuckDB connection

What the IO Layer Does Not Do¶

The IO layer does not:

Validate schemas
Normalize vendor data
Join datasets
Run pipelines
Infer column meanings

All such logic lives in:

ocf.data.normalize
ocf.data.align
ocf.features
ocf.greeks
ocf.pipelines

When to Use the IO Layer¶

Use Case	Recommendation
Interactive research	Optional
Batch feature generation	Helpful
Persistent feature storage	Recommended
Live / streaming systems	External tooling

Summary¶

The IO layer in OCF is intentionally minimal: it moves data, stores results, and never alters computation. Users are free to replace it entirely with their own ingestion or persistence systems.