IO Layer¶
The IO layer in Options Chain Features (OCF) provides optional, lightweight utilities for reading and writing tabular data.
IO is intentionally decoupled from normalization, feature construction, Greeks, and pipelines. All core computation in OCF operates purely on in-memory polars.DataFrame objects and does not require any IO utilities.
Design Philosophy¶
The IO layer follows these principles:
- Explicit – all reads and writes are user-initiated
- Non-invasive – no schema modification or inference
- Stateless – no hidden caches or global state
- Optional – OCF can be used without importing ocf.io at all
- Backend-agnostic – IO does not affect computation logic
IO exists purely as a convenience layer.
Supported Formats¶
OCF currently provides helpers for:
- CSV
- Parquet
- DuckDB
All IO functions return or consume polars.DataFrame objects.
Readers¶
Readers load external data into memory without validation or normalization.
CSV Reader¶
from ocf.io.readers import read_csv
read_csv(
    obj: str | pathlib.Path | polars.DataFrame,
    *,
    try_parse_dates: bool = False,
) -> polars.DataFrame
Behavior
- Accepts a file path or an existing DataFrame
- Returns a cloned polars.DataFrame
- Performs no schema checks
- Performs no column renaming
This allows seamless passthrough in pipelines and tests.
Parquet Reader¶
from ocf.io.readers import read_parquet
read_parquet(path: str | pathlib.Path) -> polars.DataFrame
Reads a Parquet file into a polars.DataFrame.
DuckDB Table Reader¶
from ocf.io.readers import read_duckdb_table
read_duckdb_table(
    conn: duckdb.DuckDBPyConnection,
    table: str,
) -> polars.DataFrame
Reads a DuckDB table into Polars. The caller is responsible for managing the DuckDB connection.
Writers¶
Writers persist DataFrames as-is, without schema enforcement.
CSV Writer¶
from ocf.io.writers import write_csv
write_csv(
    df: polars.DataFrame,
    path: str | pathlib.Path,
    *,
    overwrite: bool = True,
) -> None
- Creates parent directories if needed
- Overwriting is controlled explicitly by the overwrite argument
Parquet Writer¶
from ocf.io.writers import write_parquet
write_parquet(
    df: polars.DataFrame,
    path: str | pathlib.Path,
    *,
    overwrite: bool = True,
) -> None
Used for efficient storage of feature tables and intermediate outputs.
DuckDB Store¶
OCF provides a lightweight DuckDB wrapper for structured persistence.
from ocf.io.duckdb_store import DuckDBStore
Initialization¶
store = DuckDBStore("path/to/db.duckdb")
- Automatically creates parent directories
- Opens a persistent DuckDB connection
Writing Tables¶
store.write_table(
    name="features",
    df=features_df,
    mode="append",  # or "replace"
)
- append: inserts rows into an existing table
- replace: drops and recreates the table
- Empty DataFrames are silently ignored
Reading Tables¶
df = store.read_table("features")
Querying¶
df = store.query("SELECT * FROM features WHERE date >= '2024-01-01'")
Returns results as a polars.DataFrame.
Utilities¶
store.list_tables()  # list the tables in the database
store.close()  # close the underlying DuckDB connection
What the IO Layer Does Not Do¶
The IO layer does not:
- Validate schemas
- Normalize vendor data
- Join datasets
- Run pipelines
- Infer column meanings
All such logic lives in:
- ocf.data.normalize
- ocf.data.align
- ocf.features
- ocf.greeks
- ocf.pipelines
When to Use the IO Layer¶
| Use Case | Recommended |
|---|---|
| Interactive research | Optional |
| Batch feature generation | Helpful |
| Persistent feature storage | Recommended |
| Live / streaming systems | External tooling |
Summary¶
The IO layer in OCF is intentionally minimal: it moves data and stores results, and it never alters computation. Users are free to replace it entirely with their own ingestion or persistence systems.