Expand description
Delta Lake I/O integration module.
This module provides the actual I/O operations for Delta Lake tables via the
deltalake crate. All functions are feature-gated behind delta-lake.
§Architecture
The I/O module is separate from the business logic in delta.rs
to allow:
- Testing business logic without the
deltalakedependency - Clean separation of concerns (buffering/epoch management vs. actual writes)
- Easy mocking for unit tests
§Exactly-Once Semantics
Delta Lake’s transaction log supports application-level transaction metadata
via the txn action. We use this to store (writer_id, epoch) pairs, enabling
exactly-once semantics:
- On recovery, read
txnmetadata to find the last committed epoch for this writer - Skip epochs <= last committed (idempotent replay)
- Each write includes the epoch in
txnmetadata
Structs§
- Compaction
Result - Result of a compaction (OPTIMIZE) operation.
- Merge
Result - Result of a MERGE (upsert) operation.
Functions§
- get_
last_ committed_ epoch - Retrieves the last committed epoch for a writer from Delta Lake’s txn metadata.
- get_
latest_ version - Returns the latest committed version via the log store.
- get_
table_ schema - Extracts the Arrow schema from a Delta Lake table.
- map_
cdf_ to_ changelog - Maps CDF
_change_type→_op(I/U/D), dropsupdate_preimagerows and CDF metadata columns (_change_type,_commit_version,_commit_timestamp). ReturnsNoneif all rows were preimages. - merge_
changelog - Atomic changelog MERGE: inserts, updates, and deletes in one Delta commit.
- open_
or_ create_ table - Opens an existing Delta Lake table or creates a new one.
- read_
batches_ at_ version - Reads record batches from a specific Delta Lake table version.
- read_
cdf_ batches - Reads CDF batches for a version range via
scan_cdf(). - read_
version_ diff - Reads only the rows added in a specific Delta Lake version.
- resolve_
catalog_ options - Resolves catalog-aware table URI and merges catalog-specific storage options.
- run_
compaction - Runs an OPTIMIZE compaction on a Delta Lake table.
- run_
vacuum - Runs VACUUM on a Delta Lake table, deleting old unreferenced files.
- write_
batches - Writes batches to a Delta Lake table with exactly-once semantics.