Expand description
Delta Lake source connector implementation.
DeltaSource implements SourceConnector, reading Arrow RecordBatch
data from Delta Lake tables by polling for new versions.
§Polling Strategy
The source maintains a current_version cursor. On each poll_batch():
- Drain any buffered batches from the previous load first
- Throttle: skip version check if less than
poll_intervalsince last check - Check if the table has a newer version than
current_version - If yes, jump directly to the latest version (O(1) catch-up)
- Scan bounded by
max_recordsviaDataFusionstreaming execution - Buffer results;
current_versiononly advances after the buffer is fully drained, so checkpoint always reflects fully-consumed state
§Checkpoint / Recovery
The checkpoint stores current_version so that on recovery the source
resumes from the correct Delta Lake version.
Structs§
- Delta
Source - Delta Lake source connector.