Skip to main content

Module delta_source

Module delta_source 

Source
Expand description

Delta Lake source connector implementation.

DeltaSource implements SourceConnector, reading Arrow RecordBatch data from Delta Lake tables by polling for new versions.

§Polling Strategy

The source maintains a current_version cursor. On each poll_batch():

  1. Drain any buffered batches from the previous load first
  2. Throttle: skip version check if less than poll_interval since last check
  3. Check if the table has a newer version than current_version
  4. If yes, jump directly to the latest version (O(1) catch-up)
  5. Scan bounded by max_records via DataFusion streaming execution
  6. Buffer results; current_version only advances after the buffer is fully drained, so checkpoint always reflects fully-consumed state

§Checkpoint / Recovery

The checkpoint stores current_version so that on recovery the source resumes from the correct Delta Lake version.

Structs§

DeltaSource
Delta Lake source connector.