Window function UDFs (TUMBLE, HOP, SESSION, CUMULATE)
Scalar UDFs that compute window boundary timestamps for streaming
GROUP BY TUMBLE(...) style queries.
For each pair (tumble/tumble_end, hop/hop_end, cumulate/cumulate_end),
both functions must appear in GROUP BY: DataFusion treats them as
independent group keys, even though within a fixed interval they are
functionally redundant. session(ts, gap) is a passthrough; real session
boundaries are computed by Ring 0 operators.
SELECT tumble(ts, INTERVAL '1' MINUTE) AS window_start,
tumble_end(ts, INTERVAL '1' MINUTE) AS window_end, ...
FROM ev
GROUP BY tumble(ts, INTERVAL '1' MINUTE),
tumble_end(ts, INTERVAL '1' MINUTE), ...

Math runs in milliseconds (matching the Ring 0 watermark). At the SQL
boundary the UDFs return Timestamp(Microsecond, None) so that lakehouse
sinks (Iceberg, Delta, Parquet) accept the columns directly; Iceberg
rejects Timestamp(Millisecond).
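The millisecond arithmetic and microsecond conversion described above can be sketched as follows. This is a minimal illustration, not the crate's implementation: the function names `tumble_start_ms`, `tumble_end_ms`, and `ms_to_us` are hypothetical, and the offset semantics (windows aligned to `offset_ms` modulo the interval) are an assumption.

```rust
/// Hypothetical helper: start (in ms) of the tumbling window containing `ts_ms`.
/// Assumes `interval_ms > 0` and that windows are aligned to `offset_ms`.
fn tumble_start_ms(ts_ms: i64, interval_ms: i64, offset_ms: i64) -> i64 {
    // rem_euclid always yields a non-negative remainder, so timestamps
    // before the epoch are floored downward instead of toward zero.
    ts_ms - (ts_ms - offset_ms).rem_euclid(interval_ms)
}

/// Hypothetical helper: exclusive upper bound (in ms) of the same window.
fn tumble_end_ms(ts_ms: i64, interval_ms: i64, offset_ms: i64) -> i64 {
    tumble_start_ms(ts_ms, interval_ms, offset_ms) + interval_ms
}

/// Convert a millisecond boundary to microseconds for the SQL-facing value.
/// Returns None if the multiplication would overflow i64.
fn ms_to_us(ms: i64) -> Option<i64> {
    ms.checked_mul(1_000)
}
```

With `ts_ms = 125_000`, a one-minute interval (`60_000`), and no offset, the sketch yields the window [120 000, 180 000) in milliseconds, and the start converts to 120 000 000 microseconds.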
Structs
- CumulateWindowEnd
  cumulate_end(ts, step, size): exclusive upper bound of the epoch (size-aligned bucket) containing ts.
- CumulateWindowStart
  cumulate(ts, step, size): epoch start (size-aligned bucket) containing ts. Per-step cumulating boundaries (which depend on data) are exposed by Ring 0.
- HopWindowEnd
  hop_end(ts, slide, size [, offset]): end of the earliest sliding window containing ts (i.e. hop(...) + size).
- HopWindowStart
  hop(ts, slide, size [, offset]): earliest sliding window (window length size, sliding by slide) that contains ts. Full multi-window assignment is handled by Ring 0.
- SessionWindowStart
  session(ts, gap): passthrough that converts ts to microsecond resolution. Real session start/end are data-dependent and computed by Ring 0; this UDF exists so GROUP BY session(ts, gap) parses. There is no session_end UDF for the same reason.
- TumbleWindowEnd
  tumble_end(ts, interval [, offset]): exclusive upper bound of the tumble window containing ts (i.e. tumble(...) + interval).
- TumbleWindowStart
  tumble(ts, interval [, offset]): start of the non-overlapping window containing ts. Returns Timestamp(Microsecond, None).
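
The hop and cumulate boundaries listed above can be illustrated with a short sketch. It is written under assumptions, not taken from the crate: the function names are hypothetical, all values are milliseconds, window starts are assumed to be aligned to `offset_ms` modulo `slide_ms`, and windows are treated as half-open intervals [start, start + size).

```rust
/// Hypothetical helper: start (ms) of the earliest sliding window containing `ts_ms`.
/// Assumes `slide_ms > 0`, `size_ms > 0`, and starts aligned to `offset_ms`.
fn hop_start_ms(ts_ms: i64, slide_ms: i64, size_ms: i64, offset_ms: i64) -> i64 {
    // Largest aligned start that is <= ts - size. The window beginning there
    // ends at or before ts, so it does not contain ts.
    let too_early =
        (ts_ms - size_ms - offset_ms).div_euclid(slide_ms) * slide_ms + offset_ms;
    // The next aligned start is the earliest window that still contains ts.
    too_early + slide_ms
}

/// Hypothetical helper: end (ms) of that earliest window, i.e. start + size.
fn hop_end_ms(ts_ms: i64, slide_ms: i64, size_ms: i64, offset_ms: i64) -> i64 {
    hop_start_ms(ts_ms, slide_ms, size_ms, offset_ms) + size_ms
}

/// Hypothetical helper: start (ms) of the size-aligned cumulate epoch containing `ts_ms`.
/// The step is not needed here; per-step boundaries are left to the runtime.
fn cumulate_epoch_start_ms(ts_ms: i64, size_ms: i64) -> i64 {
    ts_ms - ts_ms.rem_euclid(size_ms)
}

/// Hypothetical helper: exclusive upper bound (ms) of the same epoch.
fn cumulate_epoch_end_ms(ts_ms: i64, size_ms: i64) -> i64 {
    cumulate_epoch_start_ms(ts_ms, size_ms) + size_ms
}
```

For example, with `ts_ms = 125_000`, a one-minute slide, and a three-minute size, the earliest containing hop window under these assumptions is [0, 180 000); a five-minute cumulate epoch for the same timestamp is [0, 300 000).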