Skip to main content

Module window_udf

Module window_udf 

Source
Expand description

Window function UDFs (TUMBLE, HOP, SESSION, CUMULATE) Scalar UDFs that compute window boundary timestamps for streaming GROUP BY TUMBLE(...) style queries.

Pairs tumble/tumble_end, hop/hop_end, cumulate/cumulate_end must both appear in GROUP BY: DataFusion treats them as independent group keys, even though within a fixed interval they are functionally redundant. session(ts, gap) is a passthrough — real session boundaries are computed by Ring 0 operators.

SELECT tumble(ts, INTERVAL '1' MINUTE)     AS window_start,
       tumble_end(ts, INTERVAL '1' MINUTE) AS window_end, ...
FROM ev
GROUP BY tumble(ts, INTERVAL '1' MINUTE),
         tumble_end(ts, INTERVAL '1' MINUTE), ...

Math runs in milliseconds (matching the Ring 0 watermark). The SQL boundary returns Timestamp(Microsecond, None) so lakehouse sinks (Iceberg, Delta, Parquet) accept the columns directly — Iceberg rejects Timestamp(Millisecond).

Structs§

CumulateWindowEnd
cumulate_end(ts, step, size) — exclusive upper bound of the epoch (size-aligned bucket) containing ts.
CumulateWindowStart
cumulate(ts, step, size) — epoch start (size-aligned bucket) containing ts. Per-step cumulating boundaries (which depend on data) are exposed by Ring 0.
HopWindowEnd
hop_end(ts, slide, size [, offset]) — end of the earliest sliding window containing ts (i.e. hop(...) + size).
HopWindowStart
hop(ts, slide, size [, offset]) — earliest sliding window (size size, sliding by slide) that contains ts. Full multi-window assignment is handled by Ring 0.
SessionWindowStart
session(ts, gap) — passthrough that converts ts to microsecond resolution. Real session start/end are data-dependent and computed by Ring 0; this UDF exists so GROUP BY session(ts, gap) parses. There is no session_end UDF for the same reason.
TumbleWindowEnd
tumble_end(ts, interval [, offset]) — exclusive upper bound of the tumble window containing ts (i.e. tumble(...) + interval).
TumbleWindowStart
tumble(ts, interval [, offset]) — start of the non-overlapping window containing ts. Returns Timestamp(Microsecond, None).