Module rate_limited

Client-side rate limiting for remote providers.

Wraps any InferenceProvider in a token bucket (governor, a widely used Rust rate limiter) so that request bursts are shaped to a steady requests-per-second rate instead of being sent unbounded. One cell is acquired per input row before the batch is dispatched. The wait happens on the Ring 1 inference worker, the only place infer_batch is awaited, and never on Ring 0. The limiter itself is lock-free (atomics), so nothing blocking is reachable from the compute thread.
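
As a rough illustration of the pacing described above, the following standard-library-only sketch shows how a lock-free limiter can grant one cell per row. It is not governor's API or this module's implementation: the names `Limiter`, `try_acquire`, and `dispatch_time` are hypothetical, the algorithm is a simplified GCRA-style token bucket, and time is passed in as plain nanoseconds on a simulated clock so the behavior is deterministic.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical lock-free limiter (GCRA form of a token bucket).
/// Illustration only; not the governor crate's implementation.
pub struct Limiter {
    /// Nanoseconds between cells at the steady rate.
    interval_ns: u64,
    /// Width of the burst window: interval_ns * burst.
    burst_ns: u64,
    /// Theoretical arrival time of the next cell, in ns since the epoch.
    tat_ns: AtomicU64,
}

impl Limiter {
    pub fn new(per_second: u64, burst: u64) -> Self {
        assert!(per_second > 0 && burst > 0);
        let interval_ns = 1_000_000_000 / per_second;
        Limiter {
            interval_ns,
            burst_ns: interval_ns * burst,
            tat_ns: AtomicU64::new(0),
        }
    }

    /// Tries to take one cell at time `now_ns`.
    /// Returns Ok(()) if granted, or Err(wait_ns) with the time to wait.
    pub fn try_acquire(&self, now_ns: u64) -> Result<(), u64> {
        let mut tat = self.tat_ns.load(Ordering::Acquire);
        loop {
            let new_tat = tat.max(now_ns) + self.interval_ns;
            let limit = now_ns + self.burst_ns;
            if new_tat > limit {
                return Err(new_tat - limit);
            }
            // Compare-and-swap instead of a mutex: a losing thread retries
            // with the value it observed and never blocks.
            match self.tat_ns.compare_exchange_weak(
                tat,
                new_tat,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return Ok(()),
                Err(actual) => tat = actual,
            }
        }
    }
}

/// Acquires one cell per input row on a simulated clock and returns the
/// time (ns) at which the batch could be dispatched.
pub fn dispatch_time(limiter: &Limiter, mut now_ns: u64, rows: usize) -> u64 {
    for _ in 0..rows {
        while let Err(wait_ns) = limiter.try_acquire(now_ns) {
            // A real worker would sleep asynchronously here instead.
            now_ns += wait_ns;
        }
    }
    now_ns
}
```

With a quota of 10 cells per second and a burst of 2, a five-row batch arriving at time 0 gets its first two cells immediately and waits 100 ms for each of the remaining three, so it is dispatched at 300 ms. In the real provider that wait would presumably be an awaited timer on the inference worker rather than a simulated clock.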

Structs

RateLimitedProvider: An InferenceProvider that paces calls to a steady rate.