Client-side rate limiting for remote providers.
Wraps any InferenceProvider in a token bucket (backed by governor, a widely
used Rust rate-limiting crate) so request bursts are shaped to a steady
requests-per-second rate rather than sent unbounded. One cell is acquired per
input row before the batch is dispatched. The wait happens on the Ring 1
inference worker, the only place infer_batch is awaited, and never on Ring 0.
The limiter itself is lock-free (atomics), so nothing blocking is reachable
from the compute thread.
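To make the "one cell per input row" and "lock-free (atomics)" points concrete, here is a minimal std-only sketch of a rate limiter built on a single atomic. It is an illustration under stated assumptions, not this module's implementation and not governor's API: the names CellLimiter, acquire_at, and acquire_rows are hypothetical, and the blocking std::thread::sleep stands in for the async wait that the real wrapper performs on the inference worker.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{Duration, Instant};

/// Hypothetical std-only limiter (illustration only; not governor's API).
/// State is one atomic: the "theoretical arrival time" (TAT) of the next
/// cell, in nanoseconds since `start`.
pub struct CellLimiter {
    start: Instant,
    /// Nanoseconds between cells at the steady rate.
    interval_ns: u64,
    /// How far the TAT may run ahead of "now": burst * interval.
    capacity_ns: u64,
    tat_ns: AtomicU64,
}

impl CellLimiter {
    pub fn new(per_second: u32, burst: u32) -> Self {
        assert!(per_second > 0 && burst > 0);
        let interval_ns = 1_000_000_000 / per_second as u64;
        Self {
            start: Instant::now(),
            interval_ns,
            capacity_ns: interval_ns * burst as u64,
            tat_ns: AtomicU64::new(0),
        }
    }

    /// Try to take one cell at `now_ns`. Returns Ok(()) if the cell conforms,
    /// or Err(wait) with the time until it would. Never blocks: contention is
    /// resolved by retrying a compare-and-swap, not by taking a lock.
    pub fn acquire_at(&self, now_ns: u64) -> Result<(), Duration> {
        let mut tat = self.tat_ns.load(Ordering::Acquire);
        loop {
            let new_tat = tat.max(now_ns) + self.interval_ns;
            if new_tat - now_ns > self.capacity_ns {
                // Over budget: report the wait and leave the state untouched.
                let wait_ns = new_tat - now_ns - self.capacity_ns;
                return Err(Duration::from_nanos(wait_ns));
            }
            match self.tat_ns.compare_exchange_weak(
                tat,
                new_tat,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return Ok(()),
                // Another thread moved the TAT first; retry with its value.
                Err(actual) => tat = actual,
            }
        }
    }

    /// Acquire one cell per input row before a batch is dispatched.
    /// ASSUMPTION: the blocking sleep is a stand-in for an async wait, so
    /// this must only run on a worker thread, never on the compute thread.
    pub fn acquire_rows(&self, rows: usize) {
        for _ in 0..rows {
            loop {
                let now_ns = self.start.elapsed().as_nanos() as u64;
                match self.acquire_at(now_ns) {
                    Ok(()) => break,
                    Err(wait) => std::thread::sleep(wait),
                }
            }
        }
    }
}
```

In this sketch a wrapper would call acquire_rows(batch.len()) on the worker immediately before awaiting infer_batch. Because acquire_at only reads and compare-and-swaps a single atomic, checking the limiter cannot block; only the explicit sleep does, and that stays on the worker.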
Structs§
- RateLimitedProvider - An InferenceProvider that paces calls to a steady rate.