Concrete inference backends.
Each backend is feature-gated, so the default build carries no HTTP or ML
dependencies. Backends are transport only: they turn an InferenceRequest into
provider calls and turn the responses back into InferenceOutput values. Task
framing (the chat prompt) and any numeric post-processing live in the shared
helpers and the adapter, not in the wire layer.
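The transport-only split can be pictured with a minimal sketch. Everything here beyond the two type names is an assumption for illustration: the `Provider` trait, the `EchoProvider` stand-in, the fields of `InferenceRequest`, and the variants of `InferenceOutput` are hypothetical, and the call is synchronous where a real remote backend would likely be async. The point it shows is that the backend receives an already-framed request and hands back a raw output without rewriting the prompt or post-processing numbers.

```rust
// Hypothetical shapes: the real InferenceRequest / InferenceOutput types
// in this crate may carry different fields and variants.
#[derive(Debug, Clone)]
pub struct InferenceRequest {
    /// Prompt text already framed by the shared helpers.
    pub prompt: String,
}

#[derive(Debug, Clone, PartialEq)]
pub enum InferenceOutput {
    Text(String),
    Logits(Vec<f32>),
    Embedding(Vec<f32>),
}

/// Hypothetical transport-only contract: map a request to a provider call
/// and map the raw response back, nothing more.
pub trait Provider {
    fn infer(&self, req: &InferenceRequest) -> Result<InferenceOutput, String>;
}

/// Stand-in backend that makes no network call and echoes the prompt.
pub struct EchoProvider;

impl Provider for EchoProvider {
    fn infer(&self, req: &InferenceRequest) -> Result<InferenceOutput, String> {
        // No prompt rewriting and no numeric post-processing at this layer.
        Ok(InferenceOutput::Text(req.prompt.clone()))
    }
}

fn main() {
    let req = InferenceRequest {
        prompt: "Classify: great product".to_string(),
    };
    let out = EchoProvider.infer(&req).unwrap();
    println!("{:?}", out);
}
```

Keeping the trait this narrow means a rate-limiting wrapper or a different vendor can be swapped in without touching the prompt-building or adapter code.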
Re-exports
pub use anthropic::AnthropicProvider;
pub use local::LocalProvider;
pub use openai::OpenAiProvider;
pub use rate_limited::RateLimitedProvider;
Modules
- anthropic: Anthropic Messages API provider.
- local: Local inference via ONNX Runtime (ort, loaded dynamically). Encoder models only (the BERT / DistilBERT / MiniLM family): classification/sentiment yield logits (the adapter argmaxes), and embedding yields a mean-pooled vector. Generative tasks are rejected. A model is loaded once per source and cached; the forward pass runs on spawn_blocking, off the Ring 1 task, under a deadline so a pathological model can never stall the worker (and the watermark behind it). ONNX Runtime is loaded at runtime, so onnxruntime.dll/.so (ORT >= 1.24) must be on the search path or named by ORT_DYLIB_PATH.
- openai: OpenAI-compatible remote provider.
- rate_limited: Client-side rate limiting for remote providers.
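The two numeric reductions mentioned for the local backend, argmax over classification logits and mean pooling for embeddings, can be sketched in plain Rust. This is an illustration under assumptions, not the crate's adapter code: the function names are hypothetical, the hidden states are assumed to be a row-major `seq_len x dim` slice, and pooling is assumed to skip padding positions via an attention mask, which is the common convention for MiniLM-style sentence embeddings but is not confirmed by the text above.

```rust
/// Index of the largest logit (the predicted class), or None if empty.
/// Uses f32::total_cmp; on ties Iterator::max_by returns the last maximum.
fn argmax(logits: &[f32]) -> Option<usize> {
    logits
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(i, _)| i)
}

/// Mean-pool token vectors `hidden` (row-major, seq_len x dim) over the
/// positions where `mask` is non-zero, i.e. ignoring padding tokens.
/// Assumes hidden.len() == mask.len() * dim. All-masked input yields zeros.
fn mean_pool(hidden: &[f32], mask: &[u8], dim: usize) -> Vec<f32> {
    debug_assert_eq!(hidden.len(), mask.len() * dim);
    let mut sum = vec![0.0_f32; dim];
    let mut count = 0usize;
    for (t, &m) in mask.iter().enumerate() {
        if m == 0 {
            continue; // padding position: excluded from the average
        }
        let row = &hidden[t * dim..(t + 1) * dim];
        for (s, &x) in sum.iter_mut().zip(row) {
            *s += x;
        }
        count += 1;
    }
    if count > 0 {
        for s in &mut sum {
            *s /= count as f32;
        }
    }
    sum
}

fn main() {
    let logits = [0.1_f32, 2.5, -1.0];
    println!("predicted class: {:?}", argmax(&logits));

    // Two real tokens and one padding token, hidden size 2.
    let hidden = [1.0_f32, 2.0, 3.0, 4.0, 100.0, 100.0];
    let mask = [1_u8, 1, 0];
    println!("embedding: {:?}", mean_pool(&hidden, &mask, 2));
}
```

Doing these reductions in the adapter rather than in the backend keeps the backend's output a faithful copy of what the model produced.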
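"Loaded once per source and cached" can be read as a map from model source to a shared handle. The sketch below is one possible shape under stated assumptions: `Model` is a hypothetical stand-in for a loaded ONNX session, `ModelCache` and `get_or_load` are illustrative names, and holding the lock across the load is a design choice of this sketch (it prevents duplicate loads of the same source at the cost of serialising loads), not necessarily what the module does.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a loaded model session.
struct Model {
    source: String,
}

/// Caches one loaded model per source so later requests reuse it.
struct ModelCache {
    models: Mutex<HashMap<String, Arc<Model>>>,
}

impl ModelCache {
    fn new() -> Self {
        Self { models: Mutex::new(HashMap::new()) }
    }

    /// Returns the cached model for `source`, calling `load` only on first use.
    /// The lock is held across `load`, so the same source is never loaded twice.
    fn get_or_load(&self, source: &str, load: impl FnOnce(&str) -> Model) -> Arc<Model> {
        let mut map = self.models.lock().unwrap();
        map.entry(source.to_string())
            .or_insert_with(|| Arc::new(load(source)))
            .clone()
    }
}

fn main() {
    let cache = ModelCache::new();
    let a = cache.get_or_load("minilm.onnx", |s| Model { source: s.to_string() });
    let b = cache.get_or_load("minilm.onnx", |s| Model { source: s.to_string() });
    println!("same instance: {}, source: {}", Arc::ptr_eq(&a, &b), a.source);
}
```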
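The "forward pass off the task, under a deadline" idea can be illustrated without an async runtime. The module itself uses spawn_blocking; this dependency-free sketch swaps in `std::thread::spawn` plus `std::sync::mpsc::Receiver::recv_timeout` to show the same shape: the caller waits at most `deadline` and gets an error instead of stalling. The names `run_with_deadline` and `RunError` are hypothetical. Note that in both this sketch and spawn_blocking, a timed-out computation is not killed; the caller simply stops waiting and the late result is dropped.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum RunError {
    DeadlineExceeded,
    WorkerPanicked,
}

/// Run `work` on its own thread and wait at most `deadline` for the result.
/// On timeout the worker keeps running in the background; its result is dropped.
fn run_with_deadline<T, F>(work: F, deadline: Duration) -> Result<T, RunError>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // If the caller already gave up, the receiver is gone; ignore the error.
        let _ = tx.send(work());
    });
    match rx.recv_timeout(deadline) {
        Ok(v) => Ok(v),
        Err(mpsc::RecvTimeoutError::Timeout) => Err(RunError::DeadlineExceeded),
        // The sender was dropped without sending: the worker panicked.
        Err(mpsc::RecvTimeoutError::Disconnected) => Err(RunError::WorkerPanicked),
    }
}

fn main() {
    let fast = run_with_deadline(|| 2 + 2, Duration::from_secs(1));
    let slow = run_with_deadline(
        || {
            thread::sleep(Duration::from_millis(500)); // a "pathological model"
            1
        },
        Duration::from_millis(10),
    );
    println!("fast: {:?}, slow: {:?}", fast, slow);
}
```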
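The text does not say which algorithm the client-side rate limiter uses. As an assumption for illustration only, the sketch below uses a token bucket, a common choice for client-side limiting: the bucket holds up to `capacity` tokens, refills at `refill_per_sec`, and each request spends one token. `TokenBucket` and `try_acquire` are hypothetical names; time is passed in explicitly so the behaviour is deterministic and easy to test.

```rust
use std::time::{Duration, Instant};

/// Token bucket: up to `capacity` tokens, refilled continuously at
/// `refill_per_sec`; each request spends one token.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last: Instant,
}

impl TokenBucket {
    fn new(capacity: u32, refill_per_sec: f64, now: Instant) -> Self {
        Self {
            capacity: capacity as f64,
            tokens: capacity as f64, // start full
            refill_per_sec,
            last: now,
        }
    }

    /// Try to spend one token at time `now`. Returns false when the caller
    /// should wait or queue instead of sending the request.
    fn try_acquire(&mut self, now: Instant) -> bool {
        if now > self.last {
            let elapsed = (now - self.last).as_secs_f64();
            self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
            self.last = now;
        }
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    let t0 = Instant::now();
    let mut bucket = TokenBucket::new(2, 1.0, t0);
    let burst: Vec<bool> = (0..3).map(|_| bucket.try_acquire(t0)).collect();
    let later = bucket.try_acquire(t0 + Duration::from_secs(1));
    println!("burst: {:?}, after 1s: {}", burst, later);
}
```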