Concrete inference backends.
Each backend is feature-gated, so the default build pulls in no HTTP or ML
dependencies. Backends are transport only: they turn an InferenceRequest into
provider calls and the responses back into InferenceOutputs. Task
framing (the chat prompt) and any numeric post-processing live in the shared
helpers and the adapter, not in the wire layer.
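A minimal sketch of what per-backend feature gating can look like. The feature names (anthropic, local, openai) are assumptions that mirror the module names on this page, and the unit structs are placeholders, not the real providers; the crate's actual Cargo features may be named differently.

```rust
// Hypothetical feature names; check the crate's Cargo.toml for the real ones.
// Each module only exists when its feature is enabled, so a default build
// compiles none of the HTTP or ML code paths.

#[cfg(feature = "anthropic")]
pub mod anthropic {
    /// Placeholder standing in for the real provider type.
    pub struct AnthropicProvider;
}

#[cfg(feature = "local")]
pub mod local {
    /// Placeholder standing in for the real provider type.
    pub struct LocalProvider;
}

#[cfg(feature = "openai")]
pub mod openai {
    /// Placeholder standing in for the real provider type.
    pub struct OpenAiProvider;
}

/// Names of the backends compiled into this build (illustrative helper).
pub fn enabled_backends() -> Vec<&'static str> {
    let mut names = Vec::new();
    if cfg!(feature = "anthropic") {
        names.push("anthropic");
    }
    if cfg!(feature = "local") {
        names.push("local");
    }
    if cfg!(feature = "openai") {
        names.push("openai");
    }
    names
}
```

Built with no features enabled, enabled_backends() returns an empty list, which is the "zero weight" default described above.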
Re-exports
pub use anthropic::AnthropicProvider;
pub use local::LocalProvider;
pub use openai::OpenAiProvider;
pub use rate_limited::RateLimitedProvider;
Modules
- anthropic
- Anthropic Messages API provider.
- local
- Local inference via ONNX Runtime (ort, loaded dynamically). Encoder models only (BERT/DistilBERT/MiniLM): classify/sentiment return logits, embed returns a mean-pooled vector, and generative tasks are rejected. A model is loaded once per source and cached; the forward pass runs on spawn_blocking under a deadline so a wedged model can’t stall the worker. ORT loads at runtime: onnxruntime.{dll,so} (>= 1.24) must be on the search path or named by ORT_DYLIB_PATH. A source is hf:org/repo (downloaded on first use), file://<path>, or a bare path; labels come from the model’s config.json id2label.
- openai
- OpenAI-compatible remote provider.
- rate_limited
- Client-side rate limiting for remote providers.
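The local module's description names three source forms (hf:org/repo, file://<path>, a bare path) and a mean-pooled embedding. The sketch below illustrates both ideas with the standard library only. ModelSource, parse_source, and mean_pool are hypothetical names, not the crate's API, and weighting the pool by an attention mask is an assumption; the page only says the vector is mean-pooled.

```rust
use std::path::PathBuf;

/// Hypothetical parsed form of a model source string.
#[derive(Debug, PartialEq)]
enum ModelSource {
    /// hf:org/repo, fetched on first use.
    HuggingFace { org: String, repo: String },
    /// file://<path> or a bare path on disk.
    Path(PathBuf),
}

/// Classify a source string into one of the three documented forms.
fn parse_source(source: &str) -> Result<ModelSource, String> {
    if let Some(rest) = source.strip_prefix("hf:") {
        return match rest.split_once('/') {
            Some((org, repo)) if !org.is_empty() && !repo.is_empty() => {
                Ok(ModelSource::HuggingFace {
                    org: org.to_string(),
                    repo: repo.to_string(),
                })
            }
            _ => Err(format!("expected hf:org/repo, got {source:?}")),
        };
    }
    // Anything else is treated as a path, with an optional file:// prefix.
    let path = source.strip_prefix("file://").unwrap_or(source);
    if path.is_empty() {
        return Err("empty model path".to_string());
    }
    Ok(ModelSource::Path(PathBuf::from(path)))
}

/// Average the token vectors whose mask entry is nonzero (assumed masking).
/// Returns None for empty input, ragged rows, or an all-zero mask.
fn mean_pool(tokens: &[Vec<f32>], mask: &[u8]) -> Option<Vec<f32>> {
    let dim = tokens.first()?.len();
    let mut sum = vec![0.0f32; dim];
    let mut count = 0usize;
    for (tok, &m) in tokens.iter().zip(mask) {
        if m == 0 {
            continue; // padding position, excluded from the mean
        }
        if tok.len() != dim {
            return None;
        }
        for (s, x) in sum.iter_mut().zip(tok) {
            *s += *x;
        }
        count += 1;
    }
    if count == 0 {
        return None;
    }
    Some(sum.into_iter().map(|s| s / count as f32).collect())
}
```

For example, parse_source("file:///models/minilm") and parse_source("/models/minilm") resolve to the same Path variant, and mean_pool skips any token whose mask entry is 0, so padding does not dilute the embedding.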