Module backends


Concrete inference backends.

Each backend is feature-gated so the default build carries zero HTTP or ML weight. Backends are transport only: they turn an InferenceRequest into provider calls and the responses back into InferenceOutputs. Task framing (the chat prompt) and any numeric post-processing live in the shared helpers and the adapter, not in the wire layer.
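As a rough illustration of the feature gating described above, the sketch below reports which backends were compiled in. The feature names (`anthropic`, `local`, `openai`) and the `enabled_backends` helper are assumptions made for this example; the crate's actual Cargo feature names are not shown on this page.

```rust
/// Names of the backends compiled into this build.
///
/// Hypothetical helper: the feature names below are stand-ins chosen to
/// match the module names, not the crate's confirmed Cargo features.
#[allow(unexpected_cfgs)]
pub fn enabled_backends() -> Vec<&'static str> {
    let mut names = Vec::new();
    // `cfg!` evaluates to a compile-time boolean, so a disabled backend
    // contributes nothing to the list. In a real crate the module itself
    // would typically also sit behind `#[cfg(feature = "...")]`, so its
    // dependencies are never compiled.
    if cfg!(feature = "anthropic") {
        names.push("anthropic");
    }
    if cfg!(feature = "local") {
        names.push("local");
    }
    if cfg!(feature = "openai") {
        names.push("openai");
    }
    names
}
```

With no features enabled, the returned list is empty, which corresponds to the default build carrying no HTTP or ML dependencies.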

Re-exports§

pub use anthropic::AnthropicProvider;
pub use local::LocalProvider;
pub use openai::OpenAiProvider;
pub use rate_limited::RateLimitedProvider;

Modules§

anthropic
Anthropic Messages API provider.
local
Local inference via ONNX Runtime (ort, loaded dynamically). Only encoder models are supported (BERT/DistilBERT/MiniLM): classify and sentiment return logits, embed returns a mean-pooled vector, and generative tasks are rejected. A model is loaded once per source and cached; the forward pass runs on spawn_blocking under a deadline so a wedged model can’t stall the worker. ORT is loaded at runtime: onnxruntime.{dll,so} (>= 1.24) must be on the search path or named by ORT_DYLIB_PATH. A source is hf:org/repo (downloaded on first use), file://<path>, or a bare path; labels come from the id2label map in the model’s config.json.
openai
OpenAI-compatible remote provider.
rate_limited
Client-side rate limiting for remote providers.
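
The `local` entry above names three source forms and a mean-pooled embedding. The sketch below shows one way those two pieces could look. `ModelSource`, `parse_source`, and `mean_pool` are hypothetical names introduced for illustration, and weighting the mean by an attention mask is an assumption; the page only says the vector is mean-pooled.

```rust
use std::path::PathBuf;

/// Where a local model comes from (hypothetical type; the variants mirror
/// the source forms listed in the `local` module description).
#[derive(Debug, PartialEq)]
pub enum ModelSource {
    /// `hf:org/repo`
    HuggingFace { org: String, repo: String },
    /// `file://<path>` or a bare path.
    Path(PathBuf),
}

/// Parse a source string into a `ModelSource`.
pub fn parse_source(s: &str) -> Result<ModelSource, String> {
    if let Some(rest) = s.strip_prefix("hf:") {
        // Require exactly `org/repo` with both halves non-empty.
        return match rest.split_once('/') {
            Some((org, repo))
                if !org.is_empty() && !repo.is_empty() && !repo.contains('/') =>
            {
                Ok(ModelSource::HuggingFace {
                    org: org.to_string(),
                    repo: repo.to_string(),
                })
            }
            _ => Err(format!("expected hf:org/repo, got {s:?}")),
        };
    }
    // Simplistic: strips the scheme only, with no percent-decoding or
    // Windows drive-letter handling.
    let path = s.strip_prefix("file://").unwrap_or(s);
    if path.is_empty() {
        return Err("empty model path".to_string());
    }
    Ok(ModelSource::Path(PathBuf::from(path)))
}

/// Mean-pool per-token embeddings, counting only tokens whose mask entry
/// is non-zero. Returns `None` on empty or mismatched input, or when every
/// token is masked out.
pub fn mean_pool(tokens: &[Vec<f32>], mask: &[u8]) -> Option<Vec<f32>> {
    let dim = tokens.first()?.len();
    if tokens.len() != mask.len() || tokens.iter().any(|t| t.len() != dim) {
        return None;
    }
    let mut sum = vec![0.0f32; dim];
    let mut count = 0usize;
    for (tok, &m) in tokens.iter().zip(mask) {
        if m != 0 {
            for (acc, v) in sum.iter_mut().zip(tok) {
                *acc += *v;
            }
            count += 1;
        }
    }
    if count == 0 {
        return None;
    }
    let n = count as f32;
    Some(sum.into_iter().map(|v| v / n).collect())
}
```

Under these assumptions, `parse_source("hf:org/repo")` yields the `HuggingFace` variant, while both `file:///models/m.onnx` and `models/m.onnx` yield `Path`. Masking padding tokens out of the mean keeps padded batches from diluting the embedding.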