Module backends

Concrete inference backends.

Each backend is feature-gated, so the default build carries no HTTP or ML weight. Backends are transport only: they turn an InferenceRequest into provider calls and turn the responses back into InferenceOutputs. Task framing (the chat prompt) and any numeric post-processing live in the shared helpers and the adapter, not in the wire layer.
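
The separation described above can be pictured with a minimal sketch. Everything in it is hypothetical: the fields of InferenceRequest and InferenceOutput, the Backend trait, the EchoBackend test double, and the Cargo feature names are assumptions made for illustration, not the crate's actual definitions.

```rust
// Hypothetical stand-ins: the crate's real InferenceRequest and InferenceOutput
// types are not shown on this page, so the fields here are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct InferenceRequest {
    pub prompt: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct InferenceOutput {
    pub text: String,
}

/// Hypothetical transport-only contract: request in, output out.
/// No prompt framing and no numeric post-processing happens here.
pub trait Backend {
    fn infer(&self, req: &InferenceRequest) -> Result<InferenceOutput, String>;
}

/// Compiled only when the (assumed) `openai` Cargo feature is enabled, so a
/// default build never pulls in an HTTP client.
#[cfg(feature = "openai")]
pub mod openai_stub {
    // A real backend would hold an HTTP client and implement `Backend` here.
}

/// Always-available test double that echoes the prompt back unchanged.
pub struct EchoBackend;

impl Backend for EchoBackend {
    fn infer(&self, req: &InferenceRequest) -> Result<InferenceOutput, String> {
        Ok(InferenceOutput { text: req.prompt.clone() })
    }
}

/// Names of the backends compiled into this build.
/// The feature names checked here are assumptions.
pub fn compiled_backends() -> Vec<&'static str> {
    let mut names = Vec::new();
    if cfg!(feature = "anthropic") {
        names.push("anthropic");
    }
    if cfg!(feature = "local") {
        names.push("local");
    }
    if cfg!(feature = "openai") {
        names.push("openai");
    }
    names
}
```

Keeping the trait this narrow means a caller can swap a remote provider for a local one, or for a test double like EchoBackend, without touching the code that frames the task.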

Re-exports

pub use anthropic::AnthropicProvider;
pub use local::LocalProvider;
pub use openai::OpenAiProvider;
pub use rate_limited::RateLimitedProvider;

Modules

anthropic
Anthropic Messages API provider.
local
Local inference via ONNX Runtime (ort, loaded dynamically). Encoder models only (the BERT / DistilBERT / MiniLM family): classification and sentiment yield logits, which the adapter argmaxes; embedding yields a mean-pooled vector. Generative tasks are rejected. A model is loaded once per source and cached. The forward pass runs on spawn_blocking, off the Ring 1 task, under a deadline, so a pathological model can never stall the worker or the watermark behind it. Because ONNX Runtime is loaded at runtime, onnxruntime.dll / .so (ORT >= 1.24) must be on the search path or named by ORT_DYLIB_PATH.
openai
OpenAI-compatible remote provider.
rate_limited
Client-side rate limiting for remote providers.
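
The two numeric steps named in the local entry, mean pooling for embeddings and argmax over classification logits, can be sketched in plain Rust. This is an illustration under stated assumptions, not the crate's code: the function names, the use of an attention mask to skip padding, and the tie-breaking rule are all hypothetical.

```rust
/// Mean-pool per-token embeddings into a single vector (hypothetical helper).
/// `tokens` is laid out as [seq_len][hidden]; `mask` has one entry per token,
/// where 1 marks a real token and 0 marks padding. Masking out padding is an
/// assumption about how the pooling is done.
pub fn mean_pool(tokens: &[Vec<f32>], mask: &[u8]) -> Option<Vec<f32>> {
    let hidden = tokens.first()?.len();
    if tokens.len() != mask.len() || tokens.iter().any(|t| t.len() != hidden) {
        return None; // shape mismatch
    }
    let mut sum = vec![0.0f32; hidden];
    let mut count = 0usize;
    for (tok, &m) in tokens.iter().zip(mask) {
        if m == 0 {
            continue; // skip padded positions
        }
        for (acc, v) in sum.iter_mut().zip(tok) {
            *acc += *v;
        }
        count += 1;
    }
    if count == 0 {
        return None; // nothing to average
    }
    let n = count as f32;
    Some(sum.into_iter().map(|s| s / n).collect())
}

/// Index of the largest logit (hypothetical helper for the adapter side).
/// NaN entries are ignored; on exact ties the last maximal index wins,
/// because that is how `Iterator::max_by` resolves equal elements.
pub fn argmax(logits: &[f32]) -> Option<usize> {
    logits
        .iter()
        .enumerate()
        .filter(|(_, v)| !v.is_nan())
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(i, _)| i)
}
```

For example, pooling the tokens [1.0, 2.0] and [3.0, 4.0] while masking out a third padded token gives [2.0, 3.0], and argmax over the logits [0.1, 2.0, -1.0] selects index 1.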