The single transport abstraction over a model backend.
An InferenceProvider does I/O only: a homogeneous batch of inputs in, a
homogeneous batch of outputs plus usage out. It knows nothing about SQL tasks
or output columns — framing a request and turning the response into a task’s
output column is the adapter’s job. Implementors: Anthropic, an
OpenAI-compatible provider (OpenAI / Azure / vLLM via base_url), and a
local ONNX Runtime provider.
Structs
- InferenceParams - Knobs that shape a request and contribute to the cache’s params_version, so the same text under different parameters never collides. Generation knobs (max_tokens, temperature, …) are added here as backends consume them.
- InferenceRequest - One batch of inputs to run through a model. A request is homogeneous: a single task, a single model, and one input string per row, in order.
- InferenceResponse - The result of a batch inference call: outputs aligned 1:1 with the request’s inputs, plus usage.
- Usage - Token and cost accounting for a single batch call. Local backends report Usage::ZERO; remote backends report what the provider charged.
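One way to picture how parameters feed the cache key is the sketch below. It is not the crate's actual definition: the field names (`max_tokens`, `temperature_milli`, `input_tokens`, `output_tokens`, `cost_usd`), the `params_version()` method, and the `cache_key` helper are all assumptions made for illustration. Only the names `InferenceParams`, `params_version`, `Usage`, and `Usage::ZERO` come from the documentation above.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for `InferenceParams`; the fields are assumptions.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct InferenceParams {
    pub max_tokens: Option<u32>,
    /// Temperature in thousandths, so the struct can derive `Eq` and `Hash`.
    pub temperature_milli: Option<u32>,
}

impl InferenceParams {
    /// Fold every knob into one number that becomes part of the cache key.
    pub fn params_version(&self) -> u64 {
        let mut hasher = DefaultHasher::new();
        self.hash(&mut hasher);
        hasher.finish()
    }
}

/// Hypothetical stand-in for `Usage`; the fields are assumptions.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Usage {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub cost_usd: f64,
}

impl Usage {
    /// What a local backend would report: nothing was billed.
    pub const ZERO: Usage = Usage {
        input_tokens: 0,
        output_tokens: 0,
        cost_usd: 0.0,
    };
}

/// Hypothetical cache key: the same text under different parameters
/// maps to a different key because `params_version` differs.
pub fn cache_key(model: &str, params: &InferenceParams, input: &str) -> (String, u64, String) {
    (model.to_string(), params.params_version(), input.to_string())
}
```

`DefaultHasher` keeps the sketch short, but its algorithm is not guaranteed to stay the same across Rust releases, so a cache that persists to disk would likely want an explicitly versioned or stable hash instead.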
Enums
- InferenceOutputs - Per-row outputs of a batch. Homogeneous for a given request: a classify or generate batch yields text; an embed batch (or a local classifier’s raw logits awaiting softmax in the adapter) yields numeric vectors; a sentiment batch yields one scalar score per row (the adapter’s output, never a raw provider shape).
- ProviderError - Errors a provider can return for a batch call.
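The split between provider and adapter for a local classifier can be sketched as follows. The variant names (`Text`, `Vectors`, `Scores`) and the helper `classify_from_logits` are hypothetical. The documentation only says that an output batch is text, numeric vectors, or scalar scores, and that softmax over raw logits happens in the adapter.

```rust
/// Hypothetical shape for `InferenceOutputs`; the variant names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum InferenceOutputs {
    /// Classify or generate batches: one string per row.
    Text(Vec<String>),
    /// Embed batches, or a local classifier's raw logits: one vector per row.
    Vectors(Vec<Vec<f32>>),
    /// Sentiment batches: one scalar score per row.
    Scores(Vec<f32>),
}

/// Numerically stable softmax: subtract the row maximum before exponentiating.
pub fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Adapter-side step (hypothetical): map each row of raw logits to the label
/// with the highest probability. Returns `None` if the batch is not numeric
/// vectors, or if a row is empty or has more entries than there are labels.
pub fn classify_from_logits(outputs: &InferenceOutputs, labels: &[&str]) -> Option<Vec<String>> {
    match outputs {
        InferenceOutputs::Vectors(rows) => rows
            .iter()
            .map(|logits| {
                let probs = softmax(logits);
                probs
                    .iter()
                    .enumerate()
                    .max_by(|a, b| a.1.total_cmp(b.1))
                    .and_then(|(i, _)| labels.get(i))
                    .map(|label| label.to_string())
            })
            .collect(),
        _ => None,
    }
}
```

Under this split the provider never sees label names; it returns vectors, and only the adapter knows how to turn them into the task's output column.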
Traits
- InferenceProvider - Transport over a model backend. Implementors perform I/O only: no task framing, no result parsing. Shared as Arc<dyn InferenceProvider> and driven from the Ring 1 inference worker, never from Ring 0.
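A minimal sketch of the sharing pattern, under several assumptions: the method name `infer`, its synchronous signature, the struct fields, the `Transport` error variant, the toy `EchoProvider`, and the use of a plain OS thread to stand in for the Ring 1 worker are all hypothetical. The real trait may well be async and carry richer types; only the `Arc<dyn InferenceProvider>` handle and the I/O-only contract come from the documentation.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical request: one task, one model, one input string per row.
pub struct InferenceRequest {
    pub task: String,
    pub model: String,
    pub inputs: Vec<String>,
}

/// Hypothetical response: outputs aligned 1:1 with the request's inputs.
pub struct InferenceResponse {
    pub outputs: Vec<String>,
    pub input_tokens: u64,
}

/// Hypothetical error type; the variant is an assumption.
#[derive(Debug)]
pub enum ProviderError {
    Transport(String),
}

/// I/O only: a batch in, a batch out. `Send + Sync` lets the trait object
/// be shared across threads behind an `Arc`.
pub trait InferenceProvider: Send + Sync {
    fn infer(&self, req: &InferenceRequest) -> Result<InferenceResponse, ProviderError>;
}

/// Toy local backend that upper-cases each input and reports no usage.
pub struct EchoProvider;

impl InferenceProvider for EchoProvider {
    fn infer(&self, req: &InferenceRequest) -> Result<InferenceResponse, ProviderError> {
        Ok(InferenceResponse {
            outputs: req.inputs.iter().map(|s| s.to_uppercase()).collect(),
            input_tokens: 0,
        })
    }
}

/// Stand-in for the worker: the batch call runs on a separate thread,
/// so the caller's thread performs no provider I/O itself.
pub fn run_on_worker(
    provider: Arc<dyn InferenceProvider>,
    req: InferenceRequest,
) -> Result<InferenceResponse, ProviderError> {
    thread::spawn(move || provider.infer(&req))
        .join()
        .expect("worker thread panicked")
}
```

Because callers hold only an `Arc<dyn InferenceProvider>`, swapping the echo backend for a remote one would not change `run_on_worker` or anything above it.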