Observability & the operations report (design — lands with tracing)
Principle: instrument once, export anywhere
Tracing is built on open standards (OpenTelemetry, with the OpenInference
conventions for LLM spans), not on a single vendor's SDK. assay emits standard
spans, and the backend is a swappable exporter. There is no lock-in, and the
same instrumentation lands in whichever tool a team already runs.
| Backend | Role | Why |
|---|---|---|
| Arize Phoenix | default | open-source, runs locally, OpenInference-native — zero setup |
| W&B Weave | supported export | instrument once, view in Weave too |
| Self-hosted (Phoenix / Langfuse) | regulated / on-prem | data residency — traces never leave your environment |
| Others (Galileo, Braintrust, raw OTLP) | swap the exporter | standard OTLP out |
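
A minimal sketch of the exporter seam, using only the Python standard library. `Span`, `Exporter`, `InMemoryExporter`, `JsonLinesExporter`, and `run_step` are hypothetical names invented for illustration; they are not assay's or OpenTelemetry's API. In a real deployment the OpenTelemetry SDK's exporter interface would presumably play this role.

```python
import io
import json
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Span:
    """Hypothetical minimal span; real spans would come from the tracing SDK."""
    name: str
    attributes: dict = field(default_factory=dict)


class Exporter(Protocol):
    """Anything with an export(span) method can act as a backend."""
    def export(self, span: Span) -> None: ...


class InMemoryExporter:
    """Stands in for a local backend: keeps spans in a list."""
    def __init__(self) -> None:
        self.spans = []

    def export(self, span: Span) -> None:
        self.spans.append(span)


class JsonLinesExporter:
    """Stands in for a wire exporter: writes one JSON object per line."""
    def __init__(self, stream) -> None:
        self.stream = stream

    def export(self, span: Span) -> None:
        self.stream.write(json.dumps({"name": span.name, **span.attributes}) + "\n")


def run_step(exporter: Exporter, step: str, run_id: str) -> None:
    """Instrumentation site: written once, unaware of which backend receives the span."""
    exporter.export(Span(name=step, attributes={"run_id": run_id}))


# The same instrumentation call feeds two different backends.
memory = InMemoryExporter()
buffer = io.StringIO()
for exporter in (memory, JsonLinesExporter(buffer)):
    run_step(exporter, "gate_decision", "run-001")
```

The instrumentation site (`run_step`) never names a backend, which is the property the table above relies on: changing rows means changing only the exporter passed in.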
Tracing
- Every step, gate decision, approval, and artifact write emits a span, mirroring the audit log.
- LLM judgment steps emit one span per model call (prompt, tokens, latency, cost).
- Spans carry the `run_id`, so a trace and its tamper-evident `audit.jsonl` line up.
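
To illustrate how a shared `run_id` lets a trace and its audit log be joined, here is a rough sketch in plain Python. The `emit` helper, the field names (`event`, `tokens`, `latency_ms`, `cost_usd`), and all values are assumptions made up for the example; the real span schema and audit line format are not specified here.

```python
import json
import time
import uuid


def emit(spans: list, audit: list, run_id: str, kind: str, **attrs) -> None:
    """Record one event twice: as a span and as an audit line, both keyed by run_id."""
    spans.append({"run_id": run_id, "name": kind, "start": time.time(), **attrs})
    audit.append(json.dumps({"run_id": run_id, "event": kind}))


spans, audit = [], []
run_id = str(uuid.uuid4())

emit(spans, audit, run_id, "step")
# One span per model call, carrying the LLM-specific attributes (illustrative values).
emit(spans, audit, run_id, "llm_call", prompt="Summarise the findings",
     tokens=412, latency_ms=830, cost_usd=0.0021)
emit(spans, audit, run_id, "gate_decision", decision="APPROVE")

# Join the two views: each span should have an audit line with the same
# run_id and event name, in the same order.
audit_events = [json.loads(line) for line in audit]
matched = [
    span["name"]
    for span, entry in zip(spans, audit_events)
    if span["run_id"] == entry["run_id"] and span["name"] == entry["event"]
]
```

If `matched` is shorter than `spans`, the trace and the audit log have diverged, which is the kind of mismatch the shared `run_id` is meant to make visible.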
SLIs
- run outcome (completed / blocked / awaiting / failed)
- per-step + end-to-end latency
- gate block-rate; exception rate
- maker-checker wait time
- LLM cost / latency / token usage
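
As a rough illustration, several of these SLIs can be derived from per-run records. The record shape, the `compute_slis` function, and the sample values are hypothetical and exist only for this sketch; maker-checker wait time and per-step latency are left out for brevity.

```python
from collections import Counter
from statistics import mean

# Hypothetical per-run records; every value is made up for illustration.
runs = [
    {"outcome": "completed", "gate": "APPROVE", "exceptions": 0, "latency_s": 42.0},
    {"outcome": "blocked", "gate": "BLOCK", "exceptions": 0, "latency_s": 18.0},
    {"outcome": "completed", "gate": "APPROVE", "exceptions": 1, "latency_s": 60.0},
    {"outcome": "failed", "gate": None, "exceptions": 1, "latency_s": 5.0},
]


def compute_slis(runs: list) -> dict:
    """Aggregate run records into a small SLI summary."""
    # Only runs that reached a gate count toward the block rate.
    gated = [r for r in runs if r["gate"] is not None]
    return {
        "outcomes": dict(Counter(r["outcome"] for r in runs)),
        "gate_block_rate": (
            sum(r["gate"] == "BLOCK" for r in gated) / len(gated) if gated else 0.0
        ),
        "exception_rate": sum(r["exceptions"] > 0 for r in runs) / len(runs),
        "mean_latency_s": mean(r["latency_s"] for r in runs),
    }


slis = compute_slis(runs)
```

Here the block rate uses gated runs as its denominator, so a run that failed before reaching the gate does not dilute it. That choice is an assumption of the sketch, not something the design prescribes.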
Operations report — clean vs. issues
A rollup across runs for the operations team:
- Clean: completed · gate APPROVE · no exceptions.
- Issues: BLOCKed (fabrication) · exceptions noted · FAILED · SLA-breached.
Each issue links to its run directory and audit log for triage, so the report is a work queue, not just a dashboard. The report feeds `ESCALATION.md`.
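
The clean-versus-issues split could be sketched as below. The `runs/<run_id>/audit.jsonl` layout, the `SLA_SECONDS` threshold, the record fields, and the function names are all assumptions for illustration; only the classification rules come from the list above.

```python
from pathlib import PurePosixPath

SLA_SECONDS = 300  # hypothetical SLA threshold, not taken from the design


def issues_for(run: dict) -> list:
    """Return every reason a run belongs in the issues queue (empty if none)."""
    reasons = []
    if run["gate"] == "BLOCK":
        reasons.append("blocked")
    if run["exceptions"] > 0:
        reasons.append("exceptions noted")
    if run["outcome"] == "failed":
        reasons.append("failed")
    if run["latency_s"] > SLA_SECONDS:
        reasons.append("SLA-breached")
    return reasons


def build_report(runs: list, runs_root: str = "runs") -> dict:
    """Split runs into clean run_ids and issue entries that link to triage material."""
    report = {"clean": [], "issues": []}
    for run in runs:
        reasons = issues_for(run)
        if reasons:
            run_dir = PurePosixPath(runs_root) / run["run_id"]
            report["issues"].append({
                "run_id": run["run_id"],
                "reasons": reasons,
                "run_dir": str(run_dir),
                "audit_log": str(run_dir / "audit.jsonl"),
            })
        elif run["outcome"] == "completed" and run["gate"] == "APPROVE":
            report["clean"].append(run["run_id"])
        # Runs still awaiting approval are neither clean nor issues; skipped here.
    return report


# Made-up sample runs covering each bucket.
runs = [
    {"run_id": "run-a", "outcome": "completed", "gate": "APPROVE", "exceptions": 0, "latency_s": 40},
    {"run_id": "run-b", "outcome": "blocked", "gate": "BLOCK", "exceptions": 0, "latency_s": 20},
    {"run_id": "run-c", "outcome": "completed", "gate": "APPROVE", "exceptions": 2, "latency_s": 400},
    {"run_id": "run-d", "outcome": "awaiting", "gate": None, "exceptions": 0, "latency_s": 10},
]
report = build_report(runs)
```

Because each issue entry carries its reasons and paths, the `issues` list can be worked through top to bottom as a triage queue. A run can appear with several reasons at once, as `run-c` does here.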
Data residency: with a self-hosted backend, no trace data leaves the deployment — the same property the audit log guarantees for evidence.