Sample run artifacts — proofs you can read

These artifacts were generated offline with an injected judge and no API key, so anyone can reproduce them. A live run logs the real model id instead of `injected-judge`; everything else is identical.

`clean/` — a change that passes

artifacts/llm_mapping.json — the reproducibility proof. It records the model id, temperature: 0, the exact prompt, and the raw model output — so the control test can be re-performed and tied out by an auditor.
artifacts/workpaper.json — the auditable test result: controls tested, gate decision, approval, conclusion.
audit.jsonl — the tamper-evident, hash-chained log of every step (AuditLog.verify()).
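
To make "tamper-evident, hash-chained" concrete, here is a minimal sketch of how such a log can be built and checked. It is not the project's `AuditLog.verify()`: the field names `prev_hash`, `payload`, and `hash`, the all-zero genesis value, and the choice of SHA-256 over canonical JSON are all assumptions made for illustration, and the real audit.jsonl schema may differ.

```python
import hashlib
import json

# Assumed sentinel used as the "previous hash" of the very first record.
GENESIS = "0" * 64


def record_hash(prev_hash, payload):
    """Hash the previous link together with a canonical JSON form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_record(chain, payload):
    """Append a record whose hash covers both its payload and the previous record's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "prev_hash": prev_hash,
        "payload": payload,
        "hash": record_hash(prev_hash, payload),
    })


def verify_chain(chain):
    """Return True only if every record links to its predecessor and its hash recomputes."""
    expected_prev = GENESIS
    for rec in chain:
        if rec["prev_hash"] != expected_prev:
            return False  # a record was removed, reordered, or re-linked
        if rec["hash"] != record_hash(rec["prev_hash"], rec["payload"]):
            return False  # the payload was edited after it was written
        expected_prev = rec["hash"]
    return True
```

Because each hash covers the previous one, editing or dropping any earlier record invalidates every record after it, which is what lets a reviewer trust the log without trusting whoever stored it.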

`blocked/` — the model fabricated a citation

The model cited an approval — “Approved by the CEO on January 1st.” — that is not in the evidence. audit.jsonl shows the gate’s verdict:

step_result … "status": "blocked" … "reason": "anti-fabrication gate: 1 unciteable assertion(s)"

Nothing shipped. That’s the hallucination caught and refused — with the prompt and raw output preserved in artifacts/llm_mapping.json for review.
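
The gate's behaviour can be approximated with a short sketch. This is an illustration under stated assumptions, not the project's implementation: the function names are hypothetical, "citeable" is simplified to a case-insensitive substring match against the evidence, and the `"passed"` status is invented here (the sample log only shows `"blocked"`). The reason string follows the format seen in audit.jsonl.

```python
from typing import Dict, List


def unciteable_assertions(assertions: List[str], evidence: List[str]) -> List[str]:
    """Return the assertions that appear in none of the evidence texts.

    Assumption: an assertion counts as citeable when it occurs as a
    case-insensitive substring of some evidence text.
    """
    corpus = [text.casefold() for text in evidence]
    return [a for a in assertions if not any(a.casefold() in text for text in corpus)]


def gate(assertions: List[str], evidence: List[str]) -> Dict[str, str]:
    """Block the step if any assertion cannot be tied back to the evidence."""
    missing = unciteable_assertions(assertions, evidence)
    if missing:
        return {
            "status": "blocked",
            "reason": f"anti-fabrication gate: {len(missing)} unciteable assertion(s)",
        }
    # "passed" is a hypothetical status name chosen for this sketch.
    return {"status": "passed", "reason": ""}


if __name__ == "__main__":
    # Made-up evidence text, used only to exercise the sketch.
    evidence = ["The change was approved by the change manager."]
    print(gate(["Approved by the CEO on January 1st."], evidence))
```

A real gate would likely need something stricter than substring matching, such as requiring each assertion to carry an explicit reference to an evidence item, but the fail-closed shape is the same: any assertion that cannot be cited blocks the step.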
