Authors the operator-run harness that closes the C-DB-3 deferral at
platform/cnpg-pair/DESIGN.md (1M-row write + region-kill + zero-tx-loss
assertion — CLAUDE.md §0 Pillar 3, deterministic step 10).
Why
---
Per the 2026-05-19 anti-theater audit, Pillar 3 has never been verified
by an automated suite — the chart render gate is green but "operator
kills primary region → ≤30s failover → zero transactions lost" was a
claim, not a measurement. The harness is the measurement.
Shape
-----
Self-contained Go module under platform/cnpg-pair/tests/acceptance/:
  cmd/d31-acceptance/main.go   entrypoint, 7-phase orchestration
  internal/harness/counter.go  gap detector + zero-tx-loss assert
  internal/harness/driver.go   psql + kubectl shell-out drivers
  internal/harness/writer.go   N-worker writer goroutine pool
  internal/harness/*_test.go   23 unit tests, race-clean
  Containerfile                alpine:3.20 + psql + kubectl
  README.md                    operator-run brief incl. RBAC + Job
Stdlib-only (shells out to psql and kubectl from the runtime image)
so the build is hermetic and the image stays small.
Phases (see main.go header comment)
-----------------------------------
0 Schema bootstrap (TRUNCATE-on-start so re-runs are clean).
1 8 writers INSERT 1KB rows in 1000-batches against <primary>-rw.
2 --pre-kill-warmup (30s) of stable writes.
3 REGION KILL: patch primary Cluster CR spec.instances=0; record time.
4 Promote replica: patch replica Cluster CR spec.replica.enabled=false.
5 Poll replica status.currentPrimary; FAIL after --rto-deadline (90s).
6 Settle period (5s) before SELECT on new primary.
7 SELECT id ORDER BY id; assert FLOOR (count >= writer-ACKd) + GAP-FREE
(BIGSERIAL sequence is 1..max with no holes; synchronous_commit=
remote_apply makes this the contract; any gap = a lost tx).
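The phase 7 assertion can be sketched as below. This is a minimal
illustration, not the code in counter.go: the names GapReport and
checkZeroTxLoss are hypothetical, the ids are assumed to arrive sorted
ascending (as SELECT id ORDER BY id returns them), and the writers are
assumed never to roll back an INSERT, since an aborted INSERT also
consumes a sequence value.

```go
package main

import "fmt"

// GapReport summarises what was read back from the new primary.
// (Hypothetical type, for illustration only.)
type GapReport struct {
	Count   int64   // rows read back
	Max     int64   // highest id seen
	Missing []int64 // first few missing ids, kept for diagnostics
}

// checkZeroTxLoss applies both assertions from phase 7:
//   FLOOR:    len(ids) >= acked (every writer-ACKd row survived)
//   GAP-FREE: ids are exactly 1..max with no holes
// ids must be sorted ascending; maxReport caps len(Missing).
func checkZeroTxLoss(ids []int64, acked int64, maxReport int) (GapReport, error) {
	rep := GapReport{Count: int64(len(ids))}
	next := int64(1) // the id we expect to see next
	var gaps int64
	for _, id := range ids {
		if id < next {
			return rep, fmt.Errorf("ids not strictly ascending at id %d", id)
		}
		if id > next {
			gaps += id - next
			for m := next; m < id && len(rep.Missing) < maxReport; m++ {
				rep.Missing = append(rep.Missing, m)
			}
		}
		next = id + 1
		rep.Max = id
	}
	if rep.Count < acked {
		return rep, fmt.Errorf("floor missed: %d rows read, %d acked", rep.Count, acked)
	}
	if gaps > 0 {
		return rep, fmt.Errorf("gap detected: %d ids missing, first %v", gaps, rep.Missing)
	}
	return rep, nil
}

func main() {
	_, err := checkZeroTxLoss([]int64{1, 2, 3, 4}, 4, 5)
	fmt.Println("contiguous:", err)
	_, err = checkZeroTxLoss([]int64{1, 2, 4}, 3, 5)
	fmt.Println("with hole:", err)
}
```

Capping Missing keeps the failure message readable when a large range
of ids is lost, while the total gap count is still reported.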
Exit codes
----------
0 PASS — zero-tx-loss verified.
1 FAIL — gap detected OR floor missed (zero-tx-loss bar broken).
2 FAIL — RTO exceeded (replica did not promote within 90s).
3 FAIL — harness error before failover (bad flags / schema / ...).
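One way to keep the exit-code table above in a single place is to map
wrapped sentinel errors to codes with errors.Is. The sketch below is an
assumption about structure, not a copy of main.go: the sentinel names,
inPhase, and exitCodeFor are hypothetical, and treating every
unrecognised error as exit 3 is an illustrative choice.

```go
package main

import (
	"errors"
	"fmt"
)

// Exit codes as listed in the table above.
const (
	exitPass    = 0
	exitTxLoss  = 1
	exitRTO     = 2
	exitHarness = 3
)

// Hypothetical sentinel errors; the real harness may classify differently.
var (
	errTxLoss      = errors.New("zero-tx-loss bar broken")
	errRTOExceeded = errors.New("RTO exceeded")
	errSetup       = errors.New("harness setup failed")
)

// inPhase wraps err with the phase number so diagnostics say where it
// happened while errors.Is still sees the sentinel.
func inPhase(phase int, err error) error {
	return fmt.Errorf("phase %d: %w", phase, err)
}

// exitCodeFor maps an error to a process exit code.
// Anything unrecognised is treated as a harness error (assumption).
func exitCodeFor(err error) int {
	switch {
	case err == nil:
		return exitPass
	case errors.Is(err, errTxLoss):
		return exitTxLoss
	case errors.Is(err, errRTOExceeded):
		return exitRTO
	default:
		return exitHarness
	}
}

func main() {
	for _, err := range []error{
		nil,
		inPhase(7, errTxLoss),
		inPhase(5, errRTOExceeded),
		inPhase(0, errSetup),
	} {
		fmt.Printf("exit %d: %v\n", exitCodeFor(err), err)
	}
}
```

A real entrypoint would pass the result to os.Exit; the demo only
prints so that it can be run without a non-zero exit.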
Fail-safe — all ops bounded by ctx deadlines so the harness NEVER hangs
(per the CLAUDE.md anti-theater "report FAIL with diagnostics, don't
hang forever" rule).
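The bounded-by-deadline rule, applied to the phase 5 poll, could look
like the sketch below. It is illustrative only: pollPromoted,
pollWithDeadline, probeFunc, and the fake promotedAfter probe are
hypothetical names, and in a real run the probe would wrap the kubectl
read of status.currentPrimary rather than count calls.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errRTOExceeded is a hypothetical sentinel for "did not promote in time".
var errRTOExceeded = errors.New("RTO exceeded")

// probeFunc reports whether the replica has become primary.
// It is injectable so the loop can be tested without a cluster.
type probeFunc func(ctx context.Context) (bool, error)

// pollPromoted calls probe immediately, then once per interval, until
// it reports true or ctx is done. It cannot outlive ctx as long as the
// probe itself honours ctx.
func pollPromoted(ctx context.Context, interval time.Duration, probe probeFunc) (time.Duration, error) {
	start := time.Now()
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	var lastErr error
	for {
		ok, err := probe(ctx)
		if err == nil && ok {
			return time.Since(start), nil
		}
		lastErr = err
		select {
		case <-ctx.Done():
			// Report FAIL with diagnostics instead of hanging.
			return time.Since(start), fmt.Errorf("%w after %s (last probe error: %v)",
				errRTOExceeded, time.Since(start).Round(time.Millisecond), lastErr)
		case <-ticker.C:
		}
	}
}

// pollWithDeadline derives the bounding context from a plain duration,
// the way a flag such as --rto-deadline might be applied.
func pollWithDeadline(deadline, interval time.Duration, probe probeFunc) (time.Duration, error) {
	ctx, cancel := context.WithTimeout(context.Background(), deadline)
	defer cancel()
	return pollPromoted(ctx, interval, probe)
}

// promotedAfter is a fake probe that reports true from the n-th call on.
func promotedAfter(n int) probeFunc {
	calls := 0
	return func(context.Context) (bool, error) {
		calls++
		return calls >= n, nil
	}
}

// Short values so the demo finishes quickly; the harness default is 90s.
const (
	demoDeadline = 200 * time.Millisecond
	demoInterval = 5 * time.Millisecond
)

func main() {
	elapsed, err := pollWithDeadline(demoDeadline, demoInterval, promotedAfter(3))
	fmt.Println("promoted:", err == nil, "elapsed:", elapsed.Round(time.Millisecond))
	_, err = pollWithDeadline(demoDeadline, demoInterval, promotedAfter(1<<30))
	fmt.Println("never promoted:", err)
}
```

Returning the elapsed time alongside the error lets the caller print
the measured failover duration in both the PASS and FAIL paths.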
CI
--
.github/workflows/build-d31-acceptance.yaml mirrors the
build-continuum-controller.yaml shape — go vet, go test -race,
go build, GHCR push, cosign keyless signing, SBOM attestation. No
auto-bump step (the harness is operator-invoked; no chart pin needs
the SHA stamped). Event-driven, no cron, paths-filtered.
Honest disclosure (CLAUDE.md §0 anti-theater)
---------------------------------------------
This PR ships the harness CODE. D31 itself flips to VERIFIED-PASS in
docs/TRUST.md only AFTER the operator runs the image on a fresh
2-region Sovereign with exit 0 + screenshots attached to the issue —
hence Refs #2067, NOT Closes #2067.
Validation done locally
-----------------------
  go vet ./...                                             clean
  go test -count=1 -race ./...                             23/23 PASS
  CGO_ENABLED=0 go build ./cmd/...                         ELF static binary OK
  ./d31-acceptance                                         exits 3 with bad-flags msg
  ./d31-acceptance -h                                      shows all flags
  bash platform/cnpg-pair/chart/tests/cnpg-pair-render.sh  all 6 still PASS
  actionlint .github/workflows/build-d31-acceptance.yaml   no errors
Refs #2067
Refs #1831 (D31 epic)
Co-authored-by: hatiyildiz <hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>