openova/platform
e3mrah 30d75aa229
feat(cnpg-pair/acceptance): ship D31 zero-tx-loss test harness (Refs #2067) (#2075)
Authors the operator-run harness that closes the C-DB-3 deferral at
platform/cnpg-pair/DESIGN.md (1M-row write + region-kill + zero-tx-loss
assertion — CLAUDE.md §0 Pillar 3, deterministic step 10).

Why
---
Per the 2026-05-19 anti-theater audit, Pillar 3 has never been verified
by an automated suite — the chart render gate is green but "operator
kills primary region → ≤30s failover → zero transactions lost" was a
claim, not a measurement. The harness is the measurement.

Shape
-----
Self-contained Go module under platform/cnpg-pair/tests/acceptance/:

  cmd/d31-acceptance/main.go       — entrypoint, 7-phase orchestration
  internal/harness/counter.go      — gap detector + zero-tx-loss assert
  internal/harness/driver.go       — psql + kubectl shell-out drivers
  internal/harness/writer.go       — N-worker writer goroutine pool
  internal/harness/*_test.go       — 23 unit tests, race-clean
  Containerfile                    — alpine:3.20 + psql + kubectl
  README.md                        — operator-run brief incl. RBAC + Job

Stdlib-only (shells out to psql and kubectl from the runtime image)
so the build is hermetic and the image stays small.

Phases (see main.go header comment)
-----------------------------------
0  Schema bootstrap (TRUNCATE-on-start so re-runs are clean).
1  8 writers INSERT 1KB rows in batches of 1,000 against <primary>-rw.
2  --pre-kill-warmup (30s) of stable writes.
3  REGION KILL: patch primary Cluster CR spec.instances=0; record time.
4  Promote replica: patch replica Cluster CR spec.replica.enabled=false.
5  Poll replica status.currentPrimary; FAIL after --rto-deadline (90s).
6  Settle period (5s) before SELECT on new primary.
7  SELECT id ORDER BY id; assert FLOOR (row count >= writer-ACKed count)
   and GAP-FREE (BIGSERIAL ids run 1..max with no holes; with
   synchronous_commit=remote_apply this is the contract, so any gap
   is a lost tx).

Exit codes
----------
  0  PASS — zero-tx-loss verified.
  1  FAIL — gap detected OR floor missed (zero-tx-loss bar broken).
  2  FAIL — RTO exceeded (replica did not promote within 90s).
  3  FAIL — harness error before failover (bad flags / schema / ...).

Fail-safe — all ops bounded by ctx deadlines so the harness NEVER hangs
(per the CLAUDE.md anti-theater "report FAIL with diagnostics, don't
hang forever" rule).

CI
--
.github/workflows/build-d31-acceptance.yaml mirrors the
build-continuum-controller.yaml shape — go vet, go test -race,
go build, GHCR push, cosign keyless signing, SBOM attestation. No
auto-bump step (the harness is operator-invoked; no chart pin needs
the SHA stamped). Event-driven, no cron, paths-filtered.

Honest disclosure (CLAUDE.md §0 anti-theater)
---------------------------------------------
This PR ships the harness CODE. D31 itself flips to VERIFIED-PASS in
docs/TRUST.md only AFTER the operator runs the image on a fresh
2-region Sovereign with exit 0 + screenshots attached to the issue —
hence Refs #2067, NOT Closes #2067.

Validation done locally
-----------------------
  go vet ./...                              clean
  go test -count=1 -race ./...              23/23 PASS
  CGO_ENABLED=0 go build ./cmd/...          ELF static binary OK
  ./d31-acceptance                          exits 3 with bad-flags msg
  ./d31-acceptance -h                       shows all flags
  bash platform/cnpg-pair/chart/tests/cnpg-pair-render.sh   all 6 still PASS
  actionlint .github/workflows/build-d31-acceptance.yaml    no errors

Refs #2067
Refs #1831 (D31 epic)

Co-authored-by: hatiyildiz <hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 10:41:10 +04:00
alloy fix(ci): blueprint.yaml spec.version lockstep in auto-bump (Closes #1856) (#1858) 2026-05-19 01:04:22 +04:00
anthropic-adapter feat(charts): bp-temporal + bp-llm-gateway + bp-anthropic-adapter wrapper charts (closes #267 #268 #271) (#288) 2026-04-30 19:37:19 +04:00
bge feat(charts): bp-vllm + bp-bge + bp-nemo-guardrails wrapper charts (#283) 2026-04-30 18:37:07 +04:00
bp-dmz-vcluster fix(charts): resolve bp-dmz-vcluster duplicate-name pseudo-drift (Closes A6c) (#1771) 2026-05-18 20:31:41 +04:00
bp-mgmt-vcluster fix(bp-mgmt-vcluster): namespace PSS baseline (was restricted) — A4 (#1535) 2026-05-16 18:06:59 +04:00
bp-rtz-vcluster fix(blueprints): vcluster charts smoke-render annotation = "default-off" (#1527) 2026-05-16 16:15:51 +04:00
bp-vcluster-helmrepo fix(bp-vcluster-helmrepo): install vclusters.vcluster.com CRD on fresh prov (Refs #1945) (#1964) 2026-05-19 19:25:34 +04:00
cert-manager fix(ci): sync stale blueprint.yaml versions + soften push-mode pin-sync race (Closes #1849) (#1855) 2026-05-19 00:34:48 +04:00
cert-manager-dynadot-webhook fix(ci): blueprint.yaml spec.version lockstep in auto-bump (Closes #1856) (#1858) 2026-05-19 01:04:22 +04:00
cert-manager-powerdns-webhook fix(ci): blueprint.yaml spec.version lockstep in auto-bump (Closes #1856) (#1858) 2026-05-19 01:04:22 +04:00
cilium fix(ci): sync stale blueprint.yaml versions + soften push-mode pin-sync race (Closes #1849) (#1855) 2026-05-19 00:34:48 +04:00
clickhouse docs(seaweedfs+guacamole): replace MinIO with SeaweedFS as unified S3 encapsulation; add Guacamole to bp-relay 2026-04-28 10:23:46 +02:00
cluster-autoscaler-hcloud fix(ci): blueprint.yaml spec.version lockstep in auto-bump (Closes #1856) (#1858) 2026-05-19 01:04:22 +04:00
cnpg fix(ci): blueprint.yaml spec.version lockstep in auto-bump (Closes #1856) (#1858) 2026-05-19 01:04:22 +04:00
cnpg-pair feat(cnpg-pair/acceptance): ship D31 zero-tx-loss test harness (Refs #2067) (#2075) 2026-05-20 10:41:10 +04:00
coraza fix(bp-coraza,bp-syft-grype): add common library subchart to satisfy hollow-chart gate (#220) 2026-04-30 06:15:28 +02:00
crossplane fix(blueprints): align blueprint.yaml spec.version with Chart.yaml version (#817) (#819) 2026-05-04 22:32:49 +04:00
crossplane-claims fix(ci): blueprint.yaml spec.version lockstep in auto-bump (Closes #1856) (#1858) 2026-05-19 01:04:22 +04:00
debezium docs(pass-32): registry-DNS sweep — harbor.<domain> across 9 component READMEs 2026-04-27 22:36:39 +02:00
external-dns fix(bp-external-dns): apiserver Endpoints sync timeout — Cilium kube-apiserver entity required (closes #770) (#771) 2026-05-04 19:27:17 +04:00
external-secrets fix(ci): blueprint.yaml spec.version lockstep in auto-bump (Closes #1856) (#1858) 2026-05-19 01:04:22 +04:00
external-secrets-stores fix(ci): blueprint.yaml spec.version lockstep in auto-bump (Closes #1856) (#1858) 2026-05-19 01:04:22 +04:00
failover-controller refactor(platform): remove k8gb — replaced by PowerDNS lua-records (#171) 2026-04-29 08:51:09 +02:00
falco fix(ci): blueprint.yaml spec.version lockstep in auto-bump (Closes #1856) (#1858) 2026-05-19 01:04:22 +04:00
ferretdb docs(pass-11b): retry banners on failover-controller/trivy/clickhouse/ferretdb (Edit needed Read first) 2026-04-27 21:45:56 +02:00
flink docs(seaweedfs+guacamole): replace MinIO with SeaweedFS as unified S3 encapsulation; add Guacamole to bp-relay 2026-04-28 10:23:46 +02:00
flux fix(bp-flux-stuck-hr-recovery): grant helmreleases/status patch RBAC + log stderr (Closes #1995) (#1998) 2026-05-20 01:55:14 +04:00
gateway-api fix: bp-gateway-api 5→10 CRDs + bp-gitea CNPG + bp-harbor CNPG race fix + DAG audit (#592) 2026-05-02 15:20:05 +04:00
gitea fix: omit HTTPRoute sectionName across blueprint charts — match PR #1888 pattern (Closes #1902) (#1909) 2026-05-19 07:57:12 +04:00
grafana fix: omit HTTPRoute sectionName across blueprint charts — match PR #1888 pattern (Closes #1902) (#1909) 2026-05-19 07:57:12 +04:00
guacamole deploy(bp-guacamole): bump bootstrap-kit pin -> 0.1.27 + blueprint.yaml lockstep (auto, Refs TBD-A6 + TBD-A20, retry 1) 2026-05-20 06:06:07 +00:00
harbor deploy(bp-harbor): bump bootstrap-kit pin -> 1.2.19 + blueprint.yaml lockstep (auto, Refs TBD-A6 + TBD-A20, retry 2) 2026-05-19 04:03:38 +00:00
hcloud-ccm fix(infra): hcloud-CCM + cilium DNS hardening + chart-side gitea token — qa-loop iter-12 Fix #54 (#1281) 2026-05-10 11:56:50 +04:00
hcloud-csi fix(ci): blueprint.yaml spec.version lockstep in auto-bump (Closes #1856) (#1858) 2026-05-19 01:04:22 +04:00
iceberg docs(seaweedfs+guacamole): replace MinIO with SeaweedFS as unified S3 encapsulation; add Guacamole to bp-relay 2026-04-28 10:23:46 +02:00
k8s-ws-proxy deploy(bp-k8s-ws-proxy): bump bootstrap-kit pin -> 0.1.12 + blueprint.yaml lockstep (auto, Refs TBD-A6 + TBD-A20, retry 1) 2026-05-20 06:06:34 +00:00
keda docs(pass-10): banners on 7 more components + opentofu active-active drift fix 2026-04-27 21:43:45 +02:00
keycloak fix: omit HTTPRoute sectionName across blueprint charts — match PR #1888 pattern (Closes #1902) (#1909) 2026-05-19 07:57:12 +04:00
knative feat(charts): bp-stunner + bp-knative + bp-kserve wrapper charts (closes #263 #264 #265) (#290) 2026-04-30 19:37:38 +04:00
kserve feat(charts): bp-stunner + bp-knative + bp-kserve wrapper charts (closes #263 #264 #265) (#290) 2026-04-30 19:37:38 +04:00
kyverno feat(security/kyverno): split policies into bp-kyverno-policies@1.0.0 Blueprint (Refs #2019) (#2022) 2026-05-20 04:42:29 +04:00
kyverno-policies fix(security/kyverno-policies): annotate chart catalyst.openova.io/no-upstream=true (#2023) 2026-05-20 04:48:10 +04:00
langfuse fix(bp-langfuse): drop apostrophe from description to clear GHCR 500 (resolves #215) (#278) 2026-04-30 17:31:51 +04:00
librechat feat(charts): bp-librechat wrapper chart (closes #275) (#287) 2026-04-30 18:56:59 +04:00
litmus feat(platform): security umbrellas (falco/kyverno/trivy/sigstore/syft-grype/reloader/coraza/litmus) (#216) 2026-04-30 06:07:38 +02:00
livekit feat(charts): bp-openmeter (CH-less) + bp-livekit + bp-matrix wrapper charts (closes #272 #273 #274) (#289) 2026-04-30 19:37:28 +04:00
llm-gateway feat(charts): bp-temporal + bp-llm-gateway + bp-anthropic-adapter wrapper charts (closes #267 #268 #271) (#288) 2026-04-30 19:37:19 +04:00
loki feat(platform): observability stack umbrellas (grafana/loki/mimir/tempo/alloy/otel/langfuse/velero) (#214) 2026-04-29 22:11:04 +02:00
matrix deploy(bp-matrix): lockstep blueprint.yaml spec.version -> 1.0.1 (auto, Refs TBD-A20, retry 1) 2026-05-19 04:03:36 +00:00
milvus docs(seaweedfs+guacamole): replace MinIO with SeaweedFS as unified S3 encapsulation; add Guacamole to bp-relay 2026-04-28 10:23:46 +02:00
mimir fix(ci): blueprint.yaml spec.version lockstep in auto-bump (Closes #1856) (#1858) 2026-05-19 01:04:22 +04:00
nats-jetstream feat(bp-nats-jetstream): land Stream + KV CR templates (slice H4, #1095) (#1114) 2026-05-08 22:32:54 +04:00
nemo-guardrails feat(charts): bp-vllm + bp-bge + bp-nemo-guardrails wrapper charts (#283) 2026-04-30 18:37:07 +04:00
neo4j docs(pass-12): role-in-Catalyst banners on 11 AI/ML Application Blueprints 2026-04-27 21:47:45 +02:00
netbird fix(ci): blueprint.yaml spec.version lockstep in auto-bump (Closes #1856) (#1858) 2026-05-19 01:04:22 +04:00
network-policies feat(bp-network-policies): land default-deny CCNP + system-namespace + DNS allow templates (slice H8, #1095) (#1116) 2026-05-08 22:40:30 +04:00
newapi deploy(bp-newapi): bump bootstrap-kit pin -> 1.4.35 + blueprint.yaml lockstep (auto, Refs TBD-A6 + TBD-A20, retry 2) 2026-05-20 06:06:13 +00:00
openbao fix: omit HTTPRoute sectionName across blueprint charts — match PR #1888 pattern (Closes #1902) (#1909) 2026-05-19 07:57:12 +04:00
openclaw fix(ci): blueprint.yaml spec.version lockstep in auto-bump (Closes #1856) (#1858) 2026-05-19 01:04:22 +04:00
openmeter deploy(bp-openmeter): lockstep blueprint.yaml spec.version -> 1.0.1 (auto, Refs TBD-A20, retry 1) 2026-05-19 04:03:39 +00:00
openova-flow-emitter/chart chore(deploy): bump openova-flow-adapter-flux image to 257291e [skip ci] 2026-05-20 06:07:08 +00:00
openova-flow-server/chart chore(deploy): bump openova-flow-server image to 257291e [skip ci] 2026-05-20 06:05:56 +00:00
opensearch docs(seaweedfs+guacamole): replace MinIO with SeaweedFS as unified S3 encapsulation; add Guacamole to bp-relay 2026-04-28 10:23:46 +02:00
opentelemetry feat(platform): observability stack umbrellas (grafana/loki/mimir/tempo/alloy/otel/langfuse/velero) (#214) 2026-04-29 22:11:04 +02:00
opentelemetry-operator feat(bp-opentelemetry-operator): scaffold operator + default Instrumentation CR (slice H5, #1095) (#1121) 2026-05-08 23:06:29 +04:00
opentofu refactor(platform): remove k8gb — replaced by PowerDNS lua-records (#171) 2026-04-29 08:51:09 +02:00
powerdns fix: omit HTTPRoute sectionName across blueprint charts — match PR #1888 pattern (Closes #1902) (#1909) 2026-05-19 07:57:12 +04:00
qa-app fix(bp-qa-app): annotate no-upstream to satisfy hollow-chart guard (#1261) 2026-05-10 04:51:13 +04:00
reflector/chart fix: bp-reflector + rename ghcr-pull-secret->ghcr-pull (Closes #543) (#554) 2026-05-02 12:17:51 +04:00
reloader fix(catalyst-api,bp-reloader): tofu state on PVC + Reloader annotations strategy (closes #715) (#716) 2026-05-04 02:04:26 +04:00
sandbox/chart deploy: bump sandbox-controller image to 257291e 2026-05-20 06:07:53 +00:00
sealed-secrets fix(blueprints): align blueprint.yaml spec.version with Chart.yaml version (#817) (#819) 2026-05-04 22:32:49 +04:00
seaweedfs fix(ci): blueprint.yaml spec.version lockstep in auto-bump (Closes #1856) (#1858) 2026-05-19 01:04:22 +04:00
self-sovereign-cutover feat(self-sovereign-cutover): step 11 — pivot Crossplane Provider CRs to Sovereign Harbor xpkg mirror (#2045) 2026-05-20 06:51:12 +04:00
sigstore feat(platform): security umbrellas (falco/kyverno/trivy/sigstore/syft-grype/reloader/coraza/litmus) (#216) 2026-04-30 06:07:38 +02:00
spire fix(blueprints): align blueprint.yaml spec.version with Chart.yaml version (#817) (#819) 2026-05-04 22:32:49 +04:00
stalwart docs(seaweedfs+guacamole): replace MinIO with SeaweedFS as unified S3 encapsulation; add Guacamole to bp-relay 2026-04-28 10:23:46 +02:00
stalwart-sovereign feat(bp-stalwart-sovereign): per-Sovereign Stalwart for Console mail (#924) (#931) 2026-05-05 14:20:16 +04:00
stalwart-tenant fix: omit HTTPRoute sectionName across blueprint charts — match PR #1888 pattern (Closes #1902) (#1909) 2026-05-19 07:57:12 +04:00
strimzi docs(pass-35): completion sweep for surviving DNS placeholders (8 components) 2026-04-27 22:46:16 +02:00
stunner feat(charts): bp-stunner + bp-knative + bp-kserve wrapper charts (closes #263 #264 #265) (#290) 2026-04-30 19:37:38 +04:00
syft-grype fix(bp-coraza,bp-syft-grype): add common library subchart to satisfy hollow-chart gate (#220) 2026-04-30 06:15:28 +02:00
tempo feat(platform): observability stack umbrellas (grafana/loki/mimir/tempo/alloy/otel/langfuse/velero) (#214) 2026-04-29 22:11:04 +02:00
temporal feat(charts): bp-temporal + bp-llm-gateway + bp-anthropic-adapter wrapper charts (closes #267 #268 #271) (#288) 2026-04-30 19:37:19 +04:00
trivy fix(ci): blueprint.yaml spec.version lockstep in auto-bump (Closes #1856) (#1858) 2026-05-19 01:04:22 +04:00
valkey fix(bp-valkey): default auth.enabled=false to match bp-newapi passwordless REDIS_CONN_STRING (Closes #2003) (#2007) 2026-05-20 02:26:56 +04:00
velero fix(ci): blueprint.yaml spec.version lockstep in auto-bump (Closes #1856) (#1858) 2026-05-19 01:04:22 +04:00
vllm feat(charts): bp-vllm + bp-bge + bp-nemo-guardrails wrapper charts (#283) 2026-04-30 18:37:07 +04:00
vpa fix(ci): blueprint.yaml spec.version lockstep in auto-bump (Closes #1856) (#1858) 2026-05-19 01:04:22 +04:00
wordpress-tenant fix(bp-cnpg-pair): switch to synchronous replication (remote_apply) for Pillar 3 zero-tx-loss (Refs #2064) (#2071) 2026-05-20 10:10:49 +04:00