Multi-Region DNS — health-checked failover with PowerDNS lua-records
Status: Authoritative. Updated: 2026-04-29 (Reconcile Pass 1).
This document is the canonical reference for how Catalyst routes traffic across regions. Geographic redundancy in OpenOva is realized at the authoritative DNS layer, not at the K8s controller layer. PowerDNS lua-records (ifurlup, ifportup, pickclosest, pickrandom, pickwhashed) provide everything Catalyst needs:
- Geo-aware response selection: answer with the closest healthy backend for the resolver's source IP / ECS subnet.
- Health-checked failover: drop a backend from the response set when a TCP/HTTP probe fails, restore it when the probe recovers.
- Latency-aware routing: combine ifurlup (health) with pickclosest (geo) for active-active steering.
- Same operational layer Catalyst already runs: PowerDNS is bp-powerdns, deployed by the bootstrap kit on every Sovereign's mgt cluster. No separate operator, no extra CRDs, no extra reconciliation loop.
This subsumes the role previously assigned to k8gb. The k8gb component has been removed from componentGroups.ts, the umbrella chart, and the wizard; lua-records cover every failover scenario k8gb covered without the dedicated GSLB controller.
1. Why PowerDNS lua-records (and why not k8gb)
| Concern | k8gb (removed) | PowerDNS lua-records (current) |
|---|---|---|
| Authoritative DNS | CoreDNS plugin, separate zone | PowerDNS authoritative — same zones used for external-dns, ACME, etc. |
| Operator footprint | k8gb controller + CRDs (Gslb, GslbHttpRoute) + per-cluster CoreDNS pod set | None (declarative LUA records in the existing PowerDNS zone) |
| Health-check primitive | k8gb-managed liveness probes | PowerDNS ifurlup / ifportup (HTTP / TCP probes from PowerDNS pods) |
| Geo selection | EdgeDNS witness + custom logic | pickclosest (geo by source IP), pickrandom (RR), pickwhashed (sticky weighted) |
| DNSSEC | Layered on top, separate signer | Native — PowerDNS signs the lua-record's computed answer with the zone's KSK/ZSK |
| Operational surface | k8gb pods + CoreDNS pods + custom CRDs | Existing PowerDNS deployment + dnsdist rate-limit shield |
| Cluster-coordination | Required (gslb endpoints sync between clusters) | Not required — authoritative DNS is the source of truth |
The architectural cost difference is large enough that the deletion is the right move per PRINCIPLES.md #2 ("never compromise from quality — pick the unified primitive, not the dual-shape design") and #4 ("never hardcode — health probes, weights, geo policy are configuration in the lua-record body, not code in a controller").
2. Failover patterns (the lua-record cookbook)
Every Catalyst Sovereign zone is hosted on PowerDNS. The records below sit alongside ordinary A/AAAA/CNAME records that external-dns writes via the PowerDNS REST API. Lua-record syntax follows the upstream PowerDNS documentation.
Note on examples. Backend IPv4 addresses (5.161.42.18, 95.217.189.42) and the FQDN primary.example.com below are placeholders; they illustrate the lua-record shape only. The canonical 6-record set per Sovereign zone is written by pool-domain-manager (PDM, core/pool-domain-manager/) on /v1/commit; lua-records (geo / health-check policy) are written by the catalyst-dns controller (Catalyst control-plane sidecar) from each Application's Placement spec. See docs/PLATFORM-POWERDNS.md §"In-cluster consumers".
2.1 Active-active across two regions, health-checked
foo.acme.com. IN LUA A "ifurlup('https://primary.example.com/healthz', {'5.161.42.18', '95.217.189.42'}, {selector='all'})"
- PowerDNS HTTP-probes https://primary.example.com/healthz from each PowerDNS pod every 5s (default; configurable via interval option).
- selector='all' returns every healthy backend; the resolver's stub then picks one (typical client behaviour: rotate, retry on failure).
- When the probe to a backend fails three times in a row (default failOnIncerror=true, 3 fails to drop), that backend is removed from the answer set within the next TTL window.
- When the probe recovers, the backend is restored automatically.
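The drop-and-restore behaviour in the bullets above can be modelled in a few lines. This is only an illustrative sketch, not PowerDNS code: the class `BackendHealth`, the function `answer_set`, and the `fail_threshold` parameter are hypothetical names, and the threshold of 3 is taken from the description above. The case where every backend is down is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class BackendHealth:
    """Tracks consecutive probe failures for one backend IP (illustrative model)."""
    ip: str
    fail_threshold: int = 3  # assumption: mirrors the "three in a row" rule above
    consecutive_failures: int = 0

    def record(self, probe_ok: bool) -> None:
        # A single success resets the counter; a failure increments it.
        self.consecutive_failures = 0 if probe_ok else self.consecutive_failures + 1

    @property
    def healthy(self) -> bool:
        return self.consecutive_failures < self.fail_threshold


def answer_set(backends):
    """Model of selector='all': every backend currently considered healthy."""
    return [b.ip for b in backends if b.healthy]


primary = BackendHealth("5.161.42.18")
dr = BackendHealth("95.217.189.42")
for _ in range(3):
    primary.record(False)   # three failed probes in a row
    dr.record(True)
print(answer_set([primary, dr]))   # primary has been dropped
primary.record(True)               # probe recovers
print(answer_set([primary, dr]))   # primary is back in the answer set
```

The model restores a backend on the first successful probe, which matches the "restored automatically" bullet; any hysteresis on recovery would be an additional assumption.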
2.2 Geo-aware active-active (pickclosest)
api.acme.com. IN LUA A "pickclosest({'5.161.42.18', '95.217.189.42'})"
- PowerDNS uses ECS (EDNS Client Subnet) when present, falling back to the resolver's source IP.
- The closer regional LB by GeoIP wins.
- Combine with ifurlup for health-aware closeness:
api.acme.com. IN LUA A "
ifurlup('https://primary.example.com/healthz', {
{'5.161.42.18', '95.217.189.42'}
}, {selector='pickclosest'})
"
2.3 Active-passive (primary → DR)
api.acme.com. IN LUA A "ifurlup('https://primary.example.com/healthz', {'5.161.42.18', '95.217.189.42'}, {selector='pickfirst'})"
- pickfirst returns the first healthy backend in the list.
- When 5.161.42.18 (primary) is healthy → answer is 5.161.42.18.
- When primary fails the probe → answer flips to 95.217.189.42 (DR) within one TTL window.
- When primary recovers → answer flips back to primary on the next probe success.
2.4 TCP-only / non-HTTP services (ifportup)
For services that don't expose an HTTP /healthz (e.g. SMTP, IMAP, custom TCP):
mail.acme.com. IN LUA A "ifportup(587, {'5.161.42.18', '95.217.189.42'})"
- PowerDNS attempts a TCP connect to port 587 on each backend.
- Connect-fail → drop from the response set; connect-success → include.
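The connect-or-drop rule above amounts to a plain TCP connect check. The sketch below shows the idea with the Python standard library; `port_is_up` and `ifportup_model` are hypothetical helper names, the 2-second timeout is an assumption, and the demo probes a throwaway local listener rather than the placeholder backend IPs.

```python
import socket


def port_is_up(ip: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to ip:port succeeds within the timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, ...
        return False


def ifportup_model(port: int, backends, timeout: float = 2.0):
    """Keep only the backends that accept a TCP connection on the given port."""
    return [ip for ip in backends if port_is_up(ip, port, timeout)]


# Demo against a local listener on an ephemeral port (no external traffic).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
demo_port = listener.getsockname()[1]
print(ifportup_model(demo_port, ["127.0.0.1"]))
listener.close()
```

A connect check only proves the port accepts connections; it says nothing about whether the SMTP or IMAP service behind it is answering correctly.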
2.5 Weighted round-robin (pickwhashed)
For canary releases or traffic-shifting:
api.acme.com. IN LUA A "pickwhashed({{80, '5.161.42.18'}, {20, '95.217.189.42'}})"
- 80% of distinct client IPs are pinned to 5.161.42.18, 20% to 95.217.189.42 (consistent hash on source IP: the same client gets the same answer until the weight changes).
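To make the "same client, same answer" property concrete, here is a minimal model of weighted hashing on the source IP. It is a sketch under assumptions: the function name `pick_weighted_hashed` is hypothetical, and SHA-256 is used only because it is in the standard library, not because PowerDNS uses it internally.

```python
import hashlib


def pick_weighted_hashed(client_ip: str, weighted_backends):
    """Map a client IP to one backend, proportionally to the given weights.

    weighted_backends is a list of (weight, ip) pairs, e.g.
    [(80, '5.161.42.18'), (20, '95.217.189.42')].
    """
    total = sum(weight for weight, _ in weighted_backends)
    if total <= 0:
        raise ValueError("total weight must be positive")
    # Hash the client IP to a stable point in [0, total).
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    point = int.from_bytes(digest[:8], "big") % total
    cumulative = 0
    for weight, ip in weighted_backends:
        cumulative += weight
        if point < cumulative:
            return ip
    raise AssertionError("unreachable when all weights are non-negative")


canary = [(80, "5.161.42.18"), (20, "95.217.189.42")]
# The same client always lands on the same backend while the weights are unchanged.
print(pick_weighted_hashed("203.0.113.7", canary) == pick_weighted_hashed("203.0.113.7", canary))
```

Because the choice depends only on the client IP and the weight table, changing a weight re-pins some clients, which is the expected cost of shifting canary traffic.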
3. Catalyst integration points
3.1 Where lua-records are written
Lua-records are part of each Sovereign's PowerDNS zone, alongside the canonical 6-record set (PLATFORM-POWERDNS.md §"Per-Sovereign zone model"). The 6-record set is written once at provisioning by pool-domain-manager (PDM /v1/commit); ongoing A/AAAA/CNAME records are written by external-dns; LUA records are written by the catalyst-dns controller (sidecar to the Catalyst control plane on the mgt cluster):
PDM ──► PowerDNS REST API ──► canonical 6-record set (one-shot at provision)
external-dns ──► PowerDNS REST API ──► A/AAAA/CNAME records (per-region LB IPs)
catalyst-dns ──► PowerDNS REST API ──► LUA records (geo / health-check policy)
This separation matters: external-dns knows about a single K8s Service or Ingress; it has no concept of multi-region health policy. The catalyst-dns controller reads the Application's Placement field from the per-Org Gitea repo, sees placement: active-active (or active-hotstandby, etc.), and synthesizes the corresponding lua-record body.
3.2 Application Placement → lua-record selector mapping
| Application Placement | lua-record idiom |
|---|---|
| single-region | Plain A record(s); no lua-record needed |
| active-active | ifurlup(..., {selector='all'}) (or selector='pickclosest' for geo-affinity) |
| active-hotstandby | ifurlup(..., {selector='pickfirst'}) (primary first, DR second) |
| active-passive-warm | ifurlup(..., {selector='pickfirst'}) + longer TTL (manual operator promotion is the contract; the LUA only flips when the probe fails enough times) |
| weighted-canary | pickwhashed({{w1, ip1}, {w2, ip2}}); adjust weights via Catalyst console (re-emits the lua-record body with new weights) |
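The mapping in this table is mechanical enough to sketch as a function. The code below is a hypothetical illustration of how a controller might synthesize the lua-record body from a Placement value; the function name `lua_record_body` and its parameters are assumptions, not the catalyst-dns controller's actual interface.

```python
def lua_record_body(placement, backends, probe_url=None, weights=None):
    """Return the lua-record body for a Placement, or None for plain A records."""
    ips = ", ".join(f"'{ip}'" for ip in backends)
    if placement == "single-region":
        return None  # plain A record(s), no lua-record needed
    if placement == "active-active":
        return f"ifurlup('{probe_url}', {{{ips}}}, {{selector='all'}})"
    if placement in ("active-hotstandby", "active-passive-warm"):
        # Primary must be listed first; DR second.
        return f"ifurlup('{probe_url}', {{{ips}}}, {{selector='pickfirst'}})"
    if placement == "weighted-canary":
        if weights is None or len(weights) != len(backends):
            raise ValueError("weighted-canary needs one weight per backend")
        pairs = ", ".join(f"{{{w}, '{ip}'}}" for w, ip in zip(weights, backends))
        return f"pickwhashed({{{pairs}}})"
    raise ValueError(f"unknown placement: {placement}")


backends = ["5.161.42.18", "95.217.189.42"]
url = "https://primary.example.com/healthz"
print(lua_record_body("active-active", backends, probe_url=url))
print(lua_record_body("weighted-canary", backends, weights=[80, 20]))
```

The two printed bodies have the same shape as the records in sections 2.1 and 2.5. The longer TTL for active-passive-warm is a record-level setting and is not part of the body, so it is not modelled here.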
3.3 Probe target
Every Catalyst Application Blueprint MUST expose /healthz on its public endpoint. The catalyst-dns controller defaults to https://<app-fqdn>/healthz as the probe target, configurable per-Application via spec.healthCheck.path in the Blueprint instance.
DNS pods are inside the Sovereign — they probe outbound to the regional LB IPs over the public internet (or via the Cilium Cluster Mesh + WireGuard back-channel for cross-region private probes). The probe direction is intentional: DNS pods are the source of truth on whether a regional LB is reachable from the same place the public internet would reach it.
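The default-plus-override rule for the probe target can be written down as a one-function sketch. The helper name `probe_url` is hypothetical; only the default path /healthz and the spec.healthCheck.path override come from the text above.

```python
from typing import Optional


def probe_url(app_fqdn: str, health_check_path: Optional[str] = None) -> str:
    """Build the HTTPS probe target for an Application.

    health_check_path stands in for spec.healthCheck.path; when it is unset,
    the default /healthz is used.
    """
    path = health_check_path or "/healthz"
    if not path.startswith("/"):
        path = "/" + path  # tolerate a path given without a leading slash
    return f"https://{app_fqdn}{path}"


print(probe_url("api.acme.com"))
print(probe_url("api.acme.com", "/internal/ready"))
```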
3.4 Split-brain protection (failover-controller)
Lua-records are necessary but not sufficient for split-brain protection during a network partition. The failover-controller layers a lease-based witness on top:
- During healthy operation, each regional cluster renews a lease in a cloud witness (Cloudflare KV or similar — out of band from the Sovereign's own infra).
- The PowerDNS lua-record probes are the primary failover signal (sub-minute response).
- The lease becomes the tie-breaker for stateful promotion (OpenBao DR, CNPG primary promotion) — only the cluster holding a valid lease is allowed to take over write authority.
- See SRE.md §2.4 for the witness protocol; this doc covers only the DNS-routing half.
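The tie-breaker role of the lease can be illustrated with a small in-memory model. This is a sketch under strong assumptions: it models a single exclusive lease with a fixed TTL, and the names `Lease`, `Witness`, `renew`, and `may_promote` are hypothetical. The real witness protocol is the one described in SRE.md §2.4 and may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    holder: str        # cluster name, e.g. 'hz-fsn-rtz-prod'
    expires_at: float  # seconds on whatever clock the witness uses


class Witness:
    """In-memory stand-in for the out-of-band cloud witness (assumption)."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.lease: Optional[Lease] = None

    def renew(self, cluster: str, now: float) -> bool:
        """Grant or extend the lease if it is free, expired, or already ours."""
        free = self.lease is None or self.lease.expires_at <= now
        if free or self.lease.holder == cluster:
            self.lease = Lease(cluster, now + self.ttl)
            return True
        return False

    def may_promote(self, cluster: str, now: float) -> bool:
        """Only the holder of a still-valid lease may take write authority."""
        return (
            self.lease is not None
            and self.lease.holder == cluster
            and self.lease.expires_at > now
        )


witness = Witness(ttl_seconds=30)
witness.renew("hz-fsn-rtz-prod", now=0)
print(witness.may_promote("hz-hel-rtz-prod", now=10))  # False: lease is held elsewhere
print(witness.renew("hz-hel-rtz-prod", now=31))        # True: the old lease expired
print(witness.may_promote("hz-hel-rtz-prod", now=40))  # True
```

The point of the model is the ordering: DNS answers may flip as soon as probes fail, but stateful promotion waits until the lease has actually changed hands.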
4. When to add a second Sovereign region (the HA upgrade path)
A single-region Sovereign is the SME default (ARCHITECTURE.md §9.2). For corporate / regulated tier (and for any Sovereign that signs an SLA strict enough that single-region downtime would breach it), the upgrade path is:
- Sovereign provisioned in Region A (e.g. hz-fsn-rtz-prod): single LB IP, plain A records.
- Operator decides to add Region B via the Catalyst admin UI: Admin → Infrastructure → Add Region (see SOVEREIGN-PROVISIONING.md §8).
- Crossplane provisions Region B's clusters (rtz + dmz) with the same building blocks as Region A.
- Region B's PowerDNS replicas join the Sovereign's authoritative NS set via SOA NOTIFY + AXFR (PowerDNS-native zone replication; no external sync layer needed).
- catalyst-dns rewrites every Application's lua-record from single-region → active-active (or whichever Placement the Application opts into). Old plain A records are replaced with ifurlup(...) lua-records pointing at both regional LBs.
- The cloud witness (failover-controller) starts arbitrating leases across the two clusters.
The cluster name never changes during this upgrade — Region A's cluster is still hz-fsn-rtz-prod, Region B is now hz-hel-rtz-prod, and neither is "primary" or "DR". This is the explicit design from ARCHITECTURE.md §1.3 — failover is a routing event, not a renaming event.
4.1 Triggers for adding a second region
| Trigger | Recommendation |
|---|---|
| SLA target ≥ 99.95% uptime | Mandatory second region — single-region cannot meet this |
| Compliance requirement (DORA, NIS2, GDPR data residency split) | Mandatory — typically one region per data-residency boundary |
| Application's Placement set to active-active / active-hotstandby / active-passive-warm | Mandatory: these placements require ≥ 2 regions to honour |
| Latency-sensitive global traffic (regional users far from Region A) | Strongly recommended — pickclosest lua-records cut median RTT |
| Cost-sensitive single-tenant Sovereign on a low-tier SLA | Defer — pay for it when a workload demands it |
5. Operational checks
5.1 Verify a lua-record is healthy
dig +short api.acme.com @ns1.openova.io
# Expected: an A record from the healthy regional LB set.
dig +short api.acme.com @ns1.openova.io \
+subnet=80.81.82.0/24
# Expected: with an EU client subnet, pickclosest returns the EU regional LB.
5.2 Force a probe-failure simulation (chaos-engineering)
The Litmus chaos suite includes a scenario that black-holes a regional LB's probe target. After ~1 TTL window:
dig +short api.acme.com @ns1.openova.io
# Expected: the affected backend IP is absent from the response.
When the probe target is restored, the IP returns automatically — no operator action.
5.3 Read PowerDNS probe state
kubectl exec -n openova-system deploy/powerdns -- pdns_control bind-list-record api.acme.com
PowerDNS exposes the current probe status (last probe timestamp, last result, current selection set) — useful when investigating "why is the answer set what it is?" during an incident.
6. References
- PowerDNS Lua Records (upstream documentation): every selector, every option.
- PLATFORM-POWERDNS.md: the bp-powerdns deployment, DNSSEC posture, REST API contract.
- SOVEREIGN-PROVISIONING.md §7-§8: multi-region topology + add-region workflow.
- ARCHITECTURE.md §1.3 + §7: building-block naming, no "primary"/"DR" labels.
- SRE.md §2: multi-region strategy, split-brain protection, data-replication patterns.
- SECURITY.md §5: OpenBao independent-Raft-per-region (DNS failover doesn't touch secret authority).
- Issue #171: the change that retired k8gb in favour of PowerDNS lua-records.
Part of OpenOva Catalyst. Read Inviolable Principles before any changes.