feat(bootstrap-kit): wire bp-continuum (failover orchestrator) — Pillar 3 unblock (Refs #2065) (#2072)
* feat(bootstrap-kit): wire bp-continuum (failover orchestrator) — Pillar 3 unblock
Adds bootstrap-kit slot 62 (62-bp-continuum.yaml) so the Continuum DR
controller actually deploys on a fresh Sovereign. Without this slot the
chart at products/continuum/chart/ sat in-tree with no install path.
catalyst-platform's QA fixtures (slot 13 qa-continuum-status-seed-job)
reference a Continuum CR named `cont-omantel` that no controller ever
started up to reconcile, leaving Pillar-3 unverifiable end-to-end.
Pillar-3 of the canonical end-user DoD ("multi-region BCP — region kill
zero-data-loss failover") requires three pieces:
1. bp-cnpg-pair (Pillar-3 follow-up #2068): primary + replica CNPG
   with ReplicaCluster sync over Cilium ClusterMesh on the
   WG-public-IP DMZ data plane.
2. Continuum CR + the per-app HTTPRoute drain hook (follow-up #2066).
3. THIS controller: without bp-continuum deployed, every Continuum
   CR sits unhandled and the lua-record flip never fires, so a
   region-kill loses every in-flight transaction.
This PR ships piece 3 — the controller itself, gated default-OFF.
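As a rough illustration of what the controller does once enabled: the slot-62 header comment describes a per-CR lease (10s renew, 30s TTL) and a four-step switchover triggered on lease loss plus a replication health drop. The sketch below models only that decision and the step ordering. `LeaseState`, `should_switch_over`, `plan_switchover`, and the step names are hypothetical and are not taken from the controller source.

```python
from dataclasses import dataclass

# TTL from the slot-62 header comment (the lease is renewed every 10s).
LEASE_TTL_SECONDS = 30.0

# Step order as listed in the slot-62 header comment; the names are illustrative.
SWITCHOVER_STEPS = (
    "drain-httproute",
    "flip-lua-record",
    "flip-cnpg-primary",
    "audit-on-nats",
)


@dataclass
class LeaseState:
    """Hypothetical view of one Continuum CR's lease."""
    last_renewed_at: float  # seconds on any monotonic clock

    def is_lost(self, now: float) -> bool:
        # The lease counts as lost once a full TTL passes without a renewal.
        return now - self.last_renewed_at > LEASE_TTL_SECONDS


def should_switch_over(lease: LeaseState, now: float, replication_healthy: bool) -> bool:
    # Assumption: "lease loss + replication health drop" means both must hold.
    return lease.is_lost(now) and not replication_healthy


def plan_switchover(lease: LeaseState, now: float, replication_healthy: bool) -> list:
    """Return the ordered steps to run, or an empty list when no action is due."""
    if should_switch_over(lease, now, replication_healthy):
        return list(SWITCHOVER_STEPS)
    return []
```

With the gate left at its default, none of this runs, which is why the remaining two pieces can land later without blocking this PR.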
Files
- NEW clusters/_template/bootstrap-kit/62-bp-continuum.yaml: HelmRepository
  + HelmRelease pinned to bp-continuum 0.1.1, targetNamespace
  catalyst-system, dependsOn [bp-catalyst-platform, bp-nats-jetstream,
  bp-powerdns], default-OFF gate via ${CONTINUUM_ENABLED:-false}.
- UPDATE clusters/_template/bootstrap-kit/kustomization.yaml: slot 62
  appended after slot 60 (bp-vcluster-helmrepo), with a header comment
  explaining the Pillar-3 dependency analysis.
- UPDATE scripts/expected-bootstrap-deps.yaml: slot 62 declared with the
  same dep set so scripts/check-bootstrap-deps.sh stays drift-free.
- UPDATE products/continuum/chart/Chart.yaml: version 0.1.0 → 0.1.1
  (first PUBLISHED version; the previous 0.1.0 sat in-tree but
  blueprint-release.yaml never pushed it to GHCR for lack of a
  path-change trigger) + add the
  `catalyst.openova.io/smoke-render-mode: default-off` annotation
  required by blueprint-release's smoke-render gate for default-OFF charts.
Default-OFF rationale
The chart's own values.yaml ships `continuum.enabled: false` (the chart
fails fast on an empty `image.tag` when enabled=true, per Inviolable
Principle #4a, the no-`:latest` guard). We surface a CONTINUUM_ENABLED
envsubst placeholder so per-Sovereign overlays may flip the gate on
once bp-cnpg-pair + bp-powerdns + lease witness are ready. Default
`false` matches the MARKETPLACE_ENABLED / SANDBOX_ENABLED knob shape.
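The `${CONTINUUM_ENABLED:-false}` placeholder uses the shell-style default form, where the default applies when the variable is unset or empty. A minimal Python sketch of that substitution rule follows, for illustration only; it is not how Flux implements substitution, and `substitute` is a hypothetical helper.

```python
import re

# Matches ${NAME} and ${NAME:-default}; other shell expansions are out of scope.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def substitute(text: str, variables: dict) -> str:
    """Expand ${NAME} / ${NAME:-default} using shell-style default rules."""
    def expand(match):
        name, default = match.group(1), match.group(2)
        value = variables.get(name, "")
        if value == "" and default is not None:
            return default  # unset or empty falls back to the default
        return value
    return _PLACEHOLDER.sub(expand, text)


line = "enabled: ${CONTINUUM_ENABLED:-false}"
print(substitute(line, {}))                             # enabled: false
print(substitute(line, {"CONTINUUM_ENABLED": "true"}))  # enabled: true
```

A Sovereign that never sets the variable therefore renders `enabled: false`, which matches the chart's own default.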
Why dependsOn does NOT include bp-cnpg-pair
The chart ships default-OFF: with the gate off it renders no resources,
so nothing exercises bp-cnpg-pair until an operator flips
`continuum.enabled=true`. Adding bp-cnpg-pair to dependsOn today would
break the install on every Sovereign that hasn't shipped #2068 yet.
Per-Sovereign cnpg-pair provisioning is the gating dependency at
flip-time, not install-time.
Validation (Principle #15 — fresh state, NOT --dry-run=server)
- `helm package products/continuum/chart` → bp-continuum-0.1.1.tgz
- `helm template smoke products/continuum/chart` → empty (default-OFF,
  matches smoke-render-mode annotation contract).
- `helm template smoke products/continuum/chart --set continuum.enabled=true`
  → 6 resources rendered cleanly (Deployment, Service, ServiceAccount,
  RBAC, NetworkPolicy).
- `bash scripts/check-bootstrap-deps.sh` → "Drift: 0 Cycles: 0 PASSED".
- `bash scripts/check-bootstrap-kit-pin-sync.sh` → "bp-continuum:
  chart=0.1.1 pin=0.1.1 PASS".
- `kubectl kustomize clusters/_template/bootstrap-kit/` → 52 HelmReleases
  rendered (was 51 + bp-continuum); `kubectl apply --dry-run=client` on
  the rendered YAML produces no errors for bp-continuum.
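The "Drift: 0 Cycles: 0" line suggests two checks: the rendered `dependsOn` edges match `scripts/expected-bootstrap-deps.yaml`, and the dependency graph is acyclic. The real logic lives in `scripts/check-bootstrap-deps.sh` and is not shown in this commit, so the following is a hypothetical Python sketch of those two checks over plain dicts. The graph is trimmed to slot 62's edges, with the upstream slots' own deps omitted.

```python
def find_drift(actual: dict, expected: dict) -> dict:
    """Return {name: (actual_deps, expected_deps)} for every mismatching release."""
    drift = {}
    for name in sorted(set(actual) | set(expected)):
        a = sorted(actual.get(name, []))
        e = sorted(expected.get(name, []))
        if a != e:
            drift[name] = (a, e)
    return drift


def find_cycle(graph: dict):
    """Return one dependency cycle as a list of names, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}
    stack = []

    def visit(node):
        colour[node] = GREY
        stack.append(node)
        for dep in graph.get(node, []):
            state = colour.get(dep, WHITE)
            if state == GREY:
                # dep is on the current DFS path, so the path from dep closes a cycle.
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        stack.pop()
        colour[node] = BLACK
        return None

    for node in list(graph):
        if colour.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return None


deps = {
    "bp-continuum": ["bp-catalyst-platform", "bp-nats-jetstream", "bp-powerdns"],
    "bp-catalyst-platform": [],
    "bp-nats-jetstream": [],
    "bp-powerdns": [],
}
print(find_drift(deps, deps))  # {}
print(find_cycle(deps))        # None
```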
GHCR publication path
bp-continuum:0.1.0 was never published — git history shows the chart
committed in-tree but the blueprint-release workflow (which triggers on
`products/*/chart/**` diffs) had no path-change to detect since the
initial commit. Bumping Chart.yaml to 0.1.1 forces a fresh publish on
this PR's merge; the auto-bump-pin hook (TBD-A6) then converges the
slot pin via a no-op (already matches at 0.1.1).
Verified that bp-continuum:0.1.1 will publish via blueprint-release.yaml's
detect step (`git diff HEAD~1 HEAD | grep -E
'^(platform|products)/[^/]+/(chart/|blueprint.yaml)'`), which catches
products/continuum/chart/Chart.yaml in this commit's diff.
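To see which of this PR's files the detect pattern matches, the regex can be tried against the paths from the Files list. This is a small Python sketch, assuming the detect step feeds the pattern one changed path per line; `triggers_publish` is a hypothetical helper, not part of the workflow.

```python
import re

# Pattern copied from the commit message. The unescaped "." in
# "blueprint.yaml" matches any character, exactly as it would under grep -E.
DETECT = re.compile(r"^(platform|products)/[^/]+/(chart/|blueprint.yaml)")


def triggers_publish(changed_paths):
    """Return the subset of changed paths that the detect pattern matches."""
    return [p for p in changed_paths if DETECT.search(p)]


changed = [
    "clusters/_template/bootstrap-kit/62-bp-continuum.yaml",
    "clusters/_template/bootstrap-kit/kustomization.yaml",
    "scripts/expected-bootstrap-deps.yaml",
    "products/continuum/chart/Chart.yaml",
]
print(triggers_publish(changed))  # ['products/continuum/chart/Chart.yaml']
```

Only the Chart.yaml change matches, which is consistent with the note that a slot file alone would not have triggered a publish.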
Refs #2065
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(continuum): bump blueprint.yaml spec.version 0.1.0 → 0.1.1 (lockstep)
TestBootstrapKit_BlueprintVersionLockstepSweep enforces
Chart.yaml.version == blueprint.yaml.spec.version for every
bootstrap-kit blueprint. The previous commit bumped Chart.yaml but missed
the blueprint manifest; this commit restores the lockstep.
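The lockstep rule is simple to state: for each blueprint, `Chart.yaml`'s `version` must equal `blueprint.yaml`'s `spec.version`. The actual test is not shown in this commit, so the sketch below only restates the rule in Python over already-parsed documents. `lockstep_violations` and the input shape are hypothetical.

```python
def lockstep_violations(blueprints: dict) -> list:
    """blueprints maps a blueprint name to (chart_doc, blueprint_doc) dicts.

    Returns a sorted list of (name, chart_version, blueprint_version) for
    every blueprint whose two versions disagree.
    """
    bad = []
    for name, (chart, blueprint) in sorted(blueprints.items()):
        chart_version = chart.get("version")
        blueprint_version = blueprint.get("spec", {}).get("version")
        if chart_version != blueprint_version:
            bad.append((name, chart_version, blueprint_version))
    return bad


# State after the first commit: Chart.yaml bumped, blueprint.yaml not yet.
before = {"bp-continuum": ({"version": "0.1.1"}, {"spec": {"version": "0.1.0"}})}
# State after this fix commit.
after = {"bp-continuum": ({"version": "0.1.1"}, {"spec": {"version": "0.1.1"}})}

print(lockstep_violations(before))  # [('bp-continuum', '0.1.1', '0.1.0')]
print(lockstep_violations(after))   # []
```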
Same Refs #2065 thread.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: hatiyildiz <hati.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent 7b31736482
commit 53f510b983
154
clusters/_template/bootstrap-kit/62-bp-continuum.yaml
Normal file
@@ -0,0 +1,154 @@
+# bp-continuum — Catalyst bootstrap-kit Blueprint slot 62
+# (Customer-facing capability / DR orchestration).
+#
+# OpenOva Continuum — Disaster-Recovery orchestrator for active-hot-
+# standby Applications (EPIC-6, slice K-Cont-1 #1101 onward). Reconciles
+# Continuum.dr.openova.io/v1 CRs; per-Continuum-CR goroutine maintains a
+# lease (10s renew, 30s TTL), watches CNPG replication metrics, and
+# executes the switchover sequence on lease loss + replication health
+# drop (drain HTTPRoute → flip lua-record on pool-domain-manager →
+# flip CNPG primary via bp-cnpg-pair → audit on NATS).
+#
+# ─── Pillar-3 unblock (#2065, TBD-V14) ─────────────────────────────────
+# Pillar-3 of the canonical end-user DoD ("multi-region BCP — region kill
+# zero-data-loss failover") requires THREE pieces:
+#   1. bp-cnpg-pair (C-DB-1) — primary + replica CNPG with ReplicaCluster
+#      sync over Cilium ClusterMesh on the WG-public-IP DMZ data plane.
+#   2. Continuum CR + the per-app HTTPRoute drain hook.
+#   3. THIS controller — without bp-continuum deployed, every Continuum
+#      CR sits unhandled and the lua-record flip never fires, so a
+#      region-kill produces TXN-loss on every transaction in-flight.
+#
+# Before this slot, the chart existed at products/continuum/chart/ and
+# the controller image was built by .github/workflows/build-continuum-
+# controller.yaml + SHA-pinned in values.yaml — but no bootstrap-kit
+# slot deployed it on a fresh Sovereign. catalyst-platform's QA fixtures
+# (slot 13, `qa-continuum-status-seed-job`) reference a Continuum CR
+# named `cont-omantel` that no controller is ever spinning up to
+# reconcile. This slot closes the loop.
+#
+# ─── Default-OFF gate ──────────────────────────────────────────────────
+# The chart's own values.yaml ships `continuum.enabled: false` (chart
+# fail-fasts on empty `image.tag` when enabled=true — Inviolable
+# Principle #4a no-`:latest` guard). We surface a CONTINUUM_ENABLED
+# envsubst placeholder so per-Sovereign overlays may flip the gate on
+# once bp-cnpg-pair + bp-powerdns + lease witness are ready. Default
+# `false` so a zero-touch provision lands a non-Continuum Sovereign
+# (matches the MARKETPLACE_ENABLED / SANDBOX_ENABLED knob shape).
+#
+# ─── Placement ─────────────────────────────────────────────────────────
+# Continuum is itself a single-region controller — it lives on the
+# MANAGEMENT cluster (per docs/EPICS-1-6-unified-design.md §9 + the
+# chart's blueprint.yaml placementSchema: modes=[single-region]) and
+# observes data-plane regions over Cilium ClusterMesh + the witness.
+# The Application CRs it reconciles are active-hot-standby; the
+# controller itself is single-region.
+#
+# ─── dependsOn ─────────────────────────────────────────────────────────
+#   - bp-catalyst-platform (slot 13) — owns the
+#     `dr.openova.io/v1.Continuum` CRD that the controller watches.
+#     Without this edge, Helm render-time Capabilities gate fails the
+#     install (no matches for kind "Continuum"). NB: CRD lives at
+#     products/catalyst/chart/crds/continuum.yaml.
+#   - bp-nats-jetstream (slot 7) — catalyst.audit publish target the
+#     controller emits switchover audit events to.
+#   - bp-powerdns (slot 11) — the pool-domain-manager Service that
+#     fronts PowerDNS is what the controller POSTs lua-record commits
+#     to during the flip step of the switchover sequence.
+#
+# bp-cnpg-pair is intentionally NOT in dependsOn because the chart ships
+# default-OFF — the controller installs and waits idle until a per-
+# Sovereign overlay flips `continuum.enabled=true`. Operators must
+# install bp-cnpg-pair (Pillar 3 audit follow-up #2068) AND configure
+# the lease witness BEFORE flipping the gate.
+#
+# Wrapper chart: products/continuum/chart/
+# Catalyst-curated values: products/continuum/chart/values.yaml
+# Reconciled by: Flux on the new Sovereign's k3s control plane.
+
+---
+apiVersion: source.toolkit.fluxcd.io/v1beta2
+kind: HelmRepository
+metadata:
+  name: bp-continuum
+  namespace: flux-system
+spec:
+  type: oci
+  interval: 15m
+  url: oci://ghcr.io/openova-io
+  secretRef:
+    name: ghcr-pull
+---
+apiVersion: helm.toolkit.fluxcd.io/v2
+kind: HelmRelease
+metadata:
+  name: bp-continuum
+  namespace: flux-system
+  labels:
+    catalyst.openova.io/slot: "62"
+    catalyst.openova.io/component: continuum-controller
+    openova.io/category: customer-facing-capability
+    openova.io/epic: "6"
+spec:
+  interval: 15m
+  releaseName: continuum
+  # targetNamespace = catalyst-system to colocate with the other
+  # catalyst-platform controllers (per slot 13 convention). The chart
+  # uses .Release.Namespace for every templated resource.
+  targetNamespace: catalyst-system
+  dependsOn:
+    - name: bp-catalyst-platform
+    - name: bp-nats-jetstream
+    - name: bp-powerdns
+  chart:
+    spec:
+      chart: bp-continuum
+      # 0.1.1 — first published version. 0.1.0 was never pushed to GHCR
+      # despite Chart.yaml claiming so; the chart sat in-tree without a
+      # bootstrap-kit slot to pin it, so blueprint-release.yaml never
+      # bumped past the initial commit's no-op detect step. Bumping to
+      # 0.1.1 in the same PR as this slot forces a fresh publish and
+      # the auto-bump-pin hook (TBD-A6) lands the matching pin write.
+      version: 0.1.1
+      sourceRef:
+        kind: HelmRepository
+        name: bp-continuum
+        namespace: flux-system
+  install:
+    timeout: 10m
+    disableWait: true
+    remediation:
+      retries: 3
+  upgrade:
+    timeout: 10m
+    disableWait: true
+    remediation:
+      retries: 3
+  # Per-Sovereign overlay surface.
+  #
+  # enabled — default-OFF via ${CONTINUUM_ENABLED:-false} on the
+  #   bootstrap-kit Kustomization substitute. Flip true on a per-
+  #   Sovereign overlay's substitute map ONCE the operator has:
+  #     - bp-cnpg-pair installed (Pillar-3 follow-up #2068 — primary +
+  #       replica CNPG cluster with ReplicaCluster sync over ClusterMesh)
+  #     - bp-powerdns + pool-domain-manager reachable (lua-record commits)
+  #     - lease witness configured (Cloudflare KV per K-Cont-3, or DNS
+  #       quorum fallback)
+  #   The chart's own `continuum.enabled: false` default is the
+  #   defence-in-depth backstop — a stale per-Sovereign overlay that
+  #   hand-installs the HR without our envsubst layer still default-OFFs
+  #   gracefully.
+  #
+  # Image tag — NOT overridden here. The chart's values.yaml carries
+  #   the canonical SHA-pinned `continuum.image.tag` (auto-bumped on every
+  #   push to main by .github/workflows/build-continuum-controller.yaml).
+  #   Day-2 SHA pivots remain available via per-Sovereign overlay patches
+  #   at spec.values.continuum.image.tag.
+  #
+  # pdmURL / natsURL — empty defaults route through the in-cluster
+  #   Service DNS (pool-domain-manager.catalyst-system.svc.cluster.local
+  #   + nats.openova-system.svc.cluster.local respectively). Per-
+  #   Sovereign overlays may repoint at Sovereign-local instances.
+  values:
+    continuum:
+      enabled: ${CONTINUUM_ENABLED:-false}
@@ -157,6 +157,16 @@ resources:
   # slot-19a comment block + 19a-bp-sandbox.yaml header for full
   # diagnostic chain. No functional difference for operators — the
   # SANDBOX_ENABLED knob still gates rendering identically.
+  # bp-continuum (slot 62) — Pillar-3 unblock (#2065, TBD-V14). DR
+  # orchestrator for active-hot-standby Applications. Reconciles
+  # Continuum.dr.openova.io/v1 CRs; executes switchover sequence
+  # (drain HTTPRoute → flip lua-record → flip CNPG primary → audit on
+  # NATS). Default-OFF via ${CONTINUUM_ENABLED:-false}; operators flip
+  # on once bp-cnpg-pair + lease witness are configured. See slot-62
+  # header comment for full Pillar-3 dependency analysis. Sequenced past
+  # the vCluster cohort (slots 54/58/59/60) so its `bp-catalyst-platform`
+  # dep + Continuum CRD ordering converge before the controller starts.
+  - 62-bp-continuum.yaml
   # bp-newapi (slot 80) — multi-tenant LLM marketplace gateway. Sequenced
   # after the W2.K1 dependency wave (cnpg/keycloak/openbao Ready) so
   # NewAPI's ExternalSecret + DSN dependencies resolve on first reconcile.
@@ -7,7 +7,15 @@ description: |
   switchover sequence). Slice K-Cont-1 of EPIC-6 (#1101) ships the
   product skeleton; K-Cont-2 fills the reconcile loop.
 type: application
-version: 0.1.0
+# 0.1.1 (Pillar-3 unblock #2065, 2026-05-20): first PUBLISHED version.
+# 0.1.0 sat in-tree without a bootstrap-kit slot to pin it, so the
+# blueprint-release workflow's `detect changed paths` step never had
+# reason to re-run and the chart was never pushed to GHCR. Bumping the
+# pin in lockstep with the new slot file (clusters/_template/bootstrap-
+# kit/62-bp-continuum.yaml) makes blueprint-release publish the chart
+# on this PR's merge; the auto-bump-pin hook (TBD-A6) then converges
+# the slot pin via a no-op (already matches).
+version: 0.1.1
 appVersion: "0.1.0"
 home: https://openova.io
 sources:
@@ -28,3 +36,16 @@ annotations:
   openova.io/category: customer-facing-capability
   openova.io/epic: "6"
   openova.io/depends-on: bp-cnpg-pair,bp-powerdns,pdm
+  # smoke-render-mode: default-off — bp-continuum's chart ships
+  # `continuum.enabled: false` as its default; helm template with
+  # default values legitimately renders zero resources (per chart
+  # README "the gate keeps the controller stopped until the operator
+  # installs bp-cnpg-pair + bp-powerdns and configures the witness").
+  # Without this annotation the blueprint-release.yaml smoke gate
+  # (`<5 lines = empty render`) fails publish. The enabled-render path
+  # is exercised at install time by the bootstrap-kit slot's per-
+  # Sovereign CONTINUUM_ENABLED flip and by the chart's own
+  # templates/* unit tests (default-off backstop covered by
+  # blueprint-release's auto-template step at lines 326-358 of the
+  # workflow).
+  catalyst.openova.io/smoke-render-mode: default-off
@@ -6,7 +6,7 @@ metadata:
     catalyst.openova.io/section: pts-9-disaster-recovery
     openova.io/category: customer-facing-capability
 spec:
-  version: 0.1.0
+  version: 0.1.1
   card:
     title: Continuum
     summary: |
@@ -510,6 +510,31 @@ slots:
       - bp-harbor
     wave: present
 
+  # ---- Slot 62 — bp-continuum DR orchestrator (Pillar-3 unblock).
+  # Issue #2065 (TBD-V14, 2026-05-20). Reconciles Continuum.dr.openova.io/v1
+  # CRs and executes the switchover sequence on lease loss + replication
+  # health drop (drain HTTPRoute → flip lua-record → flip CNPG primary →
+  # audit on NATS). Default-OFF gate via ${CONTINUUM_ENABLED:-false} on
+  # the bootstrap-kit Kustomization substitute; operators flip on once
+  # bp-cnpg-pair + lease witness are configured.
+  #
+  # dependsOn:
+  #   - bp-catalyst-platform (slot 13) — owns the `dr.openova.io/v1.Continuum`
+  #     CRD that the controller watches. Without this edge, Helm render-time
+  #     Capabilities gate fails the install (no matches for kind "Continuum").
+  #   - bp-nats-jetstream (slot 7) — catalyst.audit publish target the
+  #     controller emits switchover audit events to.
+  #   - bp-powerdns (slot 11) — pool-domain-manager fronts PowerDNS; the
+  #     controller POSTs lua-record commits during the flip step.
+  #
+  # bp-cnpg-pair (Pillar-3 follow-up #2068) is intentionally NOT in
+  # dependsOn — chart ships default-OFF so the controller installs and
+  # waits idle until operators flip the gate after wiring bp-cnpg-pair.
+  - slot: 62
+    name: bp-continuum
+    depends_on: [bp-catalyst-platform, bp-nats-jetstream, bp-powerdns]
+    wave: present
+
   # ---- Slot 80 — bp-newapi multi-tenant LLM marketplace gateway. Issue #799.
   # Sequenced past the W2.K4 numbering plan (slots 36-48) so it never
   # collides with the AI-runtime / observability / livekit cohort. The