Merge pull request #2061 from openova-io/docs-sweep-spire-deferred-followup

docs(sweep): align 6 docs with PR #665 SPIRE deferral + PR #2056 (Refs #2055)
e3mrah 2026-05-20 09:23:19 +04:00 committed by GitHub
commit 02472e58cc
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
6 changed files with 25 additions and 21 deletions


@@ -551,8 +551,8 @@ Every bootstrap-kit Blueprint at v1.1.1+ ships every observability surface defau
| bp-flux | `flux2.prometheus.podMonitor.create` | `false` | monitoring.coreos.com/v1 PodMonitor |
| bp-crossplane | `crossplane.metrics.enabled` | `false` | Upstream emits prometheus.io/scrape annotation only — kept off for uniformity |
| bp-sealed-secrets | `sealed-secrets.metrics.serviceMonitor.enabled` | `false` | monitoring.coreos.com/v1 ServiceMonitor |
-| bp-spire | `spire.global.spire.recommendations.enabled` | `false` | Cascades prometheus exporters into spire-server / spire-agent |
-| bp-spire | `spire.global.spire.recommendations.prometheus` | `false` | Belt-and-braces inside the recommendations bundle |
+| bp-spire (opt-in, deferred — see [#665](https://github.com/openova-io/openova/pull/665)) | `spire.global.spire.recommendations.enabled` | `false` | Cascades prometheus exporters into spire-server / spire-agent. bp-spire was removed from the canonical bootstrap-kit by PR #665; chart retained as opt-in for cross-Sovereign federation + per-Pod-fingerprint authz. Re-introduction roadmap: TBD-V29 ([#2055](https://github.com/openova-io/openova/issues/2055)). |
+| bp-spire (opt-in) | `spire.global.spire.recommendations.prometheus` | `false` | Belt-and-braces inside the recommendations bundle |
| bp-nats-jetstream | `nats.promExporter.enabled` | `false` | Sidecar exporter container |
| bp-nats-jetstream | `nats.promExporter.podMonitor.enabled` | `false` | monitoring.coreos.com/v1 PodMonitor |
| bp-openbao | `openbao.injector.metrics.enabled` | `false` | injector metrics endpoint |
@@ -618,7 +618,7 @@ The contribution path applies equally to Crossplane Compositions, Helm charts, a
| All container images cosigned | Supply-chain security; Kyverno admission policy denies unsigned. |
| All artifacts SBOMed | Compliance (EU CRA, NIS2). |
| No plaintext secrets in chart values; use ExternalSecret references | See [`SECURITY.md`](SECURITY.md). |
-| Workload identity via SPIFFE; no static service-account tokens | See [`SECURITY.md`](SECURITY.md) §2. |
+| Workload identity via K8s ServiceAccount TokenReview (projected bound-tokens, audience-scoped, kubelet-rotated hourly) on top of Cilium WireGuard transport encryption; no unbound long-lived service-account secret tokens. SPIFFE/SPIRE was dropped from the bootstrap-kit by founder PR [#665](https://github.com/openova-io/openova/pull/665) (chart retained as opt-in; re-introduction roadmap in TBD-V29 [#2055](https://github.com/openova-io/openova/issues/2055)) | See [`SECURITY.md`](SECURITY.md) §2. |
| Health endpoints standardized: `/healthz` (liveness) + `/readyz` (readiness) | Catalyst observability assumes them. |
| Metrics on `/metrics` (Prometheus exposition) | Catalyst Grafana stack scrapes them. |
| Logs to stdout, structured JSON | Loki ingests them. |

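The workload-identity requirement above relies on projected ServiceAccount tokens that are audience-scoped and short-lived. A minimal bash sketch for inspecting those two properties on a token. It assumes GNU coreutils `base64` and compact (whitespace-free) JSON claims. The function names and the `openbao` audience used in the check are hypothetical. The sketch only decodes claims and does not verify the signature, which is the job of the TokenReview call.

```bash
#!/usr/bin/env bash
# Sketch only: decode the claims of a projected ServiceAccount token (a JWT).
# Assumes GNU coreutils base64 and compact JSON; does NOT verify the signature.

# b64url_decode SEGMENT: base64url -> raw bytes, restoring the stripped '=' padding.
b64url_decode() {
  local s
  s=$(printf '%s' "$1" | tr '_-' '/+')
  case $(( ${#s} % 4 )) in
    2) s="${s}==" ;;
    3) s="${s}=" ;;
  esac
  printf '%s' "$s" | base64 -d
}

# token_summary JWT: print the audience list and the lifetime (exp - iat) in seconds.
token_summary() {
  local payload aud exp iat
  payload=$(b64url_decode "$(printf '%s' "$1" | cut -d. -f2)")
  aud=$(printf '%s' "$payload" | sed -n 's/.*"aud":\(\[[^]]*]\).*/\1/p')
  exp=$(printf '%s' "$payload" | sed -n 's/.*"exp":\([0-9]*\).*/\1/p')
  iat=$(printf '%s' "$payload" | sed -n 's/.*"iat":\([0-9]*\).*/\1/p')
  printf 'aud=%s ttl=%ss\n' "$aud" "$(( exp - iat ))"
}
```

Inside a Pod, `token_summary "$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)"` inspects the mounted token. From a workstation, `kubectl create token <sa> --audience <aud>` mints a bound token that you can inspect the same way.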

@@ -3,6 +3,8 @@
**Status:** Design (no implementation in this PR). **Author:** W1.D. **Updated:** 2026-04-30.
**Authoritative anchors:** [`PLATFORM-TECH-STACK.md`](PLATFORM-TECH-STACK.md), [`PROVISIONING-PLAN.md`](PROVISIONING-PLAN.md), [`ARCHITECTURE.md`](ARCHITECTURE.md) §11, [`BLUEPRINT-AUTHORING.md`](BLUEPRINT-AUTHORING.md).
+> **SPIRE deferral (2026-05-03).** This document was drafted with `bp-spire` at slot 06. Founder PR [#665](https://github.com/openova-io/openova/pull/665) ("drop bp-spire — Cilium WireGuard is canonical east-west mesh") subsequently removed slot 06 from `clusters/_template/bootstrap-kit/`; the `platform/spire/` chart is retained as opt-in for future re-introduction. Today's canonical workload identity = Cilium WireGuard (kernel transport encryption) + K8s ServiceAccount TokenReview (workload-to-workload auth via OpenBao `kubernetes` auth method). The tables and DAG diagrams below preserve the pre-deferral slot numbering as a historical record of the Wave-2 dispatch plan; re-enable triggers + re-introduction roadmap live in TBD-V29 ([#2055](https://github.com/openova-io/openova/issues/2055)). The "Max chain length = 7" calculation in §2.8 was bounded by the cilium → cert-manager → spire → openbao chain — with bp-spire removed, the post-PR-665 max chain is one shorter (6 hops).
---
## 0. Purpose & non-goals
@@ -32,7 +34,7 @@ The `clusters/_template/bootstrap-kit/` directory currently contains **14 HelmRe
| 03 | `03-flux.yaml` | bp-flux | 0 — Foundation | Host-Flux. (Bootstrap Flux that loaded this kit is replaced.) |
| 04 | `04-crossplane.yaml` | bp-crossplane | 0 — Foundation | Day-2 IaC. Adopts Phase-0 OpenTofu artefacts. |
| 05 | `05-sealed-secrets.yaml` | bp-sealed-secrets | 0 — Foundation | Bootstrap-only; transient until ESO+OpenBao take over. |
-| 06 | `06-spire.yaml` | bp-spire | 1 — Identity | SPIFFE root + agent. Workload SVIDs. |
+| 06 | `06-spire.yaml` | bp-spire | 1 — Identity | **deferred** — slot 06 removed by founder PR [#665](https://github.com/openova-io/openova/pull/665) (2026-05-03). Originally: SPIFFE root + agent, Workload SVIDs. Today's canonical workload identity = Cilium WireGuard + K8s SA TokenReview; chart retained as opt-in. Re-introduction roadmap: TBD-V29 ([#2055](https://github.com/openova-io/openova/issues/2055)). |
| 07 | `07-nats-jetstream.yaml` | bp-nats-jetstream | 2 — Eventbus | Control-plane event spine. |
| 08 | `08-openbao.yaml` | bp-openbao | 1 — Identity/secret | Per-Sovereign secret backend. Raft. |
| 09 | `09-keycloak.yaml` | bp-keycloak | 1 — Identity | OIDC/OAuth, per-Sovereign or per-Org realms. |
@@ -60,7 +62,7 @@ Legend:
| 3 | bp-flux | 0 | host | present (slot 03) | GitOps. |
| 4 | bp-crossplane | 0 | mgt | present (slot 04) | Day-2 IaC. |
| 5 | bp-sealed-secrets | 0 | host (transient) | present (slot 05) | Decommissioned after Phase 1. |
-| 6 | bp-spire | 1 | host | present (slot 06) | SVIDs. |
+| 6 | bp-spire | 1 | host | **deferred** (was slot 06) | Removed from bootstrap-kit by founder PR [#665](https://github.com/openova-io/openova/pull/665); chart retained as opt-in. Canonical workload identity = Cilium WireGuard + K8s SA TokenReview. Re-introduction roadmap: TBD-V29 ([#2055](https://github.com/openova-io/openova/issues/2055)). |
| 7 | bp-nats-jetstream | 2 | mgt | present (slot 07) | Event spine. |
| 8 | bp-openbao | 1 | mgt | present (slot 08) | Secret backend. |
| 9 | bp-keycloak | 1 | mgt | present (slot 09) | OIDC. |

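The deferral note above argues that removing bp-spire from the cilium → cert-manager → spire → openbao chain shortens the maximum dependency chain by one hop. The bash sketch below illustrates that reasoning and needs bash 4+ for associative arrays. Its edge list holds only that one chain, so it is a hypothetical subset of the real Wave-2 DAG. `drop_node` assumes that charts depending on a removed chart inherit that chart's own `dependsOn` entries, which this document does not state. The sketch counts edges (hops), not nodes.

```bash
#!/usr/bin/env bash
# HYPOTHETICAL edge list: chart -> space-separated dependsOn targets.
# Only the cilium -> cert-manager -> spire -> openbao chain is taken from the note above.
declare -A DEPS=(
  [cert-manager]="cilium"
  [spire]="cert-manager"
  [openbao]="spire"
)

# depth NODE: hops on the longest dependsOn path from NODE down to a chart with no deps.
# No memoisation ($(...) runs in a subshell); fine for a graph this small.
depth() {
  local node=$1 best=0 d dep
  for dep in ${DEPS[$node]-}; do
    d=$(( $(depth "$dep") + 1 ))
    (( d > best )) && best=$d
  done
  echo "$best"
}

# max_chain: longest chain (in hops) over every chart in the graph.
max_chain() {
  local node d max=0
  for node in "${!DEPS[@]}"; do
    d=$(depth "$node")
    (( d > max )) && max=$d
  done
  echo "$max"
}

# drop_node NODE: remove NODE; its dependents inherit NODE's own dependsOn (assumption).
drop_node() {
  local gone=$1 n d out
  for n in "${!DEPS[@]}"; do
    [[ $n == "$gone" ]] && continue
    out=""
    for d in ${DEPS[$n]}; do
      if [[ $d == "$gone" ]]; then out+=" ${DEPS[$gone]-}"; else out+=" $d"; fi
    done
    DEPS[$n]=$out
  done
  unset "DEPS[$gone]"
}
```

Running `max_chain`, then `drop_node spire`, then `max_chain` again shows the one-hop reduction on this subset. Against the real manifests, `DEPS` would instead be filled from each HelmRelease's `spec.dependsOn`.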

@@ -92,7 +92,7 @@ Click **New Sovereign**. Walk the 7-step wizard (canonical order from `STEPS` in
| 4. Credentials | Hetzner project ID | The numeric project ID from Pre-flight |
| 4. Credentials | SSH public key | Paste the `*.pub` content from Pre-flight |
| 5. Components | Choose Your Stack tab | Single flat marketplace card grid (#162, #b0ec0c43) with family chips + search + product-family chip filter. Recommended families ship default-on; toggle optional ones to taste. Per #175, dependency-aware cascades pull transitive deps automatically (Specter → BGE/Milvus/LangFuse/vLLM/KServe; Harbor → cnpg/seaweedfs/valkey via mandatory closure). |
-| 5. Components | Always Included tab | Read-only — bp-cilium, bp-flux, bp-crossplane, bp-cert-manager, bp-spire, bp-nats-jetstream, bp-openbao, bp-keycloak, bp-gitea, bp-sealed-secrets, bp-powerdns, plus the post-promotion mandatory closure (cnpg, valkey). Always installed. |
+| 5. Components | Always Included tab | Read-only — bp-cilium, bp-flux, bp-crossplane, bp-cert-manager, bp-nats-jetstream, bp-openbao, bp-keycloak, bp-gitea, bp-sealed-secrets, bp-powerdns, plus the post-promotion mandatory closure (cnpg, valkey). Always installed. (bp-spire was removed from the Always-Included list by founder PR [#665](https://github.com/openova-io/openova/pull/665) — Cilium WireGuard + K8s SA TokenReview are canonical workload identity; chart retained as opt-in. Re-introduction roadmap: TBD-V29 [#2055](https://github.com/openova-io/openova/issues/2055).) |
| 6. Domain | Domain mode | **Pool** (per #169 the other modes are `byo-manual` and `byo-api`) |
| 6. Domain | Pool domain | `omani.works` |
| 6. Domain | Subdomain | `omantel` (validated via `POST /api/v1/subdomains/check` → PDM `/v1/reserve`) |
@@ -127,7 +127,6 @@ The wizard's progress page connects to `GET /api/v1/deployments/{id}/logs` (Serv
| `flux` | Flux on new Sovereign (self) | <30s |
| `crossplane` | Flux on new Sovereign | 1–2 min |
| `sealed-secrets` | Flux on new Sovereign | ~30s |
-| `spire` | Flux on new Sovereign | ~1 min |
| `jetstream` | Flux on new Sovereign | ~1 min |
| `openbao` | Flux on new Sovereign | 1–2 min |
| `keycloak` | Flux on new Sovereign | 2–3 min |

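To follow the same reconciliation from the command line instead of the wizard's progress log, here is a bash sketch that waits on each post-PR-665 bootstrap-kit HelmRelease in turn. It uses the bootstrap-kit `dependsOn` order, without spire. The sketch assumes the HelmReleases are named `bp-<component>` in the `flux-system` namespace; both names are hypothetical, so check them first with `kubectl get helmreleases -A`. It also assumes a 15-minute timeout per release. Setting `KUBECTL=echo` gives a dry run.

```bash
#!/usr/bin/env bash
# Post-PR-665 bootstrap-kit components in dependsOn order (bp-spire removed).
COMPONENTS=(cilium cert-manager flux crossplane sealed-secrets nats-jetstream openbao keycloak gitea powerdns)
NAMESPACE=${NAMESPACE:-flux-system}   # assumption: where the HelmReleases live
KUBECTL=${KUBECTL:-kubectl}           # override with KUBECTL=echo for a dry run

# wait_for_kit: block until each HelmRelease reports Ready=True; stop at the first failure.
wait_for_kit() {
  local c
  for c in "${COMPONENTS[@]}"; do
    "$KUBECTL" wait "helmrelease/bp-$c" -n "$NAMESPACE" \
      --for=condition=Ready --timeout=15m || { echo "bp-$c not Ready" >&2; return 1; }
  done
}
```

Waiting sequentially mirrors the `dependsOn` chain. A release cannot become Ready before its dependencies, so the first failing wait points at the blocking chart.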

@@ -135,7 +135,7 @@ Expected: `{"status":"ok"}`. If unhealthy, the wizard's domain step (Step 6) can
### A.6 bp-* charts published at the current target version
-The bootstrap-kit Kustomization references 12 charts. Today's target versions are:
+The bootstrap-kit Kustomization references 11 charts (post founder PR [#665](https://github.com/openova-io/openova/pull/665), 2026-05-03 — slot 06 / `bp-spire` was removed; canonical workload identity is now Cilium WireGuard + K8s SA TokenReview; the `platform/spire/` chart is retained as opt-in only — re-introduction roadmap in TBD-V29 [#2055](https://github.com/openova-io/openova/issues/2055)). Today's target versions are:
| Chart | Target version |
|---|---|
@@ -144,7 +144,6 @@ The bootstrap-kit Kustomization references 12 charts. Today's target versions ar
| `bp-flux` | `1.1.0` |
| `bp-crossplane` | `1.1.0` |
| `bp-sealed-secrets` | `1.1.0` |
-| `bp-spire` | `1.1.0` |
| `bp-nats-jetstream` | `1.1.0` |
| `bp-openbao` | `1.1.0` |
| `bp-keycloak` | `1.1.0` |
@@ -152,12 +151,12 @@ The bootstrap-kit Kustomization references 12 charts. Today's target versions ar
| `bp-powerdns` | `1.1.0` |
| `bp-catalyst-platform` | `1.1.0` |
-When the observability-toggle agent lands, all 12 charts move to `1.1.1`. The bump is the operator's signal that observability defaults flip from on to off (per the IMPLEMENTATION-STATUS hardening) — the underlying charts are functionally compatible.
+When the observability-toggle agent lands, all 11 charts move to `1.1.1`. The bump is the operator's signal that observability defaults flip from on to off (per the IMPLEMENTATION-STATUS hardening) — the underlying charts are functionally compatible.
-**Before provisioning, confirm the 12 OCI artifacts exist** at the target version:
+**Before provisioning, confirm the 11 OCI artifacts exist** at the target version:
```bash
-for chart in cilium cert-manager flux crossplane sealed-secrets spire nats-jetstream openbao keycloak gitea powerdns catalyst-platform; do
+for chart in cilium cert-manager flux crossplane sealed-secrets nats-jetstream openbao keycloak gitea powerdns catalyst-platform; do
printf '%-24s ' "bp-$chart"
curl -sS -H "Authorization: Bearer $(echo -n "$GHCR_PULL_TOKEN" | base64)" \
"https://ghcr.io/v2/openova-io/bp-$chart/tags/list" 2>/dev/null | \
@@ -225,20 +224,22 @@ Cloud-init on the control-plane node, in this exact order:
5. `kubectl create secret generic ghcr-pull -n flux-system --from-literal=token="$CATALYST_GHCR_PULL_TOKEN"` — durable so private bp-* charts pull cleanly (commit `dddbab4b`, #9 below).
6. Apply the GitRepository pointing at `clusters/<sovereign-fqdn>/` in the public OpenOva monorepo.
7. Apply two Kustomizations split for CRD ordering (commit `34c8de84`, #8 below):
-- `bootstrap-kit` — installs the 11 platform charts.
+- `bootstrap-kit` — installs the 10 platform charts (down from 11 pre-PR-665 — bp-spire deferred; see note below).
- `infrastructure-config` — applies Crossplane Compositions and ProviderConfigs after Crossplane CRDs exist.
### B.4 Phase 1 — bootstrap-kit (10–15 min)
-Flux pulls 11 `bp-*` HelmReleases in dependency order via `dependsOn`:
+Flux pulls 10 `bp-*` HelmReleases in dependency order via `dependsOn`:
```
cilium → cert-manager → flux → crossplane → sealed-secrets
-spire → nats-jetstream → openbao → keycloak → gitea → powerdns
+nats-jetstream → openbao → keycloak → gitea → powerdns
```
-Then `bp-catalyst-platform` (umbrella) reconciles. The 11 + umbrella = 12 G2 wrapper charts (per [`SOVEREIGN-PROVISIONING.md`](SOVEREIGN-PROVISIONING.md) §3 and [`IMPLEMENTATION-STATUS.md`](IMPLEMENTATION-STATUS.md) §7).
+(Pre-2026-05-03 the chain included `spire → nats-jetstream`; founder PR [#665](https://github.com/openova-io/openova/pull/665) removed bp-spire from the bootstrap-kit — chart retained as opt-in. Re-introduction roadmap: TBD-V29 [#2055](https://github.com/openova-io/openova/issues/2055).)
+Then `bp-catalyst-platform` (umbrella) reconciles. The 10 + umbrella = 11 G2 wrapper charts post founder PR [#665](https://github.com/openova-io/openova/pull/665) (per [`SOVEREIGN-PROVISIONING.md`](SOVEREIGN-PROVISIONING.md) §3 and [`IMPLEMENTATION-STATUS.md`](IMPLEMENTATION-STATUS.md) §7).
### B.5 cert-manager + Cilium Gateway + console URL (1–2 min)

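The OCI artifact check for the 11 post-PR-665 charts can also be written as small testable functions. This self-contained bash sketch assumes the OCI distribution `tags/list` response shape (`{"name":...,"tags":[...]}`) in compact JSON. It reuses the GHCR endpoint and Bearer header from the loop above. `fetch_tags`, `has_tag` and `check_charts` are hypothetical helper names, and `TARGET_VERSION` defaults to today's `1.1.0`.

```bash
#!/usr/bin/env bash
# The 11 post-PR-665 charts (bp-spire removed).
CHARTS=(cilium cert-manager flux crossplane sealed-secrets nats-jetstream openbao keycloak gitea powerdns catalyst-platform)
TARGET_VERSION=${TARGET_VERSION:-1.1.0}

# fetch_tags CHART: GET the OCI tags/list document for bp-CHART from GHCR.
fetch_tags() {
  curl -sS -H "Authorization: Bearer $(echo -n "$GHCR_PULL_TOKEN" | base64)" \
    "https://ghcr.io/v2/openova-io/bp-$1/tags/list"
}

# has_tag JSON VERSION: succeed if VERSION is an exact quoted entry of the "tags" array.
has_tag() {
  printf '%s' "$1" | grep -o '"tags":[[:space:]]*\[[^]]*]' | grep -qF "\"$2\""
}

# check_charts: print OK/MISSING per chart; return non-zero if any chart lacks the tag.
check_charts() {
  local chart rc=0
  for chart in "${CHARTS[@]}"; do
    if has_tag "$(fetch_tags "$chart")" "$TARGET_VERSION"; then
      printf '%-24s OK\n' "bp-$chart"
    else
      printf '%-24s MISSING %s\n' "bp-$chart" "$TARGET_VERSION"
      rc=1
    fi
  done
  return "$rc"
}
```

Matching the quoted version string (`"1.1.1"`) rather than a bare substring keeps `1.1.10` from being reported as `1.1.1`.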

@@ -11,7 +11,7 @@ A new **Sovereign** — a self-sufficient deployed Catalyst — provisioned end-
- A k3s cluster running on Hetzner Cloud servers in your chosen region
- Cilium CNI + Gateway API as ingress, Flux as GitOps reconciler, Crossplane as day-2 IaC
-- The 12-component bootstrap kit installed and reconciling cleanly: cilium → cert-manager → flux → crossplane → sealed-secrets → spire → nats-jetstream → openbao → keycloak → gitea → powerdns → bp-catalyst-platform
+- The 11-component bootstrap kit installed and reconciling cleanly: cilium → cert-manager → flux → crossplane → sealed-secrets → nats-jetstream → openbao → keycloak → gitea → powerdns → bp-catalyst-platform (pre-2026-05-03 the chain included `spire` between sealed-secrets and nats-jetstream — founder PR [#665](https://github.com/openova-io/openova/pull/665) dropped bp-spire; canonical workload identity is now Cilium WireGuard + K8s SA TokenReview, and the `platform/spire/` chart is retained as opt-in. Re-introduction roadmap: TBD-V29 [#2055](https://github.com/openova-io/openova/issues/2055))
- Reachable URLs: `console.<your-fqdn>`, `gitea.<your-fqdn>`, `admin.<your-fqdn>` (TLS via cert-manager + Let's Encrypt)
- Initial sovereign-admin user in Keycloak's `catalyst-admin` realm
- The Sovereign is now self-sufficient — the catalyst-provisioner has zero ongoing connection to it (Phase 1 hand-off complete)
@@ -123,7 +123,7 @@ The catalyst-api retains the OpenTofu state per-Sovereign in `/tmp/catalyst/tofu
| `bp-cert-manager` reconciles but cert issuance fails | Let's Encrypt rate-limit (50 certs / week / domain) or DNS records not propagated | Check `cert-manager` events: `kubectl -n cert-manager describe challenge`; for rate-limit, wait. For DNS, dig the records: `dig console.<your-fqdn> +short` should return the LB IP |
| `console.<sovereign-fqdn>` returns 404 / connection-refused | Per-Sovereign PowerDNS zone records not yet visible to public resolvers (parent-zone NS-delegation TTL ~15 min for pool, customer-registrar TTL for BYO byo-manual / byo-api) | `dig <sovereign-fqdn> NS` should return OpenOva NS; `dig console.<sovereign-fqdn>` should return the LB IP. Allow up to 30 min for DNS propagation |
| Keycloak reset-password email never arrives | SMTP not configured in Keycloak realm yet | Reset via the catalyst-admin realm-admin flow inside the cluster: `kubectl -n catalyst-system exec -it keycloak-0 -- /opt/keycloak/bin/kcadm.sh ...` (the catalyst-admin path is documented in `clusters/<sovereign-fqdn>/keycloak/README.md`) |
-| Bootstrap-kit Kustomization stuck `Ready=False`; PVCs (bp-spire, bp-keycloak postgres, bp-openbao, bp-nats-jetstream, bp-gitea, bp-catalyst-platform postgres) all `Pending` indefinitely | StorageClass missing — k3s started without `local-path-provisioner` and the cluster has no default class for HelmReleases that don't pin `storageClassName` | See **StorageClass missing** below |
+| Bootstrap-kit Kustomization stuck `Ready=False`; PVCs (bp-keycloak postgres, bp-openbao, bp-nats-jetstream, bp-gitea, bp-catalyst-platform postgres) all `Pending` indefinitely (and `bp-spire` server PVC too if you have opted bp-spire back in — bp-spire was removed from the canonical bootstrap-kit by founder PR [#665](https://github.com/openova-io/openova/pull/665); chart retained as opt-in. Re-introduction roadmap: TBD-V29 [#2055](https://github.com/openova-io/openova/issues/2055)) | StorageClass missing — k3s started without `local-path-provisioner` and the cluster has no default class for HelmReleases that don't pin `storageClassName` | See **StorageClass missing** below |
**Escalation:** if the runbook doesn't unblock you, file an issue against `github.com/openova-io/openova` with the `area/platform` and `kind/provisioning` labels, including: Sovereign FQDN, region, last 50 SSE events, last 100 lines of `kubectl -n flux-system get events`, and the OpenTofu workdir contents (excluding `tofu.auto.tfvars.json` which contains the Hetzner token).
@@ -135,10 +135,12 @@ The catalyst-api retains the OpenTofu state per-Sovereign in `/tmp/catalyst/tofu
$ kubectl get pvc -A
NAMESPACE NAME STATUS VOLUME CAPACITY ...
keycloak data-keycloak-postgresql-0 Pending ...
-spire-system spire-data-spire-server-0 Pending ...
openbao data-openbao-0 Pending ...
nats data-nats-jetstream-0 Pending ...
```
+(The pre-2026-05-03 example output also showed a `spire-system/spire-data-spire-server-0` row. Founder PR [#665](https://github.com/openova-io/openova/pull/665) dropped bp-spire from the bootstrap-kit; the `spire-system` namespace is only present on Sovereigns that have opted bp-spire back in. Re-introduction roadmap: TBD-V29 [#2055](https://github.com/openova-io/openova/issues/2055).)
`kubectl describe pvc <name>` reports `no persistent volumes available for this claim and no storage class is set`. `kubectl get sc` returns `No resources found`.
**Root cause.** Pre-2026-04-29 the cloud-init template passed `--disable=local-storage` to the k3s installer, on the assumption that Crossplane would install hcloud-csi day-2 and register the StorageClass after `bp-crossplane` reconciled. That created a circular dependency: every PVC-using HelmRelease in the bootstrap-kit blocks waiting on a StorageClass that would only exist AFTER the bootstrap-kit had finished installing. Result: Sovereign deadlocks on first boot.

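The "StorageClass missing" symptom (claims stuck `Pending` and no default class) can be checked with a short script. This bash sketch assumes the default `kubectl get pvc -A` column layout (NAMESPACE, NAME, STATUS, ...). It also assumes the default class is marked with the standard `storageclass.kubernetes.io/is-default-class` annotation. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# pending_pvcs: read `kubectl get pvc -A` output on stdin; print ns/name of Pending claims.
pending_pvcs() {
  awk 'NR > 1 && $3 == "Pending" { print $1 "/" $2 }'
}

# default_storageclass: print the StorageClass(es) annotated as the cluster default.
default_storageclass() {
  kubectl get storageclass \
    -o jsonpath='{.items[?(@.metadata.annotations.storageclass\.kubernetes\.io/is-default-class=="true")].metadata.name}'
}

# diagnose_storage: return non-zero when claims are Pending and no default class exists.
diagnose_storage() {
  local pending
  pending=$(kubectl get pvc -A | pending_pvcs)
  if [ -n "$pending" ] && [ -z "$(default_storageclass)" ]; then
    echo "StorageClass missing; Pending claims:"
    echo "$pending"
    return 1
  fi
}
```

Post-PR-665, a `spire-system` claim only shows up in this list on Sovereigns that have opted bp-spire back in.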

@@ -60,7 +60,7 @@ A handed-over Sovereign must own its own GitOps loop, its own DNS, its own cert
| 12 | `bp-powerdns` | Authoritative DNS + PDM + dnsdist | 🟢 chart-released | ❓ unknown — never observed serving a delegated subdomain on a Sovereign |
| 13 | `bp-gitea` | Sovereign-owned Git server | 🟢 chart-verified — `bp-gitea:1.1.2` smoke OK ([#376](https://github.com/openova-io/openova/issues/376)) | ❓ unknown |
| 14 | `bp-keycloak` | OIDC IDP — per-Sovereign realm | 🟢 chart-verified — admin login OK + #326 kubectl OIDC client ([#377](https://github.com/openova-io/openova/issues/377), [#326](https://github.com/openova-io/openova/issues/326)) | ❓ unknown — kubectl OIDC flow never exercised live |
-| 15 | `bp-spire` | Workload identity — service-to-service mTLS | 🟢 chart-verified — k8s_psat attestation OK ([#382](https://github.com/openova-io/openova/issues/382)) | ❓ unknown |
+| 15 | `bp-spire` | Workload identity — service-to-service mTLS | **deferred** — chart was verified ([#382](https://github.com/openova-io/openova/issues/382)) but slot 06 removed from `clusters/_template/bootstrap-kit/` by founder PR [#665](https://github.com/openova-io/openova/pull/665) (2026-05-03, "drop bp-spire — Cilium WireGuard is canonical east-west mesh"). The `platform/spire/` chart is retained as opt-in for future re-introduction; re-enable triggers + roadmap in TBD-V29 ([#2055](https://github.com/openova-io/openova/issues/2055)). Today's canonical: Cilium WireGuard for east-west transport encryption + K8s ServiceAccount TokenReview for workload-to-workload auth | n/a (deferred) |
| 16 | `bp-crossplane` | Day-2 cloud-resource provisioning | 🟢 chart-verified ([#378](https://github.com/openova-io/openova/issues/378)) | ❓ unknown — `provider-hcloud` Healthy=True never observed on a real Sovereign |
| 17 | `bp-crossplane-claims` | XRDs + Compositions | 🟢 chart-released — event-driven HR fix ([#327](https://github.com/openova-io/openova/issues/327)) + UserAccess XRD ([#322](https://github.com/openova-io/openova/issues/322)) | ❓ unknown |
| 18 | `bp-harbor` | Container registry — avoids Docker Hub rate limits | 🟢 chart-released — vendor-agnostic Object Storage ([#383](https://github.com/openova-io/openova/issues/383)) | ❓ unknown — Hetzner-S3 backend signin never exercised live |
@@ -455,7 +455,7 @@ flowchart TB
| [#379](https://github.com/openova-io/openova/issues/379) | bp-kyverno install | #338 |
| [#380](https://github.com/openova-io/openova/issues/380) | bp-trivy install | #338 |
| [#381](https://github.com/openova-io/openova/issues/381) | bp-grafana stack install | #338 |
-| [#382](https://github.com/openova-io/openova/issues/382) | bp-spire install | #338, bp-cert-manager |
+| [#382](https://github.com/openova-io/openova/issues/382) | bp-spire install (⏸ **deferred** post PR [#665](https://github.com/openova-io/openova/pull/665) — chart retained as opt-in; canonical workload identity now Cilium WireGuard + K8s SA TokenReview; re-introduction roadmap in TBD-V29 [#2055](https://github.com/openova-io/openova/issues/2055)) | #338, bp-cert-manager |
### Phase 6 — Catalyst control plane (depends on Phases 2 + 4 + 5)