openova/clusters/_template/bootstrap-kit
e3mrah 0a0b912e0d
fix(wizard): KServe was wrongly under Always Included on every Sovereign (#1068)
* fix(hetzner-purge): close volumes/primary_ips/floating_ips gap — wipe was leaving Crossplane orphans

Founder caught the gap on omantel.biz post-decommission: Hetzner
console showed 0 servers/LBs/IPs but 1 Volume + 2 Networks + 1
Firewall lingering. Networks/Firewall were the existing async-detach
window (handled by name-prefix fallback in the next provision); the
**Volume** was a hard miss — Purge() never called /v1/volumes.

Root cause: post-handover, the Hetzner Cloud Volume CSI driver
allocates Hetzner Volumes for every CNPG/Harbor/Loki/Mimir
StatefulSet PVC. tofu state never tracks them. When the operator
decommissions, `tofu destroy` is a no-op for the Volume and the
existing label-sweep didn't list /v1/volumes either. Result: orphan
volumes accrue cloud cost across re-provision cycles.

Same architectural gap for primary_ips (CCM-allocated for LoadBalancer
services since Hetzner's 2023 IP-decoupling) and floating_ips
(rare in Catalyst stack but listed for completeness).

Fix: extend Purge() + purgeByNamePrefix() to walk three additional
endpoints in dependency order:

  servers → load_balancers → firewalls → networks → ssh_keys
  → volumes (after servers detach)
  → primary_ips (after LBs free their IPs)
  → floating_ips

Both label-pass AND name-prefix-pass cover all 8 kinds. PurgeReport
extended with Volumes/PrimaryIPs/FloatingIPs slices; Total() updated.
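
The ordering above can be sketched as follows. This is a minimal illustration, not the actual Purge() code: the `Kind` type, `PURGE_ORDER`, `emptyReport`, `total`, and the injected `deleteKind` callback are hypothetical names, and the real report uses the Volumes/PrimaryIPs/FloatingIPs slices mentioned above.

```typescript
// Hypothetical sketch of the purge ordering; all names are illustrative.
type Kind =
  | "servers"
  | "load_balancers"
  | "firewalls"
  | "networks"
  | "ssh_keys"
  | "volumes"
  | "primary_ips"
  | "floating_ips";

// Dependency order from the commit message: volumes follow servers
// (so they are detached first), primary_ips follow load balancers.
const PURGE_ORDER: readonly Kind[] = [
  "servers",
  "load_balancers",
  "firewalls",
  "networks",
  "ssh_keys",
  "volumes",
  "primary_ips",
  "floating_ips",
];

// One list of deleted resource names per kind.
type PurgeReport = Record<Kind, string[]>;

function emptyReport(): PurgeReport {
  return {
    servers: [],
    load_balancers: [],
    firewalls: [],
    networks: [],
    ssh_keys: [],
    volumes: [],
    primary_ips: [],
    floating_ips: [],
  };
}

// Counts deletions across all 8 kinds, including the three new ones.
function total(report: PurgeReport): number {
  return PURGE_ORDER.reduce((n, kind) => n + report[kind].length, 0);
}

// deleteKind stands in for the per-endpoint list-and-delete calls.
function purge(deleteKind: (kind: Kind) => string[]): PurgeReport {
  const report = emptyReport();
  for (const kind of PURGE_ORDER) {
    report[kind] = deleteKind(kind);
  }
  return report;
}
```

Keeping the order in one constant means the label pass and the name-prefix pass can share it, so a kind added later is picked up by both passes and by the total.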

CSI-named volumes (`pvc-<uid>` form) won't match either pass — those
need the canonical `catalyst.openova.io/sovereign=<fqdn>` label which
the Crossplane composition for VolumeClaim must apply. That
composition-layer fix is tracked separately; this PR closes
the wipe gap for everything labelled OR name-prefixed.
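
A rough sketch of why unlabelled `pvc-<uid>` volumes slip through both passes. The `Resource` shape, the function names, and the example prefix are assumptions made for illustration; only the label key comes from the text above.

```typescript
// Hypothetical resource shape; the real API objects carry more fields.
interface Resource {
  name: string;
  labels: Record<string, string>;
}

// Label key quoted in the commit message.
const SOVEREIGN_LABEL = "catalyst.openova.io/sovereign";

// Label pass: the resource carries the canonical sovereign label.
function matchesLabel(r: Resource, fqdn: string): boolean {
  return r.labels[SOVEREIGN_LABEL] === fqdn;
}

// Name-prefix pass: the prefix convention here is an assumption.
function matchesNamePrefix(r: Resource, prefix: string): boolean {
  return r.name.startsWith(prefix);
}

// A resource is purged when either pass matches it.
function selectForPurge(
  resources: Resource[],
  fqdn: string,
  prefix: string,
): Resource[] {
  return resources.filter(
    (r) => matchesLabel(r, fqdn) || matchesNamePrefix(r, prefix),
  );
}
```

Under these assumptions, a volume named `pvc-1234` with no labels matches neither predicate, which is why the composition has to apply the label for the wipe to reach it.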

Bump chart 1.4.80 → 1.4.81.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(wizard): KServe was wrongly under Always Included on every Sovereign

Founder caught on console.openova.io/sovereign/wizard step 4: KServe
appeared in the "Always Included" section as if every Sovereign had
to install it. False positive — KServe is conditionally mandatory
ONLY when the operator opts into the CORTEX (AI/ML) product family.

Two coupled bugs:

(1) Data model: kserve was tagged tier:'mandatory' inside the CORTEX
    product family, but tier:'mandatory' is consumed everywhere in
    the wizard as "always-on regardless of family selection":
      - componentGroups.ts:543 — seedIds.add(c.id) → auto-selected at
        wizard init for every Sovereign
      - applicationCatalog.ts:97 — seeded into the apps grid
      - store.ts:642 — special-cased as undeselectable
      - StepComponents.tsx — surfaced under "Always Included" tab
    Demote to tier:'recommended'. CORTEX has
    cascadeOnMemberSelection:true so picking any CORTEX member (vLLM,
    Specter, BGE, Milvus, …) still auto-pulls KServe via the cascade
    — that's the right semantics. KServe stays visible under CORTEX
    in Tab 1 ("Choose Your Stack") and locks in once CORTEX is
    selected.

(2) UI filter: AlwaysIncludedTab was iterating every PRODUCTS entry
    regardless of product.tier and listing every member with
    component.tier === 'mandatory'. That mixes the platform-mandatory
    layer (PILOT/SPINE/SURGE/SILO/GUARDIAN tier:'mandatory' families)
    with conditional-mandatory members of opt-in families
    (CORTEX/RELAY tier:'optional', INSIGHTS/FABRIC tier:'recommended').
    Filter by product.tier === 'mandatory' so only the always-on
    families' mandatory members appear. Defence-in-depth — even if a
    new opt-in family ships with internal-mandatory members, they
    won't leak into "Always Included".
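
The filter change in (2) can be sketched like this. The `Product` and `Component` shapes and the `alwaysIncluded` helper are simplified assumptions, not the real AlwaysIncludedTab code, and `example-core` is a made-up member id.

```typescript
type Tier = "mandatory" | "recommended" | "optional";

// Simplified, hypothetical shapes for the wizard's catalogue entries.
interface Component {
  id: string;
  tier: Tier;
}

interface Product {
  id: string;
  tier: Tier;
  components: Component[];
}

// Only mandatory members of mandatory families count as always included.
function alwaysIncluded(products: Product[]): Component[] {
  const result: Component[] = [];
  for (const product of products) {
    if (product.tier !== "mandatory") continue; // skip opt-in families
    for (const component of product.components) {
      if (component.tier === "mandatory") result.push(component);
    }
  }
  return result;
}

// Example data: kserve is still tagged mandatory inside opt-in CORTEX.
const exampleProducts: Product[] = [
  {
    id: "PILOT",
    tier: "mandatory",
    components: [{ id: "example-core", tier: "mandatory" }],
  },
  {
    id: "CORTEX",
    tier: "optional",
    components: [{ id: "kserve", tier: "mandatory" }],
  },
];

const included = alwaysIncluded(exampleProducts).map((c) => c.id);
```

Even with the kserve tag left at mandatory, the family-level check keeps it out of the list, which is the defence-in-depth point made above.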

Audit confirmed kserve was the only offender across all 9 product
families today. PILOT/SPINE/SURGE/SILO/GUARDIAN remain unchanged
(their members are rightly tier:'mandatory'); CORTEX kserve is fixed;
the other families have no internal mandatories.
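
An audit of this kind could be expressed as a small check over the catalogue. This is only a sketch under assumed data shapes; `findConditionalMandatories` is a hypothetical helper, and the commit does not say how the audit was actually run.

```typescript
type Tier = "mandatory" | "recommended" | "optional";

// Simplified, hypothetical catalogue shapes.
interface Component {
  id: string;
  tier: Tier;
}

interface Product {
  id: string;
  tier: Tier;
  components: Component[];
}

// Lists "FAMILY/component" for every mandatory member that sits inside
// a family which is not itself mandatory.
function findConditionalMandatories(products: Product[]): string[] {
  const offenders: string[] = [];
  for (const product of products) {
    if (product.tier === "mandatory") continue; // always-on family, fine
    for (const component of product.components) {
      if (component.tier === "mandatory") {
        offenders.push(product.id + "/" + component.id);
      }
    }
  }
  return offenders;
}
```

Run as a unit test over the real catalogue, a check like this would fail as soon as a new opt-in family ships a member tagged mandatory.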

Bump chart 1.4.81 → 1.4.82.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 00:33:19 +04:00
01-cilium.yaml fix(bp-cilium): upgrade upstream cilium 1.16.5 → 1.19.3 (1.2.0) (#684) 2026-05-03 17:20:54 +04:00
01a-gateway-api.yaml fix(cloud-init): install Gateway API v1.1.0 CRDs before cilium so operator registers gateway controller (#581) 2026-05-02 13:23:32 +04:00
02-cert-manager.yaml fix(provisioner): cloud-init bootstrap-kit path matches per-FQDN cluster dir (resolves #218) (#256) 2026-04-30 17:11:44 +04:00
03-flux.yaml fix(bp-flux): catalyst-cluster-reconciler ClusterRoleBinding overlay (closes #338) (#393) 2026-05-01 15:56:45 +04:00
04-crossplane.yaml fix(provisioner): cloud-init bootstrap-kit path matches per-FQDN cluster dir (resolves #218) (#256) 2026-04-30 17:11:44 +04:00
05-sealed-secrets.yaml fix(bp-*): event-driven HR install -- drop blanket timeout, use disableWait (#250) 2026-04-30 16:55:19 +04:00
05a-reflector.yaml fix: bp-reflector + rename ghcr-pull-secret->ghcr-pull (Closes #543) (#554) 2026-05-02 12:17:51 +04:00
06a-bp-self-sovereign-cutover.yaml fix(cutover step-01): clone+push (regular repo) instead of pull-mirror (#1033) 2026-05-06 03:19:05 +04:00
07-nats-jetstream.yaml fix(bootstrap-kit): remove empty dependsOn block in nats-jetstream HR (#667) 2026-05-03 14:08:32 +04:00
08-openbao.yaml fix(bp-openbao): add BAO_TOKEN+NAMESPACE env to auth-bootstrap (chart 1.2.14) (#666) 2026-05-03 14:02:34 +04:00
09-keycloak.yaml fix(bootstrap-kit): bump bp-keycloak to 1.4.0 for tenant-mode realm (#915) (#938) 2026-05-05 14:44:37 +04:00
10-gitea.yaml fix(bp-gitea): mirror gitea-admin-secret to catalyst ns via reflector annotations (#844) 2026-05-05 00:37:04 +04:00
11-powerdns.yaml fix(bp-powerdns): zone-bootstrap Job needs /tmp emptyDir (curl -o + readOnlyRootFS) (#843) 2026-05-05 00:28:44 +04:00
12-external-dns.yaml fix(bp-external-dns): apiserver Endpoints sync timeout — Cilium kube-apiserver entity required (closes #770) (#771) 2026-05-04 19:27:17 +04:00
13-bp-catalyst-platform.yaml fix(wizard): KServe was wrongly under Always Included on every Sovereign (#1068) 2026-05-07 00:33:19 +04:00
14-crossplane-claims.yaml fix(bp-crossplane-claims): event-driven HR install — disableWait, drop 15m timeout (#327) 2026-05-01 17:21:03 +04:00
15-external-secrets.yaml fix(bp-external-secrets-stores): split ClusterSecretStore into separate chart per #247 pattern (closes #331) (#426) 2026-05-01 17:33:47 +04:00
15a-external-secrets-stores.yaml fix(bp-external-secrets-stores): split ClusterSecretStore into separate chart per #247 pattern (closes #331) (#426) 2026-05-01 17:33:47 +04:00
16-cnpg.yaml feat(bootstrap-kit): storage+DB foundation batch — slots 15-19 (W2.K1; resolves #254) (#262) 2026-04-30 17:18:12 +04:00
17-valkey.yaml feat(bootstrap-kit): storage+DB foundation batch — slots 15-19 (W2.K1; resolves #254) (#262) 2026-04-30 17:18:12 +04:00
18-seaweedfs.yaml fix(bp-seaweedfs): remove trailing slash in registry — fixes double-slash image ref (Closes #568) (#576) 2026-05-02 13:02:48 +04:00
19-harbor.yaml fix(bp-harbor): inline labels on admin Secret to drop duplicate keys (#949) (#950) 2026-05-05 15:19:17 +04:00
20-opentelemetry.yaml feat(bootstrap-kit): observability batch — slots 20-26 (W2.K2) (#277) 2026-04-30 17:21:26 +04:00
21-alloy.yaml fix(bp-trivy): node-collector tolerates control-plane taint (closes #769) (#772) 2026-05-04 17:38:29 +02:00
22-loki.yaml feat(bootstrap-kit): observability batch — slots 20-26 (W2.K2) (#277) 2026-04-30 17:21:26 +04:00
23-mimir.yaml fix: drop bp-langfuse from minimal + bp-mimir 1.0.2 push_grpc fix (#664) 2026-05-03 13:50:38 +04:00
24-tempo.yaml feat(bootstrap-kit): observability batch — slots 20-26 (W2.K2) (#277) 2026-04-30 17:21:26 +04:00
25-grafana.yaml fix(bootstrap-kit): install Gateway API CRDs ahead of HTTPRoute charts (#503) (#505) 2026-05-02 01:30:50 +04:00
27-kyverno.yaml feat(bootstrap-kit): security+policy batch — slots 27-34 (W2.K3) (#276) 2026-04-30 17:22:34 +04:00
28-reloader.yaml feat(bootstrap-kit): security+policy batch — slots 27-34 (W2.K3) (#276) 2026-04-30 17:22:34 +04:00
29-vpa.yaml fix(bp-vpa): drop registry.k8s.io/ prefix in repository (upstream prepends it) (#641) 2026-05-02 23:32:35 +04:00
30-trivy.yaml fix(bp-trivy): node-collector tolerates control-plane taint (closes #769) (#772) 2026-05-04 17:38:29 +02:00
31-falco.yaml fix(bp-falco): rename rules_file → rules_files (Falco 0.36+ canonical key, Closes #570) (#574) 2026-05-02 12:59:29 +04:00
32-sigstore.yaml feat(bootstrap-kit): security+policy batch — slots 27-34 (W2.K3) (#276) 2026-04-30 17:22:34 +04:00
33-syft-grype.yaml feat(bootstrap-kit): security+policy batch — slots 27-34 (W2.K3) (#276) 2026-04-30 17:22:34 +04:00
34-velero.yaml wip(#425): vendor-agnostic OS rename — partial (rate-limited mid-run) (#435) 2026-05-01 18:05:19 +04:00
35-coraza.yaml feat(bootstrap-kit): edge + apps + AI batch — slot 35 (W2.K4) (#261) 2026-04-30 17:23:59 +04:00
49-bp-cert-manager-powerdns-webhook.yaml fix(bp-cert-manager-powerdns-webhook): re-target to contabo PowerDNS, drop dynadot-webhook (#681) 2026-05-03 17:12:48 +04:00
50-cluster-autoscaler.yaml fix(autoscaler+wizard): wire HCLOUD_CLOUD_INIT, validate SKU/region in catalyst-api (#965) 2026-05-05 16:21:59 +04:00
80-newapi.yaml fix(bp-newapi+services-build): imagePullSecrets on Pod, sed bumps values.yaml smeTag (#955) 2026-05-05 15:47:37 +04:00
kustomization.yaml chore(bootstrap-kit): remove slot 95 bp-stalwart-sovereign (Phase-2 deferred) (#958) 2026-05-05 15:55:30 +04:00