* feat(bp-newapi): chart maturation - ExternalSecret + first-otech vLLM channel + skip-render gates (#799)

  Maturation work for the SME-3 turnkey-experience epic (#795). Aligns the
  bp-newapi scratch chart with ADR-0003 (RBAC ↔ NewAPI user-create hook
  contract) and gets it past the blueprint-release CI smoke render that has
  blocked publication since PR #396 (run 25213444992 failed at default-values
  render of v1.0.0).

  Changes
  -------
  - templates/external-secret.yaml (NEW). Renders the
    `catalyst-newapi-admin-token` ExternalSecret consumed by unified-rbac
    (ADR-0003 §3.2 + §6) for issuing per-user keys against
    `http://newapi.newapi.svc/api/v1/admin/users`. Sourced from OpenBao via
    the `vault-region1` ClusterSecretStore (canonical default shipped by
    bp-external-secrets-stores). Capabilities-gated on
    `external-secrets.io/v1beta1` so cold installs without ESO don't
    fail-render. Operator supplies the per-Sovereign OpenBao path via
    `catalystIntegration.externalSecret.remoteRef.key`; canonical convention
    is `sovereign/<sovereign-fqdn>/newapi/admin-token` with property
    `ADMIN_API_TOKEN`. Per Inviolable Principle #4 every knob is
    operator-overridable in the cluster overlay.
  - values.yaml. Adds
    `catalystIntegration.externalSecret.{enabled, refreshInterval, secretStoreRef.{kind,name}, remoteRef.{key,property}}`
    block (default enabled=true, key="" so a misconfigured overlay fails
    loudly at render rather than silently skipping). Adds
    `defaultChannels.vllm` block: first-otech shorthand that composes a
    vLLM-typed channel into the rendered channels list when enabled. Default
    endpoint is empty per Inviolable Principle #4; the
    `clusters/<sovereign>/bootstrap-kit/80-newapi.yaml` overlay supplies the
    per-Sovereign URL (canonical first-otech reference =
    `https://llm-api.omtd.bankdhofar.com` model `qwen3-coder`, the same
    upstream Axon uses on the OpenOva marketing deployment).
  - templates/_helpers.tpl. New `bp-newapi.effectiveChannels` helper composes
    `.Values.channels` with `defaultChannels.vllm` (when enabled). The
    `assertChannelAttestation` helper now operates on the effective list so
    attestation gates apply to defaultChannels composition too.
    `defaultChannels.vllm.enabled=true` with empty endpoint fails-fast at
    render with a guided error message.
  - templates/configmap.yaml. Channels rendering switches to the
    effectiveChannels helper. OIDC block now skip-renders gracefully when
    `auth.adminUI.keycloak.issuer` is unset (smoke-render path) instead of
    `required`-failing; the per-Sovereign overlay sets the issuer.
  - templates/deployment.yaml. Skip-render gate on Deployment when
    `database.existingSecret`, `credentials.existingSecret`, or (when
    Keycloak mode is selected) the OIDC client secret is missing. Removes the
    four `required` calls that were failing CI smoke render. Service,
    ServiceAccount, ConfigMap, NetworkPolicy still render so the smoke test
    gets a non-empty output proving structural soundness; the actual
    Deployment defers until the per-Sovereign overlay wires the secrets.
  - templates/ingress.yaml. Same skip-render pattern: when either
    `ingress.host` or `ingress.adminHost` is empty, the entire ingress block
    is silently skipped. Matches the bp-keycloak / bp-openbao /
    bp-external-dns HTTPRoute templates.
  - Chart.yaml. version 1.0.0 → 1.1.0 (minor bump: additive features; no
    breaking changes to existing operator overrides).

  Verification
  ------------
  `helm template` smoke render on default values now succeeds with 4
  resources (NetworkPolicy / ServiceAccount / ConfigMap / Service); 168
  lines, well above the CI 5-line minimum. With a full per-Sovereign overlay
  (hosts + secrets + Keycloak issuer + ESO Capabilities + Traefik
  Capabilities + defaultChannels.vllm.endpoint), 8 resources render including
  Deployment, both Ingresses, the Traefik allowlist Middleware, and the
  ExternalSecret. The composed qwen channel writes through to `channels.yaml`
  with the expected endpoint + models + attestation.

  Refs
  ----
  ADR-0003 §3.2 + §6: admin-token contract
  Issue #795 (epic): locked decisions
  Issue #796: hook contract spec (sequential blocker, merged)
  Inviolable Principles #1, #3, #4

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(bootstrap-kit): slot 80 - bp-newapi default install (#799)

  Adds the canonical install slot for bp-newapi to every fresh Sovereign's
  bootstrap-kit. Sequenced after the W2.K1 dependency wave so NewAPI's
  ExternalSecret + Postgres DSN dependencies resolve on first reconcile.

  The HelmRelease declares `dependsOn: [bp-openbao, bp-keycloak, bp-cnpg]`:
  - bp-openbao(08): admin-token ExternalSecret backend
  - bp-keycloak(09): OIDC issuer for ops-staff admin UI at admin.<fqdn>
  - bp-cnpg(16): Postgres backing for users/credits/channels/audit

  Per-Sovereign overlays inherit the slot's defaults and override:
  - ingress.host                                      api.${SOVEREIGN_FQDN}
  - ingress.adminHost                                 admin.${SOVEREIGN_FQDN}
  - auth.adminUI.keycloak.issuer
  - database.existingSecret                           (Crossplane-claimed)
  - credentials.existingSecret
  - catalystIntegration.externalSecret.remoteRef.key  sovereign/${FQDN}/newapi/admin-token
  - defaultChannels.vllm.enabled                      true (first-otech)
  - defaultChannels.vllm.endpoint                     (operator-supplied)

  The `_template/` slot keeps `defaultChannels.vllm.enabled: false` so a
  fresh Sovereign does not silently wire customers to a third-party endpoint;
  the canonical first-otech reference (Qwen3 Coder via
  `https://llm-api.omtd.bankdhofar.com`, same relay Axon uses on the OpenOva
  marketing deployment) is documented in-line for operators adopting the same
  upstream.

  Refs: #795 (epic), ADR-0003

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(bootstrap-deps): register bp-newapi slot 80 in expected DAG (#799)

  Fixes the dependency-graph-audit drift detection caught at PR #812 CI: the
  audit script enumerates HelmReleases in clusters/_template/bootstrap-kit/
  and compares to scripts/expected-bootstrap-deps.yaml; an HR present on disk
  but absent from the expected DAG is treated as drift.

  Adds the canonical entry for bp-newapi at slot 80 with the same depends_on
  set declared on the HelmRelease itself ([bp-openbao, bp-keycloak, bp-cnpg]).

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(bp-newapi): align blueprint.yaml spec.version with Chart.yaml (#799)

  The TestBootstrapKit_BlueprintCardsHaveRequiredFields static-validation
  gate asserts Chart.yaml version == blueprint.yaml spec.version. The chart
  was bumped to 1.1.0 in c63ecd8c; bumping the blueprint metadata to match.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Hatice Yildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
337 lines
12 KiB
YAML
# Expected dependency DAG for clusters/_template/bootstrap-kit/*.yaml
#
# Authoritative spec: docs/BOOTSTRAP-KIT-EXPANSION-PLAN.md §2.
# Consumed by: scripts/check-bootstrap-deps.sh
# Updated by: W2.K0 (slots 01-14 baseline + slots 15-48 forward declarations)
#   W2.K1, K2, K3, K4 PRs add the corresponding HR files; this
#   file already declares the expected deps for those slots so
#   each W2 PR can be mechanically verified at merge time.
#
# Schema:
#   slots:
#     - slot: <int|string>      # numeric prefix on the HR file (01..80), optionally with a letter suffix such as 5a
#       name: <string>          # value of metadata.name on the HelmRelease
#       depends_on: [<string>]  # ordered or unordered; comparison is set-based
#       wave: <"present"|"W2.K1"|"W2.K2"|"W2.K3"|"W2.K4">
#
# Comparison semantics enforced by check-bootstrap-deps.sh:
#   - Each HR file present on disk MUST declare exactly the depends_on set listed
#     here (missing edges -> error, extra edges -> error).
#   - HRs declared here but not yet present on disk are reported as "deferred"
#     (info, not an error) so that this file can be the static authoritative list
#     while W2.K1..K4 land their HR files in series.
#   - The graph is checked for cycles after merging declared+actual edges.
#
# The slot-numbering convention is documented in BOOTSTRAP-KIT-EXPANSION-PLAN.md §3.

slots:
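  # Worked example of the set-based comparison described in the header (a
  # sketch; the HelmRelease fragment below is an assumed shape, not copied
  # from the repository). Flux lists dependencies under spec.dependsOn:
  #
  #   spec:
  #     dependsOn:
  #       - name: bp-cnpg
  #       - name: bp-keycloak
  #       - name: bp-gateway-api
  #
  # Against the bp-gitea entry (depends_on: [bp-keycloak, bp-gateway-api,
  # bp-cnpg]) this passes, because order is ignored. Dropping bp-cnpg from
  # the HelmRelease would be a missing edge, and adding bp-cert-manager
  # would be an extra edge; either one is reported as an error.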
  # ---- Tier 0-4: present today (post-PR-247 baseline) -----------------------
  - slot: 1
    name: bp-cilium
    depends_on: []
    wave: present
  - slot: 1a
    name: bp-gateway-api
    # Upstream Kubernetes Gateway API CRDs (Standard channel, issue #503).
    # Cilium 1.16's `gatewayAPI.enabled=true` enables the controller but does
    # NOT install the gateway.networking.k8s.io CRDs themselves; without them
    # every chart that ships HTTPRoute templates (bp-keycloak / bp-gitea /
    # bp-powerdns / bp-openbao / bp-harbor / bp-grafana / bp-catalyst-platform)
    # fails install with `no matches for kind HTTPRoute`. Same split-CRD
    # pattern as bp-crossplane-claims and bp-external-secrets-stores.
    depends_on: [bp-cilium]
    wave: present
  - slot: 2
    name: bp-cert-manager
    depends_on: [bp-cilium]
    wave: present
  - slot: 3
    name: bp-flux
    depends_on: [bp-cert-manager]
    wave: present
  - slot: 4
    name: bp-crossplane
    depends_on: [bp-flux]
    wave: present
  - slot: 5
    name: bp-sealed-secrets
    depends_on: [bp-cert-manager]
    wave: present
  - slot: "5a"
    name: bp-reflector
    # emberstack/reflector: secret/configmap mirror controller (issue #543).
    # Propagates ghcr-pull secret to every namespace so cross-namespace
    # ImagePullBackOff gaps are eliminated. Slot 5a: after sealed-secrets,
    # before spire. dependsOn bp-cert-manager (CRDs must exist).
    # Used by bp-gitea + bp-harbor to propagate CNPG-generated pg-app Secrets.
    depends_on: [bp-cert-manager]
    wave: present
  - slot: 7
    name: bp-nats-jetstream
    depends_on: []
    wave: present
  - slot: 8
    name: bp-openbao
    # bp-gateway-api dep (issue #503): chart ships an HTTPRoute template;
    # gateway.networking.k8s.io/v1 CRDs must be registered before install.
    # bp-cnpg dep (issue #512): post-install init hook (`bao operator init`)
    # races cnpg readiness on a fresh Sovereign, hitting the 15m install
    # timeout. Explicit dep makes Flux wait for cnpg Ready=True first.
    depends_on: [bp-gateway-api, bp-cnpg]
    wave: present
  - slot: 9
    name: bp-keycloak
    # bp-gateway-api dep (issue #503): chart ships an HTTPRoute template.
    depends_on: [bp-cert-manager, bp-gateway-api]
    wave: present
  - slot: 10
    name: bp-gitea
    # bp-gateway-api dep (issue #503): chart ships an HTTPRoute template.
    # bp-cnpg dep (issue #584): chart ships a CNPG Cluster CR; postgresql.cnpg.io/v1
    # CRD must be registered before bp-gitea applies so Capabilities gate fires.
    depends_on: [bp-keycloak, bp-gateway-api, bp-cnpg]
    wave: present
  - slot: 11
    name: bp-powerdns
    # bp-gateway-api dep (issue #503): chart ships an api-httproute.yaml template.
    # bp-cnpg dep: chart's templates/cnpg-cluster.yaml renders a
    # postgresql.cnpg.io/v1.Cluster gated on Capabilities.APIVersions.
    # Without this dep Helm renders before the CRD is registered, the
    # gate evaluates false, the Cluster CR is silently skipped, CNPG
    # never creates pdns-pg-app, and powerdns Pods fail at boot with
    # "secret pdns-pg-app not found" (caught live during otech28).
    depends_on: [bp-cert-manager, bp-gateway-api, bp-cnpg]
    wave: present
  - slot: 12
    name: bp-external-dns
    # bp-reflector dep (issue #543): external-dns HTTPRoute uses reflector-mirrored
    # ghcr-pull secret; reflector must be Ready before external-dns deploys.
    depends_on: [bp-cert-manager, bp-powerdns, bp-reflector]
    wave: present
  - slot: 13
    name: bp-catalyst-platform
    # bp-gateway-api dep (issue #503): umbrella chart ships catalyst-ui +
    # catalyst-api HTTPRoute templates.
    # bp-keycloak + bp-cnpg deps (issue #512): umbrella post-install Jobs
    # bootstrap OIDC clients + seed PG schemas; both deps take 5+ min to
    # reach Ready on a fresh Sovereign, racing the 15m install timeout.
    # Explicit deps make Flux wait for both Ready=True before umbrella starts.
    depends_on: [bp-gitea, bp-gateway-api, bp-keycloak, bp-cnpg]
    wave: present
  - slot: 14
    name: bp-crossplane-claims
    depends_on: [bp-crossplane]
    wave: present

  # ---- Tier 5: storage + DB (W2.K1, slots 15-19) ----------------------------
  - slot: 15
    name: bp-external-secrets
    depends_on: [bp-openbao, bp-cert-manager]
    wave: W2.K1
  - slot: 15a
    name: bp-external-secrets-stores
    # Default ClusterSecretStore CR(s). Split from bp-external-secrets@1.0.0
    # at PR #334 (issue #331) to resolve CRD-ordering deadlock:
    # ClusterSecretStore CR cannot live in the same Helm release as the ESO
    # subchart that registers its CRD. Mirrors bp-crossplane ↔
    # bp-crossplane-claims pattern.
    depends_on: [bp-external-secrets, bp-openbao]
    wave: W2.K1
  - slot: 16
    name: bp-cnpg
    depends_on: [bp-flux]
    wave: W2.K1
  - slot: 17
    name: bp-valkey
    depends_on: [bp-flux]
    wave: W2.K1
  - slot: 18
    name: bp-seaweedfs
    depends_on: [bp-flux, bp-cert-manager]
    wave: W2.K1
  - slot: 19
    name: bp-harbor
    # bp-seaweedfs dependency REMOVED per ADR-0001 §13 (cloud-direct).
    # Harbor on Sovereigns writes blobs directly to cloud Object Storage
    # (Hetzner / R2 / S3 / Azure / GCS), not via SeaweedFS. See
    # clusters/_template/bootstrap-kit/19-harbor.yaml lines 35-37.
    # bp-gateway-api dep (issue #503): chart ships an HTTPRoute template;
    # gateway.networking.k8s.io/v1 CRDs must be registered first.
    depends_on: [bp-cnpg, bp-cert-manager, bp-gateway-api]
    wave: W2.K1
  - slot: 6a
    name: bp-self-sovereign-cutover
    # Post-handover self-sovereignty cutover (issue #791). Filename
    # carries the 06a- prefix to colocate cohorts visually but the slot
    # depends on bp-gitea + bp-harbor and therefore actually installs
    # AFTER both. Chart ships dormant: catalyst-api stamps Jobs from
    # the chart's PodSpec ConfigMaps only on operator-driven trigger.
    depends_on: [bp-gitea, bp-harbor]
    wave: W2.K1

  # ---- Tier 6: observability (W2.K2, slots 20-26) ---------------------------
  - slot: 20
    name: bp-opentelemetry
    depends_on: [bp-cert-manager]
    wave: W2.K2
  - slot: 21
    name: bp-alloy
    depends_on: [bp-opentelemetry]
    wave: W2.K2
  - slot: 22
    name: bp-loki
    depends_on: [bp-seaweedfs]
    wave: W2.K2
  - slot: 23
    name: bp-mimir
    depends_on: [bp-seaweedfs]
    wave: W2.K2
  - slot: 24
    name: bp-tempo
    depends_on: [bp-seaweedfs]
    wave: W2.K2
  - slot: 25
    name: bp-grafana
    # bp-gateway-api dep (issue #503): chart ships an HTTPRoute template.
    depends_on: [bp-cnpg, bp-loki, bp-mimir, bp-tempo, bp-keycloak, bp-gateway-api]
    wave: W2.K2
  # ---- Tier 7: security + policy (W2.K3, slots 27-34) -----------------------
  - slot: 27
    name: bp-kyverno
    depends_on: [bp-cilium]
    wave: W2.K3
  - slot: 28
    name: bp-reloader
    depends_on: []
    wave: W2.K3
  - slot: 29
    name: bp-vpa
    depends_on: []
    wave: W2.K3
  - slot: 30
    name: bp-trivy
    depends_on: [bp-cert-manager]
    wave: W2.K3
  - slot: 31
    name: bp-falco
    depends_on: [bp-cilium]
    wave: W2.K3
  - slot: 32
    name: bp-sigstore
    depends_on: [bp-cert-manager]
    wave: W2.K3
  - slot: 33
    name: bp-syft-grype
    depends_on: [bp-cert-manager]
    wave: W2.K3
  - slot: 34
    name: bp-velero
    # No dependsOn: Velero on Hetzner Sovereigns writes DIRECTLY to
    # Hetzner Object Storage per ADR-0001 §13 + WBS §3 (S3-aware app
    # rule). The previous SeaweedFS dependency was retired in #384;
    # Velero's BackupStorageLocation now consumes the
    # flux-system/hetzner-object-storage Secret (issue #371) via Flux
    # valuesFrom, populated at HelmRelease apply time, so there is no
    # in-cluster prerequisite Blueprint.
    depends_on: []
    wave: W2.K3

  # ---- Tier 8 + 9: edge + apps + AI runtime (W2.K4, slots 35-48) ------------
  - slot: 35
    name: bp-coraza
    depends_on: [bp-cilium, bp-cert-manager]
    wave: W2.K4
  - slot: 36
    name: bp-stunner
    depends_on: [bp-cilium, bp-cert-manager]
    wave: W2.K4
  - slot: 37
    name: bp-knative
    depends_on: [bp-cert-manager]
    wave: W2.K4
  - slot: 38
    name: bp-kserve
    depends_on: [bp-knative]
    wave: W2.K4
  - slot: 39
    name: bp-vllm
    depends_on: [bp-kserve]
    wave: W2.K4
  - slot: 40
    name: bp-llm-gateway
    depends_on: [bp-cnpg, bp-keycloak]
    wave: W2.K4
  - slot: 41
    name: bp-anthropic-adapter
    depends_on: [bp-llm-gateway]
    wave: W2.K4
  - slot: 42
    name: bp-bge
    depends_on: [bp-cnpg]
    wave: W2.K4
  - slot: 43
    name: bp-nemo-guardrails
    depends_on: [bp-llm-gateway, bp-bge, bp-cnpg]
    wave: W2.K4
  - slot: 44
    name: bp-temporal
    depends_on: [bp-cnpg, bp-cert-manager]
    wave: W2.K4
  - slot: 45
    name: bp-openmeter
    depends_on: [bp-cnpg, bp-nats-jetstream]
    wave: W2.K4
  - slot: 46
    name: bp-livekit
    depends_on: [bp-stunner, bp-cert-manager]
    wave: W2.K4
  - slot: 47
    name: bp-matrix
    depends_on: [bp-cnpg, bp-keycloak, bp-cert-manager]
    wave: W2.K4
  - slot: 48
    name: bp-librechat
    depends_on: [bp-llm-gateway, bp-vllm, bp-bge, bp-keycloak]
    wave: W2.K4

  # ---- Slot 49: DNS-01 wildcard TLS solver against contabo's central PowerDNS
  # Authored under #373; lands at slot 49 because slots 36-48 were already
  # forward-declared by the W2.K4 batch. Re-targeted from per-Sovereign
  # PowerDNS to contabo central PowerDNS (https://pdns.openova.io) because
  # omani.works is delegated from Dynadot to ns1/2/3.openova.io which run
  # on contabo PowerDNS; the Sovereign's own PowerDNS is not on the
  # public DNS chain until pool-domain-manager seals the per-Sovereign
  # NS delegation. Caught live on otech43–46. Slot 49b
  # (bp-cert-manager-dynadot-webhook) was dropped in the same PR
  # (Dynadot is NOT the API-level authority for omani.works subdomains).
  - slot: 49
    name: bp-cert-manager-powerdns-webhook
    depends_on: [bp-cert-manager]
    wave: present

  # ---- Slot 50: Cluster Autoscaler (Hetzner). Issue #767.
  # Adds/removes Hetzner workers in response to FailedScheduling events,
  # bounded by per-Sovereign min/max node-group config the operator picks
  # at launch. Hetzner token wired from flux-system/cloud-credentials,
  # the same Secret Crossplane provider-hcloud reads, so no
  # sibling-blueprint dep at install time. Lands AFTER slot 49 (the existing
  # forward-declared cohort fills slots 36-49) to avoid colliding with
  # the W2.K4 numbering plan.
  - slot: 50
    name: bp-cluster-autoscaler-hcloud
    depends_on: []
    wave: present

  # ---- Slot 80: bp-newapi multi-tenant LLM marketplace gateway. Issue #799.
  # Sequenced past the W2.K4 numbering plan (slots 36-48) so it never
  # collides with the AI-runtime / observability / livekit cohort. The
  # HelmRelease's dependsOn pins install order to AFTER bp-openbao(08),
  # bp-keycloak(09), and bp-cnpg(16) Ready=True regardless of slot order:
  #   - bp-openbao: backs the catalyst-newapi-admin-token ExternalSecret
  #     consumed by unified-rbac (ADR-0003 §3.2 + §6).
  #   - bp-keycloak: OIDC issuer for the ops-staff admin UI.
  #   - bp-cnpg: Postgres for users/credits/channels/audit (claim-driven).
  - slot: 80
    name: bp-newapi
    depends_on: [bp-openbao, bp-keycloak, bp-cnpg]
    wave: present
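  # For orientation, the dependsOn block that the slot 80 entry is compared
  # against might look like the sketch below. Assumptions: the apiVersion,
  # and that the rest of the HelmRelease spec is omitted here.
  #
  #   apiVersion: helm.toolkit.fluxcd.io/v2  # assumed Flux API version
  #   kind: HelmRelease
  #   metadata:
  #     name: bp-newapi
  #   spec:
  #     dependsOn:
  #       - name: bp-openbao
  #       - name: bp-keycloak
  #       - name: bp-cnpg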