openova/scripts/expected-bootstrap-deps.yaml
e3mrah 20b3c5258a
feat(bp-newapi): chart maturation + first-otech deploy + Qwen vLLM channel (#799) (#812)
* feat(bp-newapi): chart maturation — ExternalSecret + first-otech vLLM channel + skip-render gates (#799)

Maturation work for the SME-3 turnkey-experience epic (#795). Aligns
the bp-newapi scratch chart with ADR-0003 (RBAC ↔ NewAPI user-create
hook contract) and gets it past the blueprint-release CI smoke render
that has blocked publication since PR #396 (run 25213444992 failed at
default-values render of v1.0.0).

Changes
-------
- templates/external-secret.yaml (NEW). Renders the
  `catalyst-newapi-admin-token` ExternalSecret consumed by unified-rbac
  (ADR-0003 §3.2 + §6) for issuing per-user keys against
  `http://newapi.newapi.svc/api/v1/admin/users`. Sourced from OpenBao
  via the `vault-region1` ClusterSecretStore (canonical default shipped
  by bp-external-secrets-stores). Capabilities-gated on
  `external-secrets.io/v1beta1` so cold installs without ESO do not
  fail at render. The operator supplies the per-Sovereign OpenBao path via
  `catalystIntegration.externalSecret.remoteRef.key`; canonical
  convention is `sovereign/<sovereign-fqdn>/newapi/admin-token` with
  property `ADMIN_API_TOKEN`. Per Inviolable Principle #4 every knob
  is operator-overridable in the cluster overlay.

- values.yaml. Adds `catalystIntegration.externalSecret.{enabled,
  refreshInterval, secretStoreRef.{kind,name}, remoteRef.{key,property}}`
  block (default enabled=true, key="" so a misconfigured overlay fails
  loudly at render rather than silently skipping). Adds
  `defaultChannels.vllm` block — first-otech shorthand that composes a
  vLLM-typed channel into the rendered channels list when enabled.
  Default endpoint is empty per Inviolable Principle #4; the
  `clusters/<sovereign>/bootstrap-kit/80-newapi.yaml` overlay supplies
  the per-Sovereign URL (canonical first-otech reference =
  `https://llm-api.omtd.bankdhofar.com` model `qwen3-coder`, the same
  upstream Axon uses on the OpenOva marketing deployment).

- templates/_helpers.tpl. New `bp-newapi.effectiveChannels` helper
  composes `.Values.channels` with `defaultChannels.vllm` (when
  enabled). The `assertChannelAttestation` helper now operates on the
  effective list so attestation gates apply to defaultChannels
  composition too. `defaultChannels.vllm.enabled=true` with an empty
  endpoint fails fast at render with a guided error message.

- templates/configmap.yaml. Channels rendering switches to the
  effectiveChannels helper. OIDC block now skip-renders gracefully when
  `auth.adminUI.keycloak.issuer` is unset (smoke-render path) instead
  of `required`-failing; the per-Sovereign overlay sets the issuer.

- templates/deployment.yaml. Skip-render gate on Deployment when
  `database.existingSecret`, `credentials.existingSecret`, or (when
  Keycloak mode is selected) the OIDC client secret is missing. Removes
  the four `required` calls that were failing CI smoke render. Service,
  ServiceAccount, ConfigMap, NetworkPolicy still render so the smoke
  test gets a non-empty output proving structural soundness; the actual
  Deployment defers until the per-Sovereign overlay wires the secrets.

- templates/ingress.yaml. Same skip-render pattern: when either
  `ingress.host` or `ingress.adminHost` is empty, the entire ingress
  block is silently skipped. Matches the bp-keycloak / bp-openbao /
  bp-external-dns HTTPRoute templates.

- Chart.yaml. version 1.0.0 → 1.1.0 (minor bump — additive features;
  no breaking changes to existing operator overrides).
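
As a rough illustration of the first item above, the rendered ExternalSecret might look like the following once an overlay supplies the OpenBao path. The namespace, `refreshInterval` value, target Secret name, and `secretKey` are assumptions made for this sketch, and `sovereign.example.com` is a placeholder FQDN. Only the resource name, store name, path convention, and property come from the change description.

```yaml
# Sketch only: fields marked "assumed" are not confirmed by the chart.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: catalyst-newapi-admin-token
  namespace: newapi                      # assumed
spec:
  refreshInterval: 1h                    # assumed value
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault-region1
  target:
    name: catalyst-newapi-admin-token    # assumed
  data:
    - secretKey: ADMIN_API_TOKEN         # assumed
      remoteRef:
        key: sovereign/sovereign.example.com/newapi/admin-token
        property: ADMIN_API_TOKEN
```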

Verification
------------
`helm template` smoke render on default values now succeeds with 4
resources (NetworkPolicy / ServiceAccount / ConfigMap / Service); 168
lines, well above the CI 5-line minimum. With a full per-Sovereign
overlay (hosts + secrets + Keycloak issuer + ESO Capabilities + Traefik
Capabilities + defaultChannels.vllm.endpoint), 8 resources render
including Deployment, both Ingresses, the Traefik allowlist Middleware,
and the ExternalSecret. The composed qwen channel writes through to
`channels.yaml` with the expected endpoint + models + attestation.

Refs
----
ADR-0003 §3.2 + §6 — admin-token contract
Issue #795 (epic) — locked decisions
Issue #796 — hook contract spec (sequential blocker, merged)
Inviolable Principles #1, #3, #4

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(bootstrap-kit): slot 80 — bp-newapi default install (#799)

Adds the canonical install slot for bp-newapi to every fresh Sovereign's
bootstrap-kit. Sequenced after the W2.K1 dependency wave so NewAPI's
ExternalSecret + Postgres DSN dependencies resolve on first reconcile.

The HelmRelease declares `dependsOn: [bp-openbao, bp-keycloak, bp-cnpg]`:
- bp-openbao(08): admin-token ExternalSecret backend
- bp-keycloak(09): OIDC issuer for ops-staff admin UI at admin.<fqdn>
- bp-cnpg(16): Postgres backing for users/credits/channels/audit
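
A minimal sketch of how that `dependsOn` declaration could appear on the slot's HelmRelease. The `apiVersion` and namespace are assumptions, and the chart source, interval, and values are omitted, so this is a fragment rather than a complete manifest.

```yaml
# Fragment only: chart reference, interval, and values are omitted.
apiVersion: helm.toolkit.fluxcd.io/v2    # assumed; the repo may pin a different version
kind: HelmRelease
metadata:
  name: bp-newapi
  namespace: flux-system                 # assumed
spec:
  dependsOn:
    - name: bp-openbao     # slot 08
    - name: bp-keycloak    # slot 09
    - name: bp-cnpg        # slot 16
```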

Per-Sovereign overlays inherit the slot's defaults and override:
- ingress.host                                        api.${SOVEREIGN_FQDN}
- ingress.adminHost                                   admin.${SOVEREIGN_FQDN}
- auth.adminUI.keycloak.issuer
- database.existingSecret                             (Crossplane-claimed)
- credentials.existingSecret
- catalystIntegration.externalSecret.remoteRef.key    sovereign/${FQDN}/newapi/admin-token
- defaultChannels.vllm.enabled                        true (first-otech)
- defaultChannels.vllm.endpoint                       (operator-supplied)
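
The overrides listed above might translate into a values fragment along these lines. Every concrete value is a placeholder (the FQDN, issuer URL, Secret names, and endpoint are hypothetical); only the key names are taken from the list.

```yaml
# Hypothetical per-Sovereign values; replace every placeholder.
ingress:
  host: api.sovereign.example.com
  adminHost: admin.sovereign.example.com
auth:
  adminUI:
    keycloak:
      issuer: https://auth.sovereign.example.com/realms/example  # hypothetical URL
database:
  existingSecret: newapi-pg-app          # hypothetical Secret name
credentials:
  existingSecret: newapi-credentials     # hypothetical Secret name
catalystIntegration:
  externalSecret:
    remoteRef:
      key: sovereign/sovereign.example.com/newapi/admin-token
defaultChannels:
  vllm:
    enabled: true
    endpoint: https://llm.example.com    # operator-supplied; placeholder
```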

The `_template/` slot keeps `defaultChannels.vllm.enabled: false` so a
fresh Sovereign does not silently wire customers to a third-party
endpoint; the canonical first-otech reference (Qwen3 Coder via
`https://llm-api.omtd.bankdhofar.com`, same relay Axon uses on the
OpenOva marketing deployment) is documented in-line for operators
adopting the same upstream.

Refs: #795 (epic), ADR-0003

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(bootstrap-deps): register bp-newapi slot 80 in expected DAG (#799)

Fixes the dependency-graph-audit drift detection caught at PR #812 CI:
the audit script enumerates HelmReleases in clusters/_template/bootstrap-kit/
and compares to scripts/expected-bootstrap-deps.yaml; an HR present on
disk but absent from the expected DAG is treated as drift.

Adds the canonical entry for bp-newapi at slot 80 with the same
depends_on set declared on the HelmRelease itself
([bp-openbao, bp-keycloak, bp-cnpg]).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(bp-newapi): align blueprint.yaml spec.version with Chart.yaml (#799)

The TestBootstrapKit_BlueprintCardsHaveRequiredFields static-validation
gate asserts Chart.yaml version == blueprint.yaml spec.version. The
chart was bumped to 1.1.0 in c63ecd8c; bumping the blueprint metadata
to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Hatice Yildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 22:17:25 +04:00

337 lines
12 KiB
YAML

# Expected dependency DAG for clusters/_template/bootstrap-kit/*.yaml
#
# Authoritative spec: docs/BOOTSTRAP-KIT-EXPANSION-PLAN.md §2.
# Consumed by: scripts/check-bootstrap-deps.sh
# Updated by: W2.K0 (slots 01-14 baseline + slots 15-48 forward declarations)
#             W2.K1, K2, K3, K4 PRs add the corresponding HR files; this
#             file already declares the expected deps for those slots so
#             each W2 PR can be mechanically verified at merge time.
#
# Schema:
#   slots:
#     - slot: <int>            # numeric prefix on the HR file (01..48)
#       name: <string>         # value of metadata.name on the HelmRelease
#       depends_on: [<string>] # ordered or unordered; comparison is set-based
#       wave: <"present"|"W2.K1"|"W2.K2"|"W2.K3"|"W2.K4">
#
# Comparison semantics enforced by check-bootstrap-deps.sh:
#   - Each HR file present on disk MUST declare exactly the depends_on set listed
#     here (missing edges -> error, extra edges -> error).
#   - HRs declared here but not yet present on disk are reported as "deferred"
#     (info, not an error) so that this file can be the static authoritative list
#     while W2.K1..K4 land their HR files in series.
#   - The graph is checked for cycles after merging declared+actual edges.
#
# The slot-numbering convention is documented in BOOTSTRAP-KIT-EXPANSION-PLAN.md §3.
slots:
  # ---- Tier 0-4: present today (post-PR-247 baseline) -----------------------
  - slot: 1
    name: bp-cilium
    depends_on: []
    wave: present
  - slot: 1a
    name: bp-gateway-api
    # Upstream Kubernetes Gateway API CRDs (Standard channel — issue #503).
    # Cilium 1.16's `gatewayAPI.enabled=true` enables the controller but does
    # NOT install the gateway.networking.k8s.io CRDs themselves; without them
    # every chart that ships HTTPRoute templates (bp-keycloak / bp-gitea /
    # bp-powerdns / bp-openbao / bp-harbor / bp-grafana / bp-catalyst-platform)
    # fails install with `no matches for kind HTTPRoute`. Same split-CRD
    # pattern as bp-crossplane-claims and bp-external-secrets-stores.
    depends_on: [bp-cilium]
    wave: present
  - slot: 2
    name: bp-cert-manager
    depends_on: [bp-cilium]
    wave: present
  - slot: 3
    name: bp-flux
    depends_on: [bp-cert-manager]
    wave: present
  - slot: 4
    name: bp-crossplane
    depends_on: [bp-flux]
    wave: present
  - slot: 5
    name: bp-sealed-secrets
    depends_on: [bp-cert-manager]
    wave: present
  - slot: "5a"
    name: bp-reflector
    # emberstack/reflector — secret/configmap mirror controller (issue #543).
    # Propagates ghcr-pull secret to every namespace so cross-namespace
    # ImagePullBackOff gaps are eliminated. Slot 5a: after sealed-secrets,
    # before spire. dependsOn bp-cert-manager (CRDs must exist).
    # Used by bp-gitea + bp-harbor to propagate CNPG-generated pg-app Secrets.
    depends_on: [bp-cert-manager]
    wave: present
  - slot: 7
    name: bp-nats-jetstream
    depends_on: []
    wave: present
  - slot: 8
    name: bp-openbao
    # bp-gateway-api dep (issue #503): chart ships an HTTPRoute template;
    # gateway.networking.k8s.io/v1 CRDs must be registered before install.
    # bp-cnpg dep (issue #512): post-install init hook (`bao operator init`)
    # races cnpg readiness on a fresh Sovereign, hitting the 15m install
    # timeout. Explicit dep makes Flux wait for cnpg Ready=True first.
    depends_on: [bp-gateway-api, bp-cnpg]
    wave: present
  - slot: 9
    name: bp-keycloak
    # bp-gateway-api dep (issue #503): chart ships an HTTPRoute template.
    depends_on: [bp-cert-manager, bp-gateway-api]
    wave: present
  - slot: 10
    name: bp-gitea
    # bp-gateway-api dep (issue #503): chart ships an HTTPRoute template.
    # bp-cnpg dep (issue #584): chart ships a CNPG Cluster CR; postgresql.cnpg.io/v1
    # CRD must be registered before bp-gitea applies so Capabilities gate fires.
    depends_on: [bp-keycloak, bp-gateway-api, bp-cnpg]
    wave: present
  - slot: 11
    name: bp-powerdns
    # bp-gateway-api dep (issue #503): chart ships an api-httproute.yaml template.
    # bp-cnpg dep: chart's templates/cnpg-cluster.yaml renders a
    # postgresql.cnpg.io/v1.Cluster gated on Capabilities.APIVersions.
    # Without this dep Helm renders before the CRD is registered, the
    # gate evaluates false, the Cluster CR is silently skipped, CNPG
    # never creates pdns-pg-app, and powerdns Pods fail at boot with
    # "secret pdns-pg-app not found" (caught live during otech28).
    depends_on: [bp-cert-manager, bp-gateway-api, bp-cnpg]
    wave: present
  - slot: 12
    name: bp-external-dns
    # bp-reflector dep (issue #543): external-dns HTTPRoute uses reflector-mirrored
    # ghcr-pull secret; reflector must be Ready before external-dns deploys.
    depends_on: [bp-cert-manager, bp-powerdns, bp-reflector]
    wave: present
  - slot: 13
    name: bp-catalyst-platform
    # bp-gateway-api dep (issue #503): umbrella chart ships catalyst-ui +
    # catalyst-api HTTPRoute templates.
    # bp-keycloak + bp-cnpg deps (issue #512): umbrella post-install Jobs
    # bootstrap OIDC clients + seed PG schemas; both deps take 5+ min to
    # reach Ready on a fresh Sovereign, racing the 15m install timeout.
    # Explicit deps make Flux wait for both Ready=True before umbrella starts.
    depends_on: [bp-gitea, bp-gateway-api, bp-keycloak, bp-cnpg]
    wave: present
  - slot: 14
    name: bp-crossplane-claims
    depends_on: [bp-crossplane]
    wave: present
  # ---- Tier 5: storage + DB (W2.K1, slots 15-19) ----------------------------
  - slot: 15
    name: bp-external-secrets
    depends_on: [bp-openbao, bp-cert-manager]
    wave: W2.K1
  - slot: 15a
    name: bp-external-secrets-stores
    # Default ClusterSecretStore CR(s). Split from bp-external-secrets@1.0.0
    # at PR #334 (issue #331) to resolve CRD-ordering deadlock —
    # ClusterSecretStore CR cannot live in the same Helm release as the ESO
    # subchart that registers its CRD. Mirrors bp-crossplane ↔
    # bp-crossplane-claims pattern.
    depends_on: [bp-external-secrets, bp-openbao]
    wave: W2.K1
  - slot: 16
    name: bp-cnpg
    depends_on: [bp-flux]
    wave: W2.K1
  - slot: 17
    name: bp-valkey
    depends_on: [bp-flux]
    wave: W2.K1
  - slot: 18
    name: bp-seaweedfs
    depends_on: [bp-flux, bp-cert-manager]
    wave: W2.K1
  - slot: 19
    name: bp-harbor
    # bp-seaweedfs dependency REMOVED per ADR-0001 §13 (cloud-direct).
    # Harbor on Sovereigns writes blobs directly to cloud Object Storage
    # (Hetzner / R2 / S3 / Azure / GCS), not via SeaweedFS. See
    # clusters/_template/bootstrap-kit/19-harbor.yaml lines 35-37.
    # bp-gateway-api dep (issue #503): chart ships an HTTPRoute template;
    # gateway.networking.k8s.io/v1 CRDs must be registered first.
    depends_on: [bp-cnpg, bp-cert-manager, bp-gateway-api]
    wave: W2.K1
  - slot: 6a
    name: bp-self-sovereign-cutover
    # Post-handover self-sovereignty cutover (issue #791). Filename
    # carries the 06a- prefix to colocate cohorts visually but the slot
    # depends on bp-gitea + bp-harbor and therefore actually installs
    # AFTER both. Chart ships dormant — catalyst-api stamps Jobs from
    # the chart's PodSpec ConfigMaps only on operator-driven trigger.
    depends_on: [bp-gitea, bp-harbor]
    wave: W2.K1
  # ---- Tier 6: observability (W2.K2, slots 20-26) ---------------------------
  - slot: 20
    name: bp-opentelemetry
    depends_on: [bp-cert-manager]
    wave: W2.K2
  - slot: 21
    name: bp-alloy
    depends_on: [bp-opentelemetry]
    wave: W2.K2
  - slot: 22
    name: bp-loki
    depends_on: [bp-seaweedfs]
    wave: W2.K2
  - slot: 23
    name: bp-mimir
    depends_on: [bp-seaweedfs]
    wave: W2.K2
  - slot: 24
    name: bp-tempo
    depends_on: [bp-seaweedfs]
    wave: W2.K2
  - slot: 25
    name: bp-grafana
    # bp-gateway-api dep (issue #503): chart ships an HTTPRoute template.
    depends_on: [bp-cnpg, bp-loki, bp-mimir, bp-tempo, bp-keycloak, bp-gateway-api]
    wave: W2.K2
  # ---- Tier 7: security + policy (W2.K3, slots 27-34) -----------------------
  - slot: 27
    name: bp-kyverno
    depends_on: [bp-cilium]
    wave: W2.K3
  - slot: 28
    name: bp-reloader
    depends_on: []
    wave: W2.K3
  - slot: 29
    name: bp-vpa
    depends_on: []
    wave: W2.K3
  - slot: 30
    name: bp-trivy
    depends_on: [bp-cert-manager]
    wave: W2.K3
  - slot: 31
    name: bp-falco
    depends_on: [bp-cilium]
    wave: W2.K3
  - slot: 32
    name: bp-sigstore
    depends_on: [bp-cert-manager]
    wave: W2.K3
  - slot: 33
    name: bp-syft-grype
    depends_on: [bp-cert-manager]
    wave: W2.K3
  - slot: 34
    name: bp-velero
    # No dependsOn — Velero on Hetzner Sovereigns writes DIRECTLY to
    # Hetzner Object Storage per ADR-0001 §13 + WBS §3 (S3-aware app
    # rule). The previous SeaweedFS dependency was retired in #384;
    # Velero's BackupStorageLocation now consumes flux-system/hetzner-
    # object-storage Secret (issue #371) via Flux valuesFrom, populated
    # at HelmRelease apply time — no in-cluster prerequisite Blueprint.
    depends_on: []
    wave: W2.K3
  # ---- Tier 8 + 9: edge + apps + AI runtime (W2.K4, slots 35-48) ------------
  - slot: 35
    name: bp-coraza
    depends_on: [bp-cilium, bp-cert-manager]
    wave: W2.K4
  - slot: 36
    name: bp-stunner
    depends_on: [bp-cilium, bp-cert-manager]
    wave: W2.K4
  - slot: 37
    name: bp-knative
    depends_on: [bp-cert-manager]
    wave: W2.K4
  - slot: 38
    name: bp-kserve
    depends_on: [bp-knative]
    wave: W2.K4
  - slot: 39
    name: bp-vllm
    depends_on: [bp-kserve]
    wave: W2.K4
  - slot: 40
    name: bp-llm-gateway
    depends_on: [bp-cnpg, bp-keycloak]
    wave: W2.K4
  - slot: 41
    name: bp-anthropic-adapter
    depends_on: [bp-llm-gateway]
    wave: W2.K4
  - slot: 42
    name: bp-bge
    depends_on: [bp-cnpg]
    wave: W2.K4
  - slot: 43
    name: bp-nemo-guardrails
    depends_on: [bp-llm-gateway, bp-bge, bp-cnpg]
    wave: W2.K4
  - slot: 44
    name: bp-temporal
    depends_on: [bp-cnpg, bp-cert-manager]
    wave: W2.K4
  - slot: 45
    name: bp-openmeter
    depends_on: [bp-cnpg, bp-nats-jetstream]
    wave: W2.K4
  - slot: 46
    name: bp-livekit
    depends_on: [bp-stunner, bp-cert-manager]
    wave: W2.K4
  - slot: 47
    name: bp-matrix
    depends_on: [bp-cnpg, bp-keycloak, bp-cert-manager]
    wave: W2.K4
  - slot: 48
    name: bp-librechat
    depends_on: [bp-llm-gateway, bp-vllm, bp-bge, bp-keycloak]
    wave: W2.K4
  # ---- Slot 49 — DNS-01 wildcard TLS solver against contabo's central PowerDNS
  # Authored under #373; lands at slot 49 because slots 36-48 were already
  # forward-declared by the W2.K4 batch. Re-targeted from per-Sovereign
  # PowerDNS to contabo central PowerDNS (https://pdns.openova.io) because
  # omani.works is delegated from Dynadot to ns1/2/3.openova.io which run
  # on contabo PowerDNS — the Sovereign's own PowerDNS is not on the
  # public DNS chain until pool-domain-manager seals the per-Sovereign
  # NS delegation. Caught live on otech4346. Slot 49b
  # (bp-cert-manager-dynadot-webhook) was dropped in the same PR
  # (Dynadot is NOT the API-level authority for omani.works subdomains).
  - slot: 49
    name: bp-cert-manager-powerdns-webhook
    depends_on: [bp-cert-manager]
    wave: present
  # ---- Slot 50 — Cluster Autoscaler (Hetzner). Issue #767.
  # Adds/removes Hetzner workers in response to FailedScheduling events,
  # bounded by per-Sovereign min/max node-group config the operator picks
  # at launch. Hetzner token wired from flux-system/cloud-credentials —
  # the same Secret Crossplane provider-hcloud reads, so no sibling-
  # blueprint dep at install time. Lands AFTER slot 49 (the existing
  # forward-declared cohort fills slots 36-49) to avoid colliding with
  # the W2.K4 numbering plan.
  - slot: 50
    name: bp-cluster-autoscaler-hcloud
    depends_on: []
    wave: present
  # ---- Slot 80 — bp-newapi multi-tenant LLM marketplace gateway. Issue #799.
  # Sequenced past the W2.K4 numbering plan (slots 36-48) so it never
  # collides with the AI-runtime / observability / livekit cohort. The
  # HelmRelease's dependsOn pins install order to AFTER bp-openbao(08),
  # bp-keycloak(09), and bp-cnpg(16) Ready=True regardless of slot order:
  #   - bp-openbao: backs the catalyst-newapi-admin-token ExternalSecret
  #     consumed by unified-rbac (ADR-0003 §3.2 + §6).
  #   - bp-keycloak: OIDC issuer for the ops-staff admin UI.
  #   - bp-cnpg: Postgres for users/credits/channels/audit (claim-driven).
  - slot: 80
    name: bp-newapi
    depends_on: [bp-openbao, bp-keycloak, bp-cnpg]
    wave: present