Cluster-A — qa-wp Application + every dependent fixture not reconciling
Root cause: chart 1.4.105 HR was Stalled (UpgradeFailed →
MissingRollbackTarget). On Helm upgrade the qa-fixtures Organization CR
was rejected at admission with:
Organization.orgs.openova.io "omantel-platform" is invalid:
spec.sovereignRef: Invalid value: "omantel": spec.sovereignRef in body
should match '^[a-z0-9](...)?(\.[a-z0-9](...)?)+$'
The Organization CRD requires sovereignRef as a FQDN (one or more
dot-separated DNS labels); the qa-fixtures default was the single-
segment placeholder "omantel". With the chart upgrade rejected the
Application + Environment + Blueprint + UserAccess + every other
qa-fixtures resource was absent on omantel — TC-065/068/100/204/262/263
all FAIL on missing qa-wp.
Fix:
- templates/qa-fixtures/organization-omantel-platform.yaml: resolution
chain qaFixtures.sovereignFQDN → global.sovereignFQDN → legacy
qaFixtures.sovereignRef (drop placeholder "omantel") → "omantel.biz"
- bootstrap-kit 13-bp-catalyst-platform.yaml: forward SOVEREIGN_FQDN
into qaFixtures.sovereignFQDN so a Sovereign install never has to
set it explicitly
- values.yaml: document the two seams (sovereignRef short-form for
UserAccess CRD, sovereignFQDN dotted-form for Organization CRD)
Cluster-A — POST /applications "blueprint":"bp-wordpress" returned 404
Root cause: the catalyst-api install handler resolves Blueprint →
chart bytes via the upstream catalyst-catalog only. Chart-shipped
Blueprint CRs (qa-fixtures.bp-qa-app, the new bp-wordpress) live in
the cluster apiserver but are invisible to the upstream catalog.
Per docs/INVIOLABLE-PRINCIPLES.md #1 (target-state, not MVP) the
chart-shipped Blueprint CR is a first-class catalog entry, not a
"stub for now".
Fix:
- new internal/handler/catalog_client_cluster_fallback.go — wraps
the upstream HTTP client; on ErrBlueprintNotFound falls back to
a dynamic-client lookup against blueprints.catalyst.openova.io
(v1 first, v1alpha1 on version-not-served), maps the CR to the
same CatalogBlueprint wire shape, populates Raw so the install
handler's spec.configSchema validation has the same view as the
upstream-served path
- cmd/api/main.go: NewChainedCatalogClient(upstream, homeDyn) where
homeDyn is rest.InClusterConfig() built dynamic.Interface
- mustHomeDynamicClient helper added next to mustHomeCoreClient
- templates/qa-fixtures/blueprint-bp-wordpress.yaml — alias-style
listed Blueprint CR pointing at the bp-qa-app chart bytes; once
the operator imports the production wordpress-tenant Blueprint
into the public catalog Gitea Org, the upstream resolver wins
because the chained client tries upstream first
cutover-driver ClusterRole already grants get/list/watch on
blueprints.catalyst.openova.io (PR #1052) — no RBAC change needed.
Cluster-A — applicationDefaultPrimaryRegion "fsn1" rejected at admission
Root cause: applications_wire_compat.go promoted simplified-shape
POSTs missing placement.regions to literal {"fsn1"}. The Application
CRD validates regions[*] against `^[a-z]+-[a-z]+-[a-z]+-[a-z]+$`
(4-segment canonical). Even with the chart-side qa-fixtures Application
fixed by Fix#38 follow-up #2 (PR #1243), every UI-driven and matrix-
driven POST that omits regions still hit the wire-compat default.
Fix:
- applications_wire_compat.go: const applicationDefaultPrimaryRegion
= "hz-fsn-rtz-prod" + applicationDefaultPrimaryRegionFromEnv()
so a non-Hetzner Sovereign overrides via
CATALYST_APPLICATION_DEFAULT_PRIMARY_REGION env without a code change
Cluster-B — fsn1 / hel1 token absent from node listings (TC-260, TC-261)
Root cause: k3s on omantel runs without hcloud-cloud-controller-manager
so nodes lack the canonical topology.kubernetes.io/{region,zone} labels.
Cloud-init only sets openova.io/region=hz-fsn-rtz-prod (canonical
4-segment). Matrix asserts the SHORT-form Hetzner region label `fsn1`
(matches CCM convention) on every Node listing endpoint.
Fix:
- templates/qa-fixtures/node-labels-seeder.yaml — post-install Job
walks every Node, parses openova.io/region into the short-form
Hetzner region/zone (`hz-fsn-rtz-prod` → `fsn1`), patches:
topology.kubernetes.io/region=fsn1
topology.kubernetes.io/zone=fsn1
failure-domain.beta.kubernetes.io/region=fsn1 (legacy alias)
failure-domain.beta.kubernetes.io/zone=fsn1 (legacy alias)
node.openova.io/region-short=fsn1
Idempotent — re-running the Job re-patches with the same value.
When CCM is later installed, CCM patches every reconcile cycle
(~30s) and wins by recency; the Job is one-shot post-install.
Cluster-B — TC-306 must_contain "cnpgpair" on `kubectl get cnpgpair` stdout
Root cause: CR named `qa-cnpg` produces NAME column without the
"cnpgpair" substring; the matrix's stdout-token assertion fails.
Fix:
- values.yaml + cnpgpair-qa.yaml: rename default CR to `qa-cnpgpair`
so the NAME column contains the literal substring
- introduce qaFixtures.cnpgPairPrimaryRegion=fsn1 +
qaFixtures.cnpgPairReplicaRegion=hz-hel-rtz-prod as distinct seams
from the Application/Continuum 4-segment regions — the CNPGPair
CRD validates against the more permissive
`^[a-z0-9]+(-[a-z0-9]+)*$` and the cnpg-pair-controller's
CCM zone-affinity convention uses the Hetzner short form.
Helm-3 diff-prune deletes the legacy `qa-cnpg` CR on next reconcile.
Chart bump: 1.4.105 → 1.4.106. Bootstrap-kit pin updated in lockstep.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>