fix(catalyst-platform): hoist parent_domains_listeners YAML out of cloud-init (Closes #2118) (#2119)

The Cilium Gateway listener block (cilium-gateway.yaml `spec.listeners:
${PARENT_DOMAINS_LISTENERS_YAML}`) was materialised in
infra/hetzner/main.tf (locals.parent_domains_listeners_yaml) and inlined
as a postBuild.substitute value on the sovereign-tls Kustomization in
cloud-init. That value scaled O(N) with parent-zone count and pushed
4-zone SME-pool Sovereigns over Hetzner's 32,256-byte user_data
guardrail. t39 audit (agent-a2c1647c, 2026-05-20): omantel.biz +
.omani.{homes,rest,trade} cloud-init rendered to 33,656 bytes (+1,400
overshoot) and the create call failed at the tofu precondition.

Fix: render the listener YAML inside the bp-catalyst-platform chart at
templates/sovereign-tls-vars-cm.yaml from .Values.parentZones (same
input the chart's per-zone Certificate render already consumes). The
template emits a ConfigMap `flux-system/sovereign-tls-vars` whose key
PARENT_DOMAINS_LISTENERS_YAML carries the JSON-flow listener array.
Cloud-init's sovereign-tls Kustomization reads it via Flux
`postBuild.substituteFrom: [{kind: ConfigMap, name: sovereign-tls-vars}]`.
Ordering is preserved — sovereign-tls `dependsOn: bootstrap-kit Ready`
and bp-catalyst-platform is inside bootstrap-kit, so the ConfigMap
exists in etcd by the time Flux reconciles sovereign-tls.
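
The substitution step can be pictured with a small Go sketch. It is not Flux's implementation: `os.Expand` from the standard library stands in for Flux's variable substitution, and the `substitute` helper and the shortened listener value are made up for illustration.

```go
package main

import (
	"fmt"
	"os"
)

// substitute expands ${NAME} placeholders in a manifest from vars.
// os.Expand is only a stand-in for Flux's own substitution logic.
func substitute(manifest string, vars map[string]string) string {
	return os.Expand(manifest, func(name string) string {
		return vars[name]
	})
}

func main() {
	// Hypothetical ConfigMap data, as substituteFrom would load it.
	vars := map[string]string{
		"PARENT_DOMAINS_LISTENERS_YAML": `[{"name":"https","port":30443}]`,
	}
	manifest := "spec:\n  listeners: ${PARENT_DOMAINS_LISTENERS_YAML}\n"
	fmt.Print(substitute(manifest, vars))
}
```

The Kustomization only names the ConfigMap and never carries the listener value itself, which is why the cloud-init payload stops growing with the zone count.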

Synthetic render evidence (standalone Tofu harness, t39's 4-zone +
realistic 5,468-byte worker_cloud_init_b64):
  - BEFORE (origin/main 6c1444b4c): cloud-init stripped 30,748 bytes
  - AFTER  (this commit):           cloud-init stripped 28,619 bytes
  - savings:                        2,129 bytes (>1,400 overshoot fix)
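
The byte arithmetic behind these figures, as a minimal Go sketch. The 32,256-byte guardrail and the render sizes are the numbers quoted in this message; the `overshoot` and `savings` helpers are hypothetical names, not the actual Tofu precondition.

```go
package main

import "fmt"

// userDataGuardrail is the user_data limit quoted in the commit message.
const userDataGuardrail = 32256

// overshoot returns how many bytes size exceeds limit by, or 0 if it fits.
func overshoot(size, limit int) int {
	if size > limit {
		return size - limit
	}
	return 0
}

// savings returns the bytes removed between two renders.
func savings(before, after int) int { return before - after }

func main() {
	// t39 audit render: 33,656 bytes against the guardrail.
	fmt.Println(overshoot(33656, userDataGuardrail))
	// Synthetic harness renders before and after this commit.
	fmt.Println(savings(30748, 28619))
}
```

The savings exceed the audited overshoot, which is the margin the commit relies on.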

Helm-template covers every historical path preserved by the removed
Tofu locals:
  - Single-zone fallback (parentZones empty → list with sovereign FQDN)
    emits bare `https`/`http` listener names — every catalyst-system
    HTTPRoute hardcodes `sectionName: https` on single-zone Sovereigns.
  - Multi-zone (SME pool) emits unique `https-<sanitised>` /
    `http-<sanitised>` names per parent zone (otherwise the Gateway
    controller raises a Conflicting status condition on duplicate
    listener names).
  - TBD-A32 #1886 per-prov 2-label wildcard listener pair
    (`*.<sovereignFQDN>` with per-prov cert) appended when
    sovereignFQDN ∉ parentZones; skipped on the legacy single-zone-
    on-apex case to avoid a duplicate-name Conflict.
  - Catalyst-Zero (contabo, empty global.sovereignFQDN) skips the
    template via top-level guard — Kustomize build untouched.
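
The four paths above can be sketched in Go. This mirrors the naming rules as described in this message, not the chart source itself: `Listener`, `dashed`, `pair` and `buildListeners` are illustrative names, and the struct omits the `tls` and `allowedRoutes` fields the real listeners carry.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Listener keeps only the fields needed to show the naming rules.
type Listener struct {
	Name     string `json:"name"`
	Port     int    `json:"port"`
	Protocol string `json:"protocol"`
	Hostname string `json:"hostname"`
}

// dashed turns "omani.homes" into "omani-homes".
func dashed(s string) string { return strings.ReplaceAll(s, ".", "-") }

// pair returns the HTTPS (30443) and HTTP (30080) listeners for one host.
func pair(suffix, host string) []Listener {
	return []Listener{
		{Name: "https" + suffix, Port: 30443, Protocol: "HTTPS", Hostname: "*." + host},
		{Name: "http" + suffix, Port: 30080, Protocol: "HTTP", Hostname: "*." + host},
	}
}

// buildListeners applies the guard, the single-zone fallback, the
// multi-zone naming and the per-prov pair with its collision check.
func buildListeners(fqdn string, zones []string) []Listener {
	if fqdn == "" {
		return nil // Catalyst-Zero: nothing is rendered
	}
	if len(zones) == 0 {
		zones = []string{fqdn} // single-zone fallback
	}
	single := len(zones) == 1
	fqdnInZones := false
	var out []Listener
	for _, z := range zones {
		if z == fqdn {
			fqdnInZones = true
		}
		suffix := "-" + dashed(z)
		if single {
			suffix = "" // bare https / http names
		}
		out = append(out, pair(suffix, z)...)
	}
	if !fqdnInZones {
		out = append(out, pair("-"+dashed(fqdn), fqdn)...)
	}
	return out
}

func main() {
	zones := []string{"omantel.biz", "omani.homes", "omani.rest", "omani.trade"}
	b, err := json.Marshal(buildListeners("t39.omantel.biz", zones))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```

For the 4-zone example this yields four parent-zone pairs plus the per-prov pair; with an empty zone list it yields only the bare `https` / `http` pair.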

Cross-region: every region runs the same chart, each peer renders its
own ConfigMap into its own flux-system, so each region's sovereign-tls
Kustomization reads locally.

Lockstep bumps in this commit:
  - products/catalyst/chart/Chart.yaml      1.4.231 → 1.4.232
  - clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml
    HelmRelease spec.chart.spec.version    1.4.231 → 1.4.232

Removed (kept the rationale comments as migration breadcrumbs):
  - infra/hetzner/main.tf locals.parent_domains_listeners_yaml
  - infra/hetzner/main.tf locals.per_prov_listeners
  - infra/hetzner/main.tf locals.parent_domains_includes_sovereign_fqdn
  - templatefile() var `parent_domains_listeners_yaml` on both the
    primary CP and each per-secondary-region CP invocation
  - PARENT_DOMAINS_LISTENERS_YAML substitute key on the sovereign-tls
    Flux Kustomization (cloud-init) — replaced by substituteFrom

Doc-only updates: parent_domains.go log message + cilium-gateway.yaml
header + sandbox manifests.go + values.yaml header point at the new
ConfigMap-vars path for Day-2 add-domain workflows.
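
A Day-2 patch against the new target could be prepared along these lines. This is a sketch under assumptions: `listenerPatch` is a hypothetical helper, it only builds a JSON merge patch body with the standard library, and nothing here talks to a cluster or reflects an existing step in parent_domains.go.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// listenerPatch builds a JSON merge patch that sets
// data.PARENT_DOMAINS_LISTENERS_YAML on ConfigMap/sovereign-tls-vars.
// The listener value must itself be valid JSON (a JSON-flow array).
func listenerPatch(listenersJSON string) ([]byte, error) {
	if !json.Valid([]byte(listenersJSON)) {
		return nil, fmt.Errorf("listener value is not valid JSON")
	}
	patch := map[string]map[string]string{
		"data": {"PARENT_DOMAINS_LISTENERS_YAML": listenersJSON},
	}
	return json.Marshal(patch)
}

func main() {
	body, err := listenerPatch(`[{"name":"https","port":30443}]`)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

The resulting body is the kind of payload `kubectl patch configmap sovereign-tls-vars -n flux-system --type merge -p '<body>'` accepts.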

`tofu validate` + `helm lint` + `helm template` (single-zone fallback,
multi-zone with per-prov pair, collision case `fqdn ∈ parentZones`, and
Catalyst-Zero empty-fqdn) all clean.

Refs #2118

Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
e3mrah 2026-05-20 23:50:59 +04:00 committed by GitHub
parent 6c1444b4c1
commit 4be414551d
9 changed files with 300 additions and 181 deletions

@ -840,7 +840,13 @@ spec:
# resource-detail page's k8s SSE subscription to include the
# `event` kind so the EventsPanel surfaces live K8s Events
# instead of perpetually rendering empty-state.
version: 1.4.231
# 1.4.232 — #2118 (TBD-V48): render Cilium Gateway listener YAML
# in the chart (templates/sovereign-tls-vars-cm.yaml) so cloud-init
# no longer carries the O(N)-per-parent-zone listener block.
# Removes ~2.1 KiB from cloud-init on 4-zone SME-pool Sovereigns
# (omantel.biz + .omani.{homes,rest,trade}); brings t39-class
# provisions back under Hetzner's 32 KiB user_data cap.
version: 1.4.232
sourceRef:
kind: HelmRepository
name: bp-catalyst-platform

@ -23,15 +23,24 @@
# NET::ERR_CERT_COMMON_NAME_INVALID, marketplace WordPress tenants on
# omani.homes are unreachable.
#
# Fix: render one listener pair per parent zone. The listener block is
# materialised at Terraform plan time (infra/hetzner/main.tf
# locals.parent_domains_listeners_yaml — jsonencode of the listener
# objects), threaded through Flux postBuild.substitute as
# ${PARENT_DOMAINS_LISTENERS_YAML}, and consumed BELOW as a YAML inline-
# flow array value on `spec.listeners`. Each pair's certificateRefs
# target the per-zone Secret rendered by products/catalyst/chart/
# templates/sovereign-wildcard-certs.yaml (PR #827) so the Gateway
# listener and the cert resource are always in lockstep.
# Fix: render one listener pair per parent zone. As of #2118 (TBD-V48,
# 2026-05-20) the listener block is rendered inside bp-catalyst-platform's
# templates/sovereign-tls-vars-cm.yaml from .Values.parentZones (the
# chart's single source of truth on parent-zone shape). The chart emits
# a ConfigMap flux-system/sovereign-tls-vars whose key
# PARENT_DOMAINS_LISTENERS_YAML carries the JSON-flow listener array.
# Cloud-init's sovereign-tls Kustomization reads it via
# `postBuild.substituteFrom: [{kind: ConfigMap, name: sovereign-tls-vars}]`
# and Flux inlines the value at `${PARENT_DOMAINS_LISTENERS_YAML}` below.
# Each pair's certificateRefs target the per-prov Secret rendered by
# clusters/_template/sovereign-tls/cilium-gateway-cert.yaml (TBD-A29
# #1883) — the listener and the cert resource stay in lockstep.
#
# Why moved out of cloud-init: the inline value scaled O(N) with parent-
# zone count and pushed 4-zone SME-pool Sovereigns over Hetzner's 32 KiB
# user_data cap (t39 audit, 2026-05-20: 33,656 bytes overshot the post-
# #1985 guardrail of 32,256). Render-in-chart drops ~2.1 KiB out of
# cloud-init with safety margin.
#
# Why a scalar placeholder, not a multi-line block:
# - kustomize-build PARSES the YAML before Flux runs envsubst. A
@ -100,10 +109,13 @@
# sectionName (PR #1888 closing #1884) — Cilium attaches by hostname
# match.
#
# The listener block is rendered by infra/hetzner/main.tf locals.
# parent_domains_listeners_yaml using local.parent_domains_single_zone
# to switch between the two naming schemes (and appending per-prov
# listeners via local.per_prov_listeners).
# The listener block is rendered by bp-catalyst-platform's
# templates/sovereign-tls-vars-cm.yaml (Closes #2118 / TBD-V48), which
# encodes the single-zone vs multi-zone naming switch AND the per-prov
# pair-emission logic directly in Helm template (range over parentZones
# + ternary `single ? "https" : "https-<sanitised>"` + a $fqdnInZones
# collision check that skips the per-prov pair when sovereignFQDN
# already equals a declared parent-zone name).
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway

@ -765,8 +765,9 @@ spec:
# already use. sectionName is intentionally omitted so the HTTPRoute
# attaches to every listener whose hostname matches "sandbox.<sov-fqdn>",
# currently the wildcard *.${SOVEREIGN_FQDN} HTTPS listener
# (https-<sov-fqdn-dashed>) per infra/hetzner/main.tf
# locals.parent_domains_listeners_yaml fallback path.
# (https-<sov-fqdn-dashed>) emitted by bp-catalyst-platform's
# templates/sovereign-tls-vars-cm.yaml per-prov listener pair (Closes
# #2118 / TBD-V48; formerly infra/hetzner/main.tf locals.per_prov_listeners).
parentRefs:
- name: cilium-gateway
namespace: kube-system

@ -1314,22 +1314,19 @@ write_files:
# bp-catalyst-platform into clusters/_template/sovereign-tls/
# has access to the parent-zone list without a config copy.
PARENT_DOMAINS_YAML: '${parent_domains_yaml}'
# PARENT_DOMAINS_LISTENERS_YAML (issue #831 follow-on to #827).
# JSON-flow array literal listing one Gateway listener pair
# (HTTPS:30443 + HTTP:30080) per parent zone. Consumed as a
# scalar value at `listeners: $${PARENT_DOMAINS_LISTENERS_YAML}`
# in clusters/_template/sovereign-tls/cilium-gateway.yaml.
# kustomize-build accepts the unsubstituted scalar; Flux's
# postBuild.substitute then swaps it for the materialised
# array, which YAML parses as the actual listener list.
# The double jsonencode is intentional — the inner one
# (locals.parent_domains_listeners_yaml) renders the array;
# the outer one wraps it as a JSON-encoded string so the
# value-in-YAML embedding works regardless of the array's
# internal characters. See infra/hetzner/main.tf
# locals.parent_domains_listeners_yaml for rationale +
# listener-naming convention.
PARENT_DOMAINS_LISTENERS_YAML: ${jsonencode(parent_domains_listeners_yaml)}
# PARENT_DOMAINS_LISTENERS_YAML — historically materialised here
# by infra/hetzner/main.tf locals.parent_domains_listeners_yaml
# and inlined as a substitute value, but that scaled O(N) with
# parent-zone count and overflowed Hetzner's 32 KiB user_data
# cap on 4-zone SME-pool Sovereigns (Closes #2118 — t39 audit,
# 2026-05-20). Now rendered inside bp-catalyst-platform's
# templates/sovereign-tls-vars-cm.yaml from .Values.parentZones
# (single source of truth — same input the chart's per-zone
# Certificate render already consumes). Picked up below via
# `substituteFrom: ConfigMap/sovereign-tls-vars`. Ordering is
# safe: this Kustomization `dependsOn: bootstrap-kit Ready`, and
# bootstrap-kit is Ready only when bp-catalyst-platform's HR
# (which renders the ConfigMap) is Ready.
# WILDCARD_CERT_ISSUER (Fix #176 — qa-loop iter-1 LE
# rate-limit unblock). cilium-gateway-cert.yaml references
# this via $${WILDCARD_CERT_ISSUER}. When
@ -1359,6 +1356,24 @@ write_files:
SOVEREIGN_FQDN_SLUG: "${sovereign_fqdn_slug}"
SOVEREIGN_REGION_KEY: ${sovereign_region_key}
HCLOUD_LB_LOCATION: "${region}"
# substituteFrom: ConfigMap/sovereign-tls-vars (Closes #2118).
# The bp-catalyst-platform chart's templates/sovereign-tls-vars-cm.yaml
# renders this ConfigMap from .Values.parentZones into flux-system.
# Keys it carries:
# - PARENT_DOMAINS_LISTENERS_YAML: JSON-flow listener array
# consumed by clusters/_template/sovereign-tls/cilium-gateway.yaml
# at `spec.listeners: $${PARENT_DOMAINS_LISTENERS_YAML}`.
# Moved out of the inline `substitute` map above to keep cloud-init
# under Hetzner's 32 KiB user_data cap on multi-zone SME-pool
# Sovereigns (the listener block scales O(N) with parent-zone
# count; 4 zones → ~2.2 KiB → cloud-init at 33,656 bytes before this fix).
# optional: false is correct — bp-catalyst-platform is INSIDE
# bootstrap-kit, and this Kustomization dependsOn bootstrap-kit
# Ready, so the ConfigMap is guaranteed to exist before reconcile.
substituteFrom:
- kind: ConfigMap
name: sovereign-tls-vars
optional: false
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization

@ -420,132 +420,32 @@ locals {
# cert wiring under #831, not parent-zone wildcards.
# TBD-A32 (#1886) per-prov 2-label wildcard listener
# Closes #2118 (2026-05-20): the listener YAML was historically
# materialised here (locals.parent_domains_listeners_yaml,
# locals.per_prov_listeners, locals.parent_domains_includes_sovereign_fqdn)
# and threaded into cloud-init as an inline postBuild.substitute value.
# That scaled O(N) with parent-zone count and pushed cloud-init over
# Hetzner's 32 KiB user_data cap on 4-zone SME-pool Sovereigns (t39
# audit, 2026-05-20: 33,656 bytes for omantel.biz + 3-zone .omani.X).
#
# The parent-zone listeners above declare `hostname: *.<zone>` (e.g.
# `*.omani.works`). Per Gateway-API spec wildcard semantics, that pattern
# matches EXACTLY ONE label depth (`foo.omani.works`) but NOT
# `console.t28.omani.works` (2-label depth from the apex). On a
# multi-prov shared parent zone like `omani.works`, every per-prov
# operator endpoint (console.<fqdn>, api.<fqdn>, marketplace.<fqdn>, ...)
# is 2-label-deep, so the parent-zone wildcard listener never catches
# them and cilium-envoy returns TLS handshake reset / NoMatchingListener.
# The listener block is now rendered inside bp-catalyst-platform's
# templates/sovereign-tls-vars-cm.yaml from .Values.parentZones +
# .Values.global.sovereignFQDN, the same Helm template that already owns
# the per-zone Certificate render shape. The chart writes a ConfigMap
# `flux-system/sovereign-tls-vars` whose `PARENT_DOMAINS_LISTENERS_YAML`
# key is read by the sovereign-tls Kustomization via
# `postBuild.substituteFrom` (cloud-init writes that Kustomization).
#
# Note: TBD-A29 (#1883) already pointed the parent-zone listener's
# certificateRefs at the per-prov cert `sovereign-wildcard-tls-
# <fqdn-dashed>`. That fixed the LE-budget burn but NOT the hostname-
# match gap: listener selection happens BEFORE SNI cert dispatch, so
# cilium-envoy never reaches the per-prov cert for a 2-label-deep
# request that the parent-zone listener rejects on hostname.
# The TBD-A32 collision guard (sovereign_fqdn ∈ parent_zone_names →
# skip per-prov pair to avoid duplicate listener-name Conflict) is
# preserved in the Helm template as a `range` over parentZones that
# sets a `$fqdnInZones` boolean.
#
# Fix: emit an ADDITIONAL listener pair hostnamed `*.<sovereign_fqdn>`
# (e.g. `*.t28.omani.works`) bound to the SAME per-prov cert
# `sovereign-wildcard-tls-<fqdn-dashed>` rendered by
# clusters/_template/sovereign-tls/cilium-gateway-cert.yaml. That
# cert already enumerates 14 per-prov SANs (console / auth / gitea /
# harbor / registry / api / bao / grafana / hubble / pdns /
# openova-flow / guacamole / marketplace / sandbox) so every per-prov
# subdomain has both a listener match AND a matching cert SAN.
#
# Collision guard: when sovereign_fqdn is identical to one of the
# declared parent-zone names (legacy single-zone case where the
# operator brings the apex itself, e.g. parent_domains_yaml=
# `[{name: "omani.works"}]` and sovereign_fqdn=`omani.works`), the
# parent-zone listener already covers everything 1-label-deep and
# adding a duplicate `*.<fqdn>` pair would produce a Gateway
# Conflicted condition on duplicate listener names. Skip the per-prov
# pair in that case (`local.parent_domains_includes_sovereign_fqdn`).
#
# Naming: the per-prov listener pair always uses unique names
# `https-<fqdn-dashed>` / `http-<fqdn-dashed>` (e.g. `https-t28-omani-
# works`). This is safe because every catalyst-system HTTPRoute now
# OMITS sectionName (PR #1888 closing #1884); Cilium attaches each
# route to the listener whose hostname matches via the hostname
# filter, not by sectionName equality.
parent_domains_includes_sovereign_fqdn = contains(
[for e in local.parent_domains_decoded : e.name],
var.sovereign_fqdn
)
# NOTE (TBD-A35 hotfix, Closes #1886): the conditional that suppresses
# this pair when sovereign_fqdn collides with a declared parent zone now
# lives on the consumer line in `parent_domains_listeners_yaml` below
# (concat() at line ~503). Keeping the conditional here as
# `... ? [] : [<HTTPS_obj>, <HTTP_obj>]` triggers tofu/terraform
# "Inconsistent conditional result types": the true arm is an empty
# tuple `tuple([])` while the false arm is `tuple([obj_with_tls,
# obj_without_tls])` and HCL cannot unify the two. Always emit the pair
# at this local; suppress at the consumer.
per_prov_listeners = [
{
name = format("https-%s", local.sovereign_fqdn_dashed)
port = 30443
protocol = "HTTPS"
hostname = format("*.%s", var.sovereign_fqdn)
tls = {
mode = "Terminate"
certificateRefs = [
{
kind = "Secret"
name = format("sovereign-wildcard-tls-%s", local.sovereign_fqdn_dashed)
}
]
}
allowedRoutes = {
namespaces = {
from = "All"
}
}
},
{
name = format("http-%s", local.sovereign_fqdn_dashed)
port = 30080
protocol = "HTTP"
hostname = format("*.%s", var.sovereign_fqdn)
allowedRoutes = {
namespaces = {
from = "All"
}
}
},
]
parent_domains_listeners_yaml = jsonencode(concat(
flatten([
for entry in local.parent_domains_decoded : [
{
name = local.parent_domains_single_zone ? "https" : format("https-%s", replace(entry.name, ".", "-"))
port = 30443
protocol = "HTTPS"
hostname = format("*.%s", entry.name)
tls = {
mode = "Terminate"
certificateRefs = [
{
kind = "Secret"
name = format("sovereign-wildcard-tls-%s", local.sovereign_fqdn_dashed)
}
]
}
allowedRoutes = {
namespaces = {
from = "All"
}
}
},
{
name = local.parent_domains_single_zone ? "http" : format("http-%s", replace(entry.name, ".", "-"))
port = 30080
protocol = "HTTP"
hostname = format("*.%s", entry.name)
allowedRoutes = {
namespaces = {
from = "All"
}
}
},
]
]),
[for l in local.per_prov_listeners : l if !local.parent_domains_includes_sovereign_fqdn]
))
# Single-zone fallback (legacy Sovereigns shipping parentZones empty)
# is preserved in the Helm template as a `if eq (len $zones) 0 →
# list (dict "name" $fqdn "role" "primary")` substitution, which matches
# the historical `coalesce(var.parent_domains_yaml, format("[{name:
# \"%s\", role: \"primary\"}]", var.sovereign_fqdn))` shape.
# Effective singular-path SKU selection (Fix #157)
# When qa_fixtures_enabled='true', the Sovereign is a QA-loop matrix
@ -808,12 +708,13 @@ locals {
var.parent_domains_yaml,
format("[{name: \"%s\", role: \"primary\"}]", var.sovereign_fqdn)
)
# Cilium Gateway listeners per parent zone (issue #831). Multi-line
# YAML block iterating local.parent_domains_decoded. Threaded into
# clusters/_template/sovereign-tls/cilium-gateway.yaml via Flux
# postBuild.substitute as ${PARENT_DOMAINS_LISTENERS_YAML}. See
# locals.parent_domains_listeners_yaml above for shape + rationale.
parent_domains_listeners_yaml = local.parent_domains_listeners_yaml
# Cilium Gateway listener YAML is no longer threaded into cloud-init
# (Closes #2118). The bp-catalyst-platform chart's
# templates/sovereign-tls-vars-cm.yaml renders the listener block
# from .Values.parentZones into a flux-system/sovereign-tls-vars
# ConfigMap; the sovereign-tls Kustomization's
# postBuild.substituteFrom picks it up. Keeps cloud-init under
# Hetzner's 32 KiB user_data cap on multi-zone SME-pool Sovereigns.
# sovereign_regions_json: canonical multi-region RegionSpec[]
# JSON literal. Threaded into bp-catalyst-platform's
# .Values.sovereign.regionsJson via the bootstrap-kit slot 13
@ -1376,12 +1277,12 @@ locals {
var.parent_domains_yaml,
format("[{name: \"%s\", role: \"primary\"}]", var.sovereign_fqdn)
)
# Cilium Gateway listeners per parent zone (issue #831). Same
# rendered multi-line YAML as the primary CP; secondary regions
# also reconcile sovereign-tls into THEIR own cluster, so the
# listeners block must be present there too. See
# locals.parent_domains_listeners_yaml in this file.
parent_domains_listeners_yaml = local.parent_domains_listeners_yaml
# Cilium Gateway listener YAML is no longer threaded into cloud-init
# (Closes #2118). Same bp-catalyst-platform chart runs in every
# region; each peer's chart renders its own
# flux-system/sovereign-tls-vars ConfigMap and sovereign-tls reads
# it locally via postBuild.substituteFrom. See main.tf locals
# comment (~line 422) for full rationale.
# Same JSON-encoded RegionSpec[] as the primary CP; every region's
# bp-catalyst-platform renders the same sovereign.regionsJson value
# (the cluster topology is Sovereign-wide, not per-region).

@ -577,18 +577,25 @@ func (h *Handler) AddParentDomain(w http.ResponseWriter, r *http.Request) {
// next-prov. For an ALREADY-RUNNING Sovereign, the Hetzner
// hcloud_server resource has no `ignore_changes = [user_data]`
// so a `tofu apply` from changed cloud-init would request a
// destructive server recreate — the operator workaround is to
// `kubectl patch kustomization sovereign-tls -n flux-system`
// on the live Sovereign and append the new zone to the
// `.spec.postBuild.substitute.PARENT_DOMAINS_LISTENERS_YAML`
// value. Long-term: add a Day-2 listener-patch step here that
// reaches into the Sovereign apiserver via the persisted
// kubeconfig (out of scope for the #1772 ship).
h.log.Info("parent-domain post-add: operator must patch live Sovereign Kustomization to surface listener for the new zone",
// destructive server recreate.
//
// Closes #2118 (TBD-V48) changed the Day-2 patch target. The
// listener YAML was historically inlined into cloud-init's
// .spec.postBuild.substitute.PARENT_DOMAINS_LISTENERS_YAML on
// the sovereign-tls Kustomization, so operators patched that
// field on the live Sovereign. The chart now renders the
// listener YAML into ConfigMap/sovereign-tls-vars in flux-system
// and the Kustomization reads via postBuild.substituteFrom; the
// live-Sovereign Day-2 patch target is therefore the ConfigMap's
// data.PARENT_DOMAINS_LISTENERS_YAML key, NOT the Kustomization's
// inline substitute map. Long-term: add a Day-2 ConfigMap-patch
// step here that reaches into the Sovereign apiserver via the
// persisted kubeconfig (out of scope for the #1772 ship).
h.log.Info("parent-domain post-add: operator must patch live Sovereign ConfigMap to surface listener for the new zone",
"domain", req.Name,
"target", "Kustomization/sovereign-tls in flux-system on Sovereign",
"field", ".spec.postBuild.substitute.PARENT_DOMAINS_LISTENERS_YAML",
"reason", "hcloud_server user_data is not ignored — tofu apply would recreate the server. Fresh provs already render the listener.",
"target", "ConfigMap/sovereign-tls-vars in flux-system on Sovereign",
"field", ".data.PARENT_DOMAINS_LISTENERS_YAML",
"reason", "hcloud_server user_data is not ignored — tofu apply would recreate the server. Fresh provs already render the listener via the chart.",
)
writeJSON(w, http.StatusCreated, ParentDomain{
Name: name,

@ -2235,8 +2235,29 @@ name: bp-catalyst-platform
#
# Refs #1099 (NOT Closes — operator walk + screenshot is the DoD per
# CLAUDE.md §0).
version: 1.4.231
appVersion: 1.4.231
version: 1.4.232
appVersion: 1.4.232
# 1.4.232 — fix(sovereign-tls): render Cilium Gateway listener YAML in
# the chart (templates/sovereign-tls-vars-cm.yaml) and feed it into the
# sovereign-tls Kustomization via Flux postBuild.substituteFrom on
# ConfigMap/sovereign-tls-vars. Removes the O(N)-per-parent-zone listener
# block from cloud-init (infra/hetzner/cloudinit-control-plane.tftpl)
# so 4-zone SME-pool Sovereigns (e.g. omantel.biz + .omani.{homes,rest,
# trade}) render under Hetzner's 32 KiB user_data cap with safety margin.
# Closes #2118 (TBD-V48). Synthetic render evidence: t39 4-zone cloud-init
# stripped 30,748 → 28,619 bytes (saves 2,129 bytes; threshold 32,256).
# Helm template covers all three historical paths preserved by the old
# locals.parent_domains_listeners_yaml in infra/hetzner/main.tf:
# - Single-zone fallback (parentZones empty → list with sovereign FQDN)
# emits bare `https`/`http` listener names (every catalyst-system
# HTTPRoute hardcodes sectionName: https on single-zone Sovereigns).
# - Multi-zone (SME pool) emits unique `https-<sanitised>` /
# `http-<sanitised>` names per parent zone.
# - TBD-A32 #1886 per-prov 2-label wildcard listener pair appended when
# sovereignFQDN ∉ parentZones (skipped otherwise to avoid Conflict).
# Cross-region: every region runs the same bp-catalyst-platform chart,
# each peer renders its own ConfigMap into its own flux-system, so
# sovereign-tls Kustomizations in secondary regions read locally.
# 1.4.183 — fix(httproute): omit default sectionName so multi-zone
# Sovereigns attach via Cilium Gateway hostname matcher (Closes #1884,
# TBD-A30). Pre-1.4.183 every catalyst-system HTTPRoute pinned

@ -0,0 +1,154 @@
{{- /*
sovereign-tls-vars ConfigMap — Flux postBuild.substituteFrom source for
the sovereign-tls Kustomization (clusters/_template/sovereign-tls/).
Why this lives in the chart (Closes #2118)
─────────────────────────────────────────
The Cilium Gateway listeners block (cilium-gateway.yaml `spec.listeners:
${PARENT_DOMAINS_LISTENERS_YAML}`) was previously materialised in
infra/hetzner/main.tf (locals.parent_domains_listeners_yaml) and threaded
into cloud-init as an inline `postBuild.substitute` value on the
sovereign-tls Kustomization manifest.
That value scales O(N) with the parent_domains count: ~440 bytes per
parent zone (HTTPS+HTTP listener objects with cert refs + allowedRoutes)
plus the per-prov 2-label wildcard pair (TBD-A32 #1886) when the
sovereign FQDN is not itself one of the parent zones. For a 4-zone
SME-pool Sovereign (primary + 3 sme-pool, e.g. omantel.biz + omani.{homes,
rest,trade}) the value renders to ~2,210 bytes inside cloud-init — and
Hetzner caps user_data at a hard 32 KiB. Cloud-init for t39's exact body
overshot the post-#1985 guardrail (32,256) at 33,656 bytes (audit
agent-a2c1647c, 2026-05-20).
Fix: render the listener YAML inside the chart from .Values.parentZones
(single source of truth — already populated by bootstrap-kit slot 13's
`${PARENT_DOMAINS_YAML}` substitute). Emit it into a ConfigMap in
flux-system/. The sovereign-tls Kustomization adds
`postBuild.substituteFrom: [{kind: ConfigMap, name: sovereign-tls-vars}]`
(looked up in the Kustomization's own namespace, flux-system; the
reference takes no namespace field) and reads the value from there instead of an
inline substitute key. The chart is INSIDE bootstrap-kit (slot 13);
bootstrap-kit reaches Ready iff every HR (including this chart) is
Ready; the sovereign-tls Kustomization `dependsOn: bootstrap-kit Ready`,
so by the time sovereign-tls reconciles, this ConfigMap exists in etcd.
Ordering is the same as the legacy cloud-init substitute path — Cilium
Gateway always lands AFTER the chart's per-zone resources are committed.
Catalyst-Zero (contabo, no global.sovereignFQDN, no sovereign-tls
Kustomization) skips this template via the guard below so the contabo
Kustomize build remains untouched.
Listener-shape contract (must match locals.parent_domains_listeners_yaml
in infra/hetzner/main.tf historically):
- SINGLE parent zone → listener names are the bare strings
`https` / `http` (every platform HTTPRoute
hardcodes `parentRefs[0].sectionName: https`
on single-zone Sovereigns).
- MULTIPLE parent zones (SME pool present) → unique names per zone:
`https-<sanitised>` / `http-<sanitised>`
where sanitised = zone.replace(".", "-").
- certificateRefs ALWAYS targets the per-prov per-name TLS Secret
`sovereign-wildcard-tls-<sovereignFQDN-dashed>` (TBD-A29 #1883 —
LE rate-limit bypass via per-prov identifier set).
- PER-PROV 2-label wildcard listener pair (TBD-A32 #1886) appended
when global.sovereignFQDN is NOT identical to any parent-zone name
(i.e. the operator did not bring the apex itself). Pair is hostnamed
`*.<sovereignFQDN>` so 2-label-deep operator endpoints
(`console.t39.omantel.biz`) match. Named `https-<fqdn-dashed>` /
`http-<fqdn-dashed>` so the parent-zone listener doesn't collide.
The output value is a JSON-flow array string (YAML-compatible) consumed
as a YAML scalar at `listeners: ${PARENT_DOMAINS_LISTENERS_YAML}` in
clusters/_template/sovereign-tls/cilium-gateway.yaml. Flux's
postBuild.substituteFrom inlines the value verbatim and the apiserver
parses it as the materialised listener list.
*/}}
{{- if .Values.global.sovereignFQDN }}
{{- $fqdn := .Values.global.sovereignFQDN }}
{{- $fqdnDashed := replace "." "-" $fqdn }}
{{- $secretName := printf "sovereign-wildcard-tls-%s" $fqdnDashed }}
{{- $zones := default (list) .Values.parentZones }}
{{- /* Single-zone fallback so legacy Sovereigns that ship parentZones
empty still produce a valid listener pair. Mirrors the same
fallback infra/hetzner/main.tf locals.parent_domains_decoded used
(single zone derived from sovereign FQDN, role=primary). */}}
{{- if eq (len $zones) 0 }}
{{- $zones = list (dict "name" $fqdn "role" "primary") }}
{{- end }}
{{- $single := eq (len $zones) 1 }}
{{- /* Build the listener array. We assemble a list of dicts then
toJson it; Go-template flow is verbose but unambiguous. */}}
{{- $listeners := list }}
{{- range $z := $zones }}
{{- $sanitised := replace "." "-" $z.name }}
{{- $httpsName := ternary "https" (printf "https-%s" $sanitised) $single }}
{{- $httpName := ternary "http" (printf "http-%s" $sanitised) $single }}
{{- $httpsListener := dict
"name" $httpsName
"port" 30443
"protocol" "HTTPS"
"hostname" (printf "*.%s" $z.name)
"tls" (dict
"mode" "Terminate"
"certificateRefs" (list (dict "kind" "Secret" "name" $secretName))
)
"allowedRoutes" (dict "namespaces" (dict "from" "All"))
}}
{{- $httpListener := dict
"name" $httpName
"port" 30080
"protocol" "HTTP"
"hostname" (printf "*.%s" $z.name)
"allowedRoutes" (dict "namespaces" (dict "from" "All"))
}}
{{- $listeners = append $listeners $httpsListener }}
{{- $listeners = append $listeners $httpListener }}
{{- end }}
{{- /* Per-prov 2-label wildcard pair (TBD-A32 #1886). Skipped when
sovereignFQDN is identical to a declared parent-zone name
(legacy single-zone-on-apex case — duplicate listener-name
guard). */}}
{{- $fqdnInZones := false }}
{{- range $z := $zones }}
{{- if eq $z.name $fqdn }}
{{- $fqdnInZones = true }}
{{- end }}
{{- end }}
{{- if not $fqdnInZones }}
{{- $listeners = append $listeners (dict
"name" (printf "https-%s" $fqdnDashed)
"port" 30443
"protocol" "HTTPS"
"hostname" (printf "*.%s" $fqdn)
"tls" (dict
"mode" "Terminate"
"certificateRefs" (list (dict "kind" "Secret" "name" $secretName))
)
"allowedRoutes" (dict "namespaces" (dict "from" "All"))
) }}
{{- $listeners = append $listeners (dict
"name" (printf "http-%s" $fqdnDashed)
"port" 30080
"protocol" "HTTP"
"hostname" (printf "*.%s" $fqdn)
"allowedRoutes" (dict "namespaces" (dict "from" "All"))
) }}
{{- end }}
---
apiVersion: v1
kind: ConfigMap
metadata:
name: sovereign-tls-vars
namespace: flux-system
labels:
app.kubernetes.io/managed-by: helm
app.kubernetes.io/component: sovereign-tls-vars
catalyst.openova.io/sovereign: {{ $fqdn | quote }}
data:
# PARENT_DOMAINS_LISTENERS_YAML — JSON-flow array literal of the
# Cilium Gateway listener block. Consumed by Flux postBuild.
# substituteFrom on the sovereign-tls Kustomization (cloud-init
# writes that Kustomization). See cilium-gateway.yaml `listeners:
# ${PARENT_DOMAINS_LISTENERS_YAML}` for the consumer.
PARENT_DOMAINS_LISTENERS_YAML: {{ toJson $listeners | quote }}
{{- end }}
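
The `toJson $listeners | quote` step on the last data line can be sketched in Go. This is an approximation under assumptions: `json.Marshal` plus `strconv.Quote` stand in for the Helm/Sprig functions (Sprig's `quote` is, to my understanding, Go-style `%q` quoting), and `renderValue` / `parseValue` are hypothetical helpers showing that the quoted scalar round-trips back to the listener list.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// renderValue JSON-encodes the listeners and wraps the result in a
// double-quoted string, approximating `toJson | quote`.
func renderValue(listeners []map[string]interface{}) (string, error) {
	raw, err := json.Marshal(listeners)
	if err != nil {
		return "", err
	}
	return strconv.Quote(string(raw)), nil
}

// parseValue reverses renderValue: unquote the scalar, then parse the
// JSON-flow array it contains.
func parseValue(quoted string) ([]map[string]interface{}, error) {
	raw, err := strconv.Unquote(quoted)
	if err != nil {
		return nil, err
	}
	var out []map[string]interface{}
	if err := json.Unmarshal([]byte(raw), &out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	quoted, err := renderValue([]map[string]interface{}{
		{"name": "https", "port": 30443, "hostname": "*.omani.homes"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(quoted)
	back, err := parseValue(quoted)
	if err != nil {
		panic(err)
	}
	fmt.Println(back[0]["hostname"])
}
```

The quoting matters because the ConfigMap value must be a single string; the array only becomes a list again after Flux inlines it at `spec.listeners`.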

@ -761,7 +761,9 @@ ingress:
#
# The Cilium Gateway template
# (clusters/_template/sovereign-tls/cilium-gateway.yaml +
# infra/hetzner/main.tf locals.parent_domains_listeners_yaml)
# bp-catalyst-platform's templates/sovereign-tls-vars-cm.yaml
# — formerly infra/hetzner/main.tf locals.parent_domains_listeners_yaml
# before #2118 / TBD-V48 hoisted the render into the chart)
# names HTTPS listeners as follows:
# - SINGLE parent zone → bare `https` / `http`
# - MULTIPLE parent zones (SME pool present) → unique