fix(bp-keycloak): bump keycloak-config-cli hook timeouts (#129)
Fresh-Sovereign provision #15 (otech 0ad3687ddd72deb7) wedged at
phase1-watching for 30+ min: bp-keycloak HelmRelease failed with
`post-upgrade hooks failed: timed out waiting for the condition` →
bp-gitea (dependsOn keycloak OIDC) blocked → bp-self-sovereign-cutover
never converged.

Root cause
──────────
The bitnami keycloak subchart's `keycloak-config-cli-job.yaml` is rendered
as a Helm post-install/post-upgrade/post-rollback hook (default annotations
on the Job, weight 5). On a fresh k3s the realm-import Job fires before
Postgres+Liquibase finish bootstrapping Keycloak (legitimately 3-10 min),
and the bitnami subchart defaults are too tight to absorb that race:

  - keycloakConfigCli.availabilityCheck.timeout="" → keycloak-config-cli
    falls back to its internal ~120s wait for Keycloak's /admin endpoint
  - keycloakConfigCli.backoffLimit: 1 → only 2 Pod attempts total before
    the Job is marked Failed

Both attempts hit the 120s window, Job goes Failed, Helm reports the
post-upgrade hook timed out, HR install/upgrade retries (×3) all hit the
same race, HR remains Failed → downstream blueprints never install.

Fix
───
Tune the hook's internal timing to fit comfortably inside the parent HR's
15m install/upgrade timeout while leaving headroom for cold image pull +
Pod scheduling:

  keycloak.keycloakConfigCli.availabilityCheck.timeout: "600s"  (was "")
  keycloak.keycloakConfigCli.backoffLimit: 5                    (was 1)

Both knobs remain operator-overridable via per-Sovereign `valuesFrom`
(Inviolable Principle #4: no hardcoding). Per Inviolable Principle #3
(no workarounds), this does NOT disable the hook semantics: disabling the
hook would break the documented contract that the realm exists before the
HR reaches Ready (downstream bp-gitea + catalyst-api consume the realm).

Files
─────
  platform/keycloak/chart/values.yaml               (+59 inline rationale)
  platform/keycloak/chart/Chart.yaml                (1.4.2 → 1.4.3 + changelog)
  clusters/_template/bootstrap-kit/09-keycloak.yaml (HR pin → 1.4.3)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
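
The per-Sovereign override path mentioned in the commit message can be sketched as follows. This is a minimal illustration, not the repository's actual overlay: the ConfigMap name `bp-keycloak-overrides`, the HelmRelease `apiVersion`, and the shortened values (`300s`, `3`) are assumptions chosen for a hypothetical Sovereign with a faster substrate. Only the two value paths and the 15m timeout come from this commit.

```yaml
# Hypothetical per-Sovereign override; must live in the same namespace
# as the bp-keycloak HelmRelease (Flux resolves valuesFrom locally).
apiVersion: v1
kind: ConfigMap
metadata:
  name: bp-keycloak-overrides        # assumed name, for illustration only
data:
  values.yaml: |
    keycloak:
      keycloakConfigCli:
        availabilityCheck:
          timeout: "300s"            # chart default after this commit: "600s"
        backoffLimit: 3              # chart default after this commit: 5
---
# Fragment of the HelmRelease that consumes the override. Fields not
# relevant to the override (chart, interval, dependsOn) are omitted.
apiVersion: helm.toolkit.fluxcd.io/v2   # assumed API version
kind: HelmRelease
metadata:
  name: bp-keycloak
spec:
  timeout: 15m                       # parent HR install/upgrade timeout per the commit
  valuesFrom:
    - kind: ConfigMap
      name: bp-keycloak-overrides
      valuesKey: values.yaml
      optional: true                 # Sovereigns without an override keep chart defaults
```

With the chart defaults, a first attempt that exhausts the full 600s availability window leaves roughly 300s of the 15m budget for the retry, which matches the "one timeout, then a successful retry" worst case described in the values.yaml rationale.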
parent 58f518ff3d
commit 9a5cbcd178
@@ -45,7 +45,12 @@ spec:
  # parameter so each Sovereign owns its KC realm named after the tenant
  # short-name (omantel chroot → "omantel"). Default `sovereign` is kept
  # in the chart for backward compat with overlays not yet migrated.
- version: 1.4.2
+ # 1.4.3 (issue #129): bumps keycloakConfigCli.availabilityCheck.timeout
+ # 120s → 600s + backoffLimit 1 → 5. Fixes "post-upgrade hooks failed:
+ # timed out waiting for the condition" wedge on fresh provisions where
+ # Postgres+Liquibase bootstrap exceeds the bitnami subchart's 120s
+ # default Keycloak-availability window for the realm-import Job.
+ version: 1.4.3
  sourceRef:
    kind: HelmRepository
    name: bp-keycloak
@@ -1,6 +1,6 @@
  apiVersion: v2
  name: bp-keycloak
- version: 1.4.2
+ version: 1.4.3
  description: |
    Catalyst-curated Blueprint umbrella chart for Keycloak. Depends on the
    upstream `keycloak` chart (bitnami) as a Helm subchart so
@@ -36,6 +36,19 @@ description: |
    credentials Secret `realm` key flow from the same value, keeping
    Keycloak realm and catalyst-api CATALYST_KC_REALM env in sync. Closes
    TC-124, TC-125, TC-159, TC-160, TC-161, TC-176, TC-190, TC-285.
+   1.4.3 (issue #129, post-upgrade hook timeout wedge): bump bitnami
+   keycloakConfigCli.availabilityCheck.timeout 120s → 600s and
+   backoffLimit 1 → 5. Fresh-Sovereign provision #15 (otech
+   0ad3687ddd72deb7) wedged at phase1-watching because the
+   keycloak-config-cli Job (rendered as a Helm post-install/post-upgrade
+   hook by the bitnami subchart's default annotations) timed out twice
+   inside its 120s availability window before Postgres+Liquibase finished
+   bootstrapping Keycloak. Helm reported "post-upgrade hooks failed:
+   timed out waiting for the condition" → bp-keycloak HR never reached
+   Ready → bp-gitea (dependsOn keycloak OIDC) blocked → bp-self-sovereign-
+   cutover never converged. New defaults sit comfortably inside the parent
+   HR's 15m install/upgrade timeout. Both knobs remain per-Sovereign
+   overridable via valuesFrom (Inviolable Principle #4: no hardcoding).
  type: application
  keywords: [catalyst, blueprint, keycloak]
  maintainers:
@@ -276,13 +276,63 @@ keycloak:
  # k3s --oidc-groups-prefix flag)
  keycloakConfigCli:
    enabled: true
-   # Run the realm-import Job as a Helm post-install hook so the realm
-   # is provisioned exactly once per fresh release. Re-run on upgrade
-   # is idempotent (config-cli reconciles the realm to spec).
+   # Run the realm-import Job as a Helm post-install/post-upgrade/post-rollback
+   # hook (bitnami subchart default — annotations defined on the Job template
+   # via .Values.keycloakConfigCli.annotations: helm.sh/hook = post-install,
+   # post-upgrade,post-rollback with weight 5). The realm is provisioned
+   # exactly once per fresh release; re-run on upgrade is idempotent
+   # (config-cli reconciles the realm to spec).
    image:
      registry: docker.io
      repository: bitnamilegacy/keycloak-config-cli
      tag: 6.4.0-debian-12-r11
+   # ─── Hook timing — issue #129 (post-upgrade hook timeout wedge) ─────────
+   #
+   # Root cause history (prov #15, otech 0ad3687ddd72deb7, 2026-05-10):
+   # Flux's first install of bp-keycloak races against PostgreSQL readiness;
+   # when keycloak-config-cli's first run-attempt fires before Keycloak's
+   # admin endpoint is reachable, it counts toward Job retries. Bitnami
+   # subchart defaults are too tight for a fresh Sovereign:
+   # - availabilityCheck.timeout="" → keycloak-config-cli falls back to
+   # its own internal default (~120s) waiting for Keycloak
+   # - backoffLimit: 1 → only 2 Pod attempts total before Job marked Failed
+   # On a brand-new k3s where Postgres+Liquibase migrations + Keycloak
+   # bootstrap legitimately take 3-10+ minutes, both attempts time out
+   # inside the 120s window → Job Failed → Helm reports
+   # "post-upgrade hooks failed: timed out waiting for the condition"
+   # → bp-keycloak HelmRelease never reaches Ready=True → bp-gitea
+   # (dependsOn keycloak OIDC) never installs → bp-self-sovereign-cutover
+   # never converges → fresh provision wedges at phase1-watching.
+   #
+   # Fix per Inviolable Principle #4 (no hardcoding) — both knobs are
+   # operator-overridable via per-Sovereign overlay valuesFrom path:
+   # keycloak.keycloakConfigCli.availabilityCheck.timeout
+   # keycloak.keycloakConfigCli.backoffLimit
+   # The chart defaults below are tuned for the documented worst case
+   # (slow Postgres + cold image-pull); operators with faster substrates
+   # may shorten them.
+   #
+   # Numbers picked to fit inside the parent HR's 15m install/upgrade
+   # timeout while leaving headroom for Job pod scheduling + image pull:
+   # - availabilityCheck.timeout 600s (10 min) — keycloak-config-cli
+   # polls Keycloak's /admin REST endpoint; this caps the single-Pod
+   # wait. Covers Postgres bootstrap + Liquibase + Keycloak internal
+   # startup with margin.
+   # - backoffLimit 5 — Job retries with exponential backoff (cap 6m by
+   # default). Combined with a 10m availability poll the realistic
+   # worst case is one Pod hitting the timeout, then a successful
+   # retry once Keycloak Ready.
+   #
+   # NOT a workaround per Inviolable Principle #3. The workaround would
+   # be disabling the hook semantics entirely (set annotations: {}); that
+   # breaks the documented contract that the realm is imported before
+   # the Helm release reaches Ready (downstream bp-gitea + catalyst-api
+   # depend on the realm existing). Tuning the hook's internal timing
+   # respects the contract.
+   availabilityCheck:
+     enabled: true
+     timeout: "600s"
+   backoffLimit: 5
    # ─── Phase-8b realm ConfigMap (issue #604, #899) ─────────────────────────
    # The realm import JSON is now owned by the parent bp-keycloak chart's
    # templates/configmap-sovereign-realm.yaml template. That template has