Closes the 90-min chart-roll wedge observed on omantel.biz provision #6
(2026-05-10) where bp-catalyst-platform 1.4.128 sat in `pending-upgrade`
for ~90 min until manual reconcile + Bucket-A kubectl-apply unblocked it.
Root cause (chart-roll-rca-iter15.md): bp-catalyst-platform's dependsOn
omits bp-crossplane-claims, the chart that owns the
access.openova.io/v1alpha1 XRD. Slot 13 races slot 14, qa-fixtures
UserAccess CRs hit admission before the XRD is registered, Helm rejects
the manifests with `no matches for kind "UserAccess" in version
"access.openova.io/v1alpha1"`, the release Secret enters
pending-upgrade, and the install/upgrade blocks' 25m timeout x 3
remediation.retries ceiling allows the wedge to compound for ~75 min
worst case before any operator-visible failure.
PR-1 (CRITICAL) - bp-catalyst-platform dependsOn += bp-crossplane-claims:
- clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml
- Adds the missing edge so Flux blocks the umbrella install until the
XRD is live. Eliminates the race entirely on a fresh roll.
PR-3 - install/upgrade remediation hardening:
- clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml
- Adds `cleanupOnFail: true` to both install and upgrade blocks (purges
partial release artifacts on retry).
- Adds `remediation.strategy: rollback` and `remediateLastFailure: true`
(rollback to last good release before retrying, instead of pinning
the release Secret at `pending-upgrade` for the full timeout).
- Reduces `timeout: 25m` -> `15m` (with PR-1 fixed, 25m is overkill;
faster failure = faster automatic recovery).
- Net: failed-then-recoverable upgrade collapses from ~75 min worst case
to ~15 min worst case.
PR-2 (defense-in-depth) - APIVersions.Has gate on UserAccess templates:
- products/catalyst/chart/templates/qa-fixtures/useraccess-qa-user1.yaml
- products/catalyst/chart/templates/qa-fixtures/useraccess-qa-test-tiers.yaml
- Wraps the gating `if` with `(.Capabilities.APIVersions.Has
"access.openova.io/v1alpha1/UserAccess")`. If the XRD is not yet
registered the manifest evaluates to empty bytes, eliminating the
admission-rejection class of chart-roll wedges even if dependsOn
ordering breaks again.
Acceptance test (next fresh provision, e.g., provision #7):
- `kubectl get hr -n flux-system bp-catalyst-platform` reaches
Ready=True on the FIRST install action (no `pending-upgrade`).
- Chart roll completes in <15 min, zero-touch (no manual
`flux reconcile`, no Bucket-A kubectl-apply).
- `kubectl get useraccess -A` shows qa-user1 + 5 qa-test-{tier} CRs
without operator intervention.
Refs: chart-roll-rca-iter15.md (PR-1, PR-2, PR-3 sections).
Co-authored-by: e3mrah <alierenbaysal@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>