openova/platform/guacamole/chart/templates/recordings-pvc-migrate-hook.yaml
e3mrah dfd48b1626
fix(chart,api,controllers,ui): qa-loop iter-11 Fix #45 — three-cluster closeout (#1265)
Cluster-A (bp-guacamole PVC immutability):
  - New pre-install/pre-upgrade Helm hook (Job + per-release SA/Role/
    RoleBinding + cluster-scoped CR/CRB for PV cleanup) that detects
    when an existing `guacamole-recordings` PVC is bound to a
    storageClass different from `.Values.guacamole.recordings.storageClass`
    and deletes the PVC + bound PV so the chart-side PVC manifest can
    recreate cleanly. Closes the live bp-guacamole HelmRelease wedge on
    omantel iter-11 (`PersistentVolumeClaim ... is invalid: spec:
    Forbidden: spec is immutable after creation`).
  - Operator escape hatch: `.Values.guacamole.recordings.allowMigration:
    false` suppresses the hook for Sovereigns with long-lived recording
    state.
  - Render test extended (15 docs total, plus toggle assertion).
  - bp-guacamole chart 0.1.8 → 0.1.9; bootstrap-kit slot pin bumped
    in both _template and omantel.omani.works overlays.

Cluster-B (Application phase stuck on Provisioning):
  - application-controller now observes the per-region downstream
    HelmRelease.status.conditions[Ready] and rolls up
    Application.status.phase: any region Ready=True → phase=Ready,
    any Ready=False → phase=Degraded, no HR yet → phase=Provisioning.
  - Periodic 30s re-list ticker (Run goroutine) so HR readiness flips
    reach the Application even though the Application Watch doesn't
    fire on sibling HR changes.
  - status.lastReconciledAt populated on every reconcile pass for
    TC-113.
  - application-controller ClusterRole gains
    helm.toolkit.fluxcd.io/helmreleases get/list/watch.
  - 3 new unit tests (HR Ready=True → phase=Ready, HR Ready=False →
    phase=Degraded with verbatim message, no-HR → phase=Provisioning).
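
The rollup rule above can be sketched in shell. This is only an illustration of the rule as listed, not the controller's code: the function name `rollup_phase` is hypothetical, the rules are checked in the order the bullet gives them (which rule wins when regions disagree is an assumption), and a Ready status of "Unknown" is assumed to fall through to Provisioning.

```shell
# Hypothetical helper, not part of application-controller.
# Each argument is one region's HelmRelease Ready status
# ("True", "False" or "Unknown"); no arguments means no HR exists yet.
rollup_phase() {
  local status
  # Rule 1: any region Ready=True -> Ready (assumed to be checked first).
  for status in "$@"; do
    if [ "${status}" = "True" ]; then
      echo "Ready"
      return 0
    fi
  done
  # Rule 2: otherwise any region Ready=False -> Degraded.
  for status in "$@"; do
    if [ "${status}" = "False" ]; then
      echo "Degraded"
      return 0
    fi
  done
  # Rule 3: no HR yet (or only Unknown statuses, by assumption).
  echo "Provisioning"
}
```

Calling `rollup_phase` with no arguments corresponds to the no-HR unit test; `rollup_phase True` and `rollup_phase False` correspond to the other two.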

Cluster-C (SPA AppDetail + k8s services namespace filter):
  - GET /api/v1/sovereigns/{id}/applications/{name} returns full
    Application detail (identity + spec + status). The SPA AppDetail
    page now falls back to this endpoint when wizard store has no
    descriptor for the requested componentId — the typical chroot
    Sovereign case where Apps are installed via `kubectl apply` /
    catalyst-api install endpoint, NOT via the wizard. Without the
    fallback every chroot-installed Application surfaced "App not
    found / The component qa-wp is not part of this deployment"
    even though the underlying CR was Ready=True. Closes TC-068 /
    TC-072 / TC-074 / TC-076 / TC-077 / TC-079 et al.
  - GET /api/v1/sovereigns/{id}/k8s/{kind} accepts BOTH `?ns=`
    (historic) AND `?namespace=` (kubectl/SPA-canonical). Without
    the alias TC-262 / TC-263 returned every namespace's services
    instead of qa-omantel-only. New test covers all 4 query
    permutations.

Chart bumps:
  - bp-catalyst-platform 1.4.116 → 1.4.117 (+ pin in
    clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml).
  - bp-guacamole 0.1.8 → 0.1.9.

Refs: qa-loop iter-11 Fix #45 (Cluster-A + Cluster-B + Cluster-C);
post-merge image SHAs land via the catalyst-api / catalyst-controllers
build workflows + the bp-guacamole / bp-catalyst-platform release
workflows.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 07:26:05 +04:00

{{- /*
PVC storageClass-migration hook (qa-loop iter-11 Fix #45 Cluster-A).
Background — the immutable-spec problem
=======================================
PersistentVolumeClaim.spec is immutable after creation EXCEPT for
`resources.requests.storage` (resize, when allowed by the StorageClass)
and `volumeAttributesClassName`. Specifically `storageClassName` is
strictly immutable. Once a PVC is bound to a PV under storageClass X,
no `helm upgrade` that changes `.Values.guacamole.recordings.storageClass`
to Y will ever succeed — the K8s apiserver rejects the patch with
`PersistentVolumeClaim ... is invalid: spec: Forbidden: spec is
immutable after creation except resources.requests.storage and
volumeAttributesClassName for bound claims`.
This is the live bp-guacamole HR failure we hit on omantel iter-11:
PR #1259 left `.Values.recordings.storageClass` at upstream default
`hcloud-volumes`, the omantel cluster overlay set it to
`seaweedfs-storage`, but the pre-existing PVC was bound to `local-path`
(from a prior reconcile pass), and the upgrade locked into a permanent
remediation loop.
Why a hook (not a migration Job, not chart-rename)
==================================================
A regular Job is applied alongside the chart's other manifests, so the
same upgrade pass still attempts the forbidden PVC patch and fails
regardless of what the Job does. A chart-side rename of the PVC pattern
(e.g. include a hash of the storage class) would churn through PVs
every time the value changes, losing data unless we also added a
backup-restore lifecycle. Per docs/INVIOLABLE-PRINCIPLES.md (no "for
now" workarounds, no compromised quality), the right primitive is the
Helm `pre-upgrade` hook: it runs BEFORE Helm applies the rendered PVC
manifest, so it can strip the finalizers, delete the offending PVC + PV,
and let the chart-side PVC create cleanly.
Recording-data lifecycle
========================
`/recordings` holds Guacamole session capture files (RDP/VNC/SSH/exec
playback). On a Sovereign without long-running sessions or before the
recording-shipper is wired up, deleting the volume is data-safe. The
hook is gated by `.Values.guacamole.recordings.allowMigration` so an
operator with live recording state can disable the destructive path
(default ON because the cost of leaving the upgrade wedged is much
higher than the cost of regenerating an empty recordings directory —
Guacamole creates per-connection subdirectories on demand).
When the PVC does not exist yet, or its storageClass already matches the
chart-desired one, the hook is a no-op. The script runs kubectl as the
hook ServiceAccount, whose permissions come from a per-release Role
(plus a cluster-scoped ClusterRole/ClusterRoleBinding for PV cleanup).
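
The no-op check described above amounts to a three-way decision on the existing and desired storageClass. A minimal shell sketch, assuming a hypothetical helper named `decide_migration` (it is not part of the chart, and the returned labels are illustrative only):

```shell
# Hypothetical helper: classify what the hook should do.
# $1 = storageClass of the existing PVC (empty string if there is no PVC)
# $2 = storageClass the chart wants
decide_migration() {
  local existing="$1" desired="$2"
  if [ -z "${existing}" ]; then
    # No PVC found: fresh install, nothing to migrate.
    echo "skip-fresh-install"
  elif [ "${existing}" = "${desired}" ]; then
    # PVC is already on the desired storageClass.
    echo "skip-already-matching"
  else
    # Mismatch: the destructive delete-and-recreate path applies.
    echo "migrate"
  fi
}
```

With the values from the omantel incident, `decide_migration "local-path" "seaweedfs-storage"` takes the last branch.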
Pairs with:
- templates/seaweedfs-pvc.yaml — the actual PVC the chart wants
- templates/recordings-pvc-rbac.yaml — ServiceAccount + Role + Binding
*/}}
{{- $migrate := true -}}
{{- if hasKey .Values.guacamole.recordings "allowMigration" -}}
{{- $migrate = .Values.guacamole.recordings.allowMigration -}}
{{- end -}}
{{- if and .Values.guacamole.enabled $migrate -}}
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "bp-guacamole.recordingsName" . }}-storageclass-migrate
  namespace: {{ .Release.Namespace }}
  labels:
    {{- include "bp-guacamole.labels" . | nindent 4 }}
    catalyst.openova.io/component: recordings-migrate
  annotations:
    # Run before the chart's manifests are applied. pre-install is
    # included alongside pre-upgrade so a fresh install (no PVC yet) is
    # a safe no-op (the kubectl-get in the script tolerates a missing
    # PVC). before-hook-creation makes Helm delete the prior Job before
    # re-creating it, so we're not blocked by the immutable
    # Job.spec.template.
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "-10"
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  backoffLimit: 0
  ttlSecondsAfterFinished: 300
  template:
    metadata:
      labels:
        {{- include "bp-guacamole.labels" . | nindent 8 }}
        catalyst.openova.io/component: recordings-migrate
    spec:
      serviceAccountName: {{ include "bp-guacamole.recordingsName" . }}-migrator
      restartPolicy: Never
      {{- with .Values.guacamole.imagePullSecrets }}
      imagePullSecrets:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      containers:
        - name: migrate
          # bitnami/kubectl is the canonical chart-side migration tool
          # across Catalyst Blueprints (cf. bp-keycloak realm-config
          # post-deploy Job pattern). docs/INVIOLABLE-PRINCIPLES.md #4a
          # calls for SHA pinning; the default below is only tag-pinned,
          # so set `.Values.guacamole.recordings.migrationImage` to a
          # digest reference to comply.
          image: {{ .Values.guacamole.recordings.migrationImage | default "bitnami/kubectl:1.30.4" | quote }}
          imagePullPolicy: IfNotPresent
          securityContext:
            runAsNonRoot: true
            runAsUser: 1001
            runAsGroup: 1001
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: [ALL]
            seccompProfile:
              type: RuntimeDefault
          env:
            - name: PVC_NAME
              value: {{ include "bp-guacamole.recordingsName" . | quote }}
            - name: PVC_NAMESPACE
              value: {{ .Release.Namespace | quote }}
            - name: DESIRED_STORAGECLASS
              value: {{ .Values.guacamole.recordings.storageClass | quote }}
          command: ["/bin/bash", "-c"]
          args:
            - |
              set -euo pipefail
              # Read the existing PVC's storageClass. With --ignore-not-found,
              # kubectl exits 0 with empty output when the PVC doesn't exist,
              # so an empty string is treated as "no PVC". Note that
              # `2>/dev/null || true` also maps any other kubectl failure
              # (e.g. an RBAC denial) to the same empty string.
              EXISTING_SC="$(kubectl get pvc "${PVC_NAME}" -n "${PVC_NAMESPACE}" \
                --ignore-not-found \
                -o jsonpath='{.spec.storageClassName}' 2>/dev/null || true)"
              if [ -z "${EXISTING_SC}" ]; then
                echo "PVC ${PVC_NAMESPACE}/${PVC_NAME} does not exist: fresh install, no migration needed."
                exit 0
              fi
              if [ "${EXISTING_SC}" = "${DESIRED_STORAGECLASS}" ]; then
                echo "PVC ${PVC_NAMESPACE}/${PVC_NAME} already on storageClass=${DESIRED_STORAGECLASS}; no migration."
                exit 0
              fi
              echo "PVC storageClass mismatch (existing=${EXISTING_SC} desired=${DESIRED_STORAGECLASS}); deleting PVC + PV to allow recreation."
              # Capture the bound PV name before deleting the PVC. With a
              # Delete reclaim policy most CSI drivers remove the PV when
              # the PVC goes; Retain policies need explicit cleanup.
              BOUND_PV="$(kubectl get pvc "${PVC_NAME}" -n "${PVC_NAMESPACE}" \
                -o jsonpath='{.spec.volumeName}' 2>/dev/null || true)"
              # Strip finalizers so the PVC actually deletes.
              # kubernetes.io/pvc-protection blocks the delete while a Pod
              # still references the PVC; the chart's webapp Deployment is
              # being upgraded, so that Pod is on its way out and we force
              # the issue. `|| true` tolerates a PVC that is already gone.
              kubectl patch pvc "${PVC_NAME}" -n "${PVC_NAMESPACE}" \
                --type=merge -p '{"metadata":{"finalizers":[]}}' || true
              kubectl delete pvc "${PVC_NAME}" -n "${PVC_NAMESPACE}" \
                --ignore-not-found --wait=true --timeout=60s
              if [ -n "${BOUND_PV}" ]; then
                echo "Cleaning up PV ${BOUND_PV} (storageClass=${EXISTING_SC})."
                # `|| true` tolerates a PV the reclaim policy already removed.
                kubectl patch pv "${BOUND_PV}" \
                  --type=merge -p '{"metadata":{"finalizers":[]}}' || true
                kubectl delete pv "${BOUND_PV}" \
                  --ignore-not-found --wait=true --timeout=60s
              fi
              echo "Migration complete; chart-side PVC will be recreated on this upgrade pass with storageClass=${DESIRED_STORAGECLASS}."
{{- end }}