Cluster-A (bp-guacamole PVC immutability):
- New pre-install/pre-upgrade Helm hook (Job + per-release
  ServiceAccount/Role/RoleBinding + cluster-scoped
  ClusterRole/ClusterRoleBinding for PV cleanup) that detects
  when an existing `guacamole-recordings` PVC is bound to a
  storageClass different from `.Values.guacamole.recordings.storageClass`
  and deletes the PVC + bound PV so the chart-side PVC manifest can
  recreate cleanly. Closes the live bp-guacamole HelmRelease wedge on
  omantel iter-11 (`PersistentVolumeClaim ... is invalid: spec:
  Forbidden: spec is immutable after creation`).
- Operator escape hatch: `.Values.guacamole.recordings.allowMigration:
  false` suppresses the hook for Sovereigns with long-lived recording
  state.
- Render test extended (15 docs total, plus toggle assertion).
- bp-guacamole chart 0.1.8 → 0.1.9; bootstrap-kit slot pin bumped
  in both _template and omantel.omani.works overlays.

Cluster-B (Application phase stuck on Provisioning):
- application-controller now observes the per-region downstream
  HelmRelease.status.conditions[Ready] and rolls up
  Application.status.phase: any region Ready=False → phase=Degraded,
  every region Ready=True → phase=Ready, otherwise (no HR yet, or
  Ready not yet reported) → phase=Provisioning.
- Periodic 30s re-list ticker (Run goroutine) so HR readiness flips
  reach the Application even though the Application Watch doesn't
  fire on sibling HR changes.
- status.lastReconciledAt populated on every reconcile pass for
  TC-113.
- application-controller ClusterRole gains
  helm.toolkit.fluxcd.io/helmreleases get/list/watch.
- 3 new unit tests (HR Ready=True → phase=Ready, HR Ready=False →
  phase=Degraded with verbatim message, no-HR → phase=Provisioning).

Cluster-C (SPA AppDetail + k8s services namespace filter):
- GET /api/v1/sovereigns/{id}/applications/{name} returns full
  Application detail (identity + spec + status). The SPA AppDetail
  page now falls back to this endpoint when the wizard store has no
  descriptor for the requested componentId. That is the typical chroot
  Sovereign case, where Apps are installed via `kubectl apply` or the
  catalyst-api install endpoint, NOT via the wizard. Without the
  fallback every chroot-installed Application surfaced "App not
  found / The component qa-wp is not part of this deployment"
  even though the underlying CR was Ready=True. Closes TC-068 /
  TC-072 / TC-074 / TC-076 / TC-077 / TC-079 et al.
- GET /api/v1/sovereigns/{id}/k8s/{kind} accepts BOTH `?ns=`
  (historic) AND `?namespace=` (kubectl/SPA-canonical). Without
  the alias TC-262 / TC-263 returned every namespace's services
  instead of qa-omantel-only. New test covers all 4 query
  permutations.

Chart bumps:
- bp-catalyst-platform 1.4.116 → 1.4.117 (+ pin in
  clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml).
- bp-guacamole 0.1.8 → 0.1.9.

Refs: qa-loop iter-11 Fix #45 (Cluster-A + Cluster-B + Cluster-C);
post-merge image SHAs land via the catalyst-api / catalyst-controllers
build workflows + the bp-guacamole / bp-catalyst-platform release
workflows.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent fea726233c
commit dfd48b1626
@@ -337,13 +337,28 @@ spec:
       # rebuilt application-controller image (:24aab61) into values.yaml
       # — chart 1.4.115 had stale tag (:a3ba200) because GH Actions
       # silently filters bot pushes from triggering blueprint-release.
+      # 1.4.117 (qa-loop iter-11 Fix #45 Cluster-B + Cluster-C):
+      # B) application-controller now observes downstream HelmRelease
+      # readiness (helm.toolkit.fluxcd.io/helmreleases get/list/
+      # watch added to ClusterRole) and rolls up Application
+      # .status.phase from Provisioning → Ready. Periodic 30s
+      # re-list ticker so HR readiness flips reach the parent.
+      # status.lastReconciledAt populated for TC-113.
+      # C) catalyst-api gains GET /sovereigns/{id}/applications/{name}
+      # (full Application detail) and accepts ?namespace= alias on
+      # /sovereigns/{id}/k8s/{kind}. SPA AppDetail.tsx falls back
+      # to the new GET when wizard store has no descriptor (typical
+      # chroot Sovereign case — closes the "App not found" misfire
+      # on TC-068 / TC-072 / TC-074). TC-262 / TC-263 (services
+      # filtered to qa-omantel only) flip PASS via the namespace
+      # alias.
       # 1.4.109 (Fix #40 follow-up #2): drop /api/v1 from organization +
       # environment-controller GITEA URL defaults — Gitea client appends
       # it; the prior default produced /api/v1/api/v1/... 404s on
       # EnsureOrg / EnsureRepo blocking qa-wp Application reconcile.
       # bootstrap-kit qaFixtures.cnpgPairName default qa-cnpg → qa-cnpgpair
       # so TC-306's "cnpgpair" substring assertion passes.
-      version: 1.4.116
+      version: 1.4.117
       sourceRef:
         kind: HelmRepository
         name: bp-catalyst-platform
@@ -81,10 +81,15 @@ spec:
   chart:
     spec:
       chart: bp-guacamole
-      # 0.1.6: 0.1.5 (added /home/guacamole/.guacamole emptyDir
-      # mount for readOnlyRootFilesystem compatibility) +
-      # post-merge CI auto-bump.
-      version: 0.1.6
+      # 0.1.9 (qa-loop iter-11 Fix #45 Cluster-A): adds the
+      # storageClass-migration pre-install/pre-upgrade hook that
+      # unwedges Sovereigns where the existing `guacamole-recordings`
+      # PVC is bound to a different storageClass than the chart-desired
+      # one (live failure on omantel iter-11: PVC bound to local-path,
+      # chart wanted seaweedfs-storage, K8s rejected the immutable-spec
+      # patch with `cannot patch ... PersistentVolumeClaim ... is
+      # invalid: spec: Forbidden: spec is immutable after creation`).
+      version: 0.1.9
       sourceRef:
         kind: HelmRepository
         name: bp-guacamole
@@ -47,9 +47,17 @@ spec:
   chart:
     spec:
       chart: bp-guacamole
-      # 0.1.6: 0.1.5 (.guacamole home emptyDir mount for
-      # readOnlyRootFilesystem) + CI auto-bumped values.yaml.
-      version: 0.1.6
+      # 0.1.9 (qa-loop iter-11 Fix #45 Cluster-A): storageClass-
+      # migration pre-install/pre-upgrade hook. The live bp-guacamole HR
+      # wedged on omantel iter-11 because the existing
+      # `guacamole-recordings` PVC was bound to local-path while the
+      # in-cluster HR object had `seaweedfs-storage` (drift between
+      # the on-disk overlay below and what was already on the API
+      # server). With this version the chart's pre-upgrade hook reads
+      # the existing PVC and either no-ops (storageClass matches) or
+      # deletes the PVC + bound PV so the post-render PVC manifest
+      # creates cleanly with whatever storageClass is desired.
+      version: 0.1.9
       sourceRef:
         kind: HelmRepository
         name: bp-guacamole
@@ -124,6 +124,27 @@ var (
 		Version:  "v1",
 		Resource: "kustomizations",
 	}
+
+	// FluxHelmReleaseGVR — the per-Application HelmRelease that lands
+	// in the Application CR's own namespace once the
+	// per-region Kustomization reconciles the manifests committed to
+	// Gitea. The reconciler observes this HR's status.conditions[Ready]
+	// to flip Application.status.phase from `Provisioning` to `Ready`
+	// (qa-loop iter-11 Fix #45 Cluster-B). Without the observation
+	// the Application CR sat at Provisioning forever even after the
+	// downstream Helm install completed — the Sovereign Console treated
+	// it as still-installing, the matrix-asserted "Installed" terminal
+	// phase never arrived, and TC-066 / TC-100 / TC-104 / TC-113 / TC-117
+	// stayed FAIL.
+	//
+	// v2 is the Flux 2.4+ stable. Same Inviolable-Principle #3 rationale
+	// as the GitRepository / Kustomization GVR comments above — no v2beta
+	// fallback because Sovereigns standardise on bp-flux 1.x.
+	FluxHelmReleaseGVR = schema.GroupVersionResource{
+		Group:    "helm.toolkit.fluxcd.io",
+		Version:  "v2",
+		Resource: "helmreleases",
+	}
 )
 
 // Phase strings — surfaced on Application.status.phase per the CRD's
@@ -245,6 +266,16 @@ type Config struct {
 	// Empty means anonymous clone (acceptable for in-cluster Gitea where
 	// the network boundary is the K8s service cordon). Defaults to "".
 	FluxGiteaSecretRef string
+
+	// HelmReleaseObservationInterval is how often the periodic re-list
+	// fires to pick up downstream HelmRelease readiness flips. Defaults
+	// to 30s — short enough that the matrix-asserted 3-minute ceiling
+	// for `qa-wp` to reach `phase=Ready` (TC-066) is comfortably met
+	// even with a single observation miss. qa-loop iter-11 Fix #45
+	// Cluster-B: without this re-list, Application.status.phase was
+	// stuck at `Provisioning` indefinitely because the K8s Watch on
+	// Application CRs doesn't fire when a SIBLING HR's status changes.
+	HelmReleaseObservationInterval time.Duration
 }
 
 // Defaults applies missing-field defaults to a Config. Returns a copy.
@@ -277,6 +308,9 @@ func (c Config) Defaults() Config {
 	if out.HostFluxIntervalSeconds <= 0 {
 		out.HostFluxIntervalSeconds = 60
 	}
+	if out.HelmReleaseObservationInterval <= 0 {
+		out.HelmReleaseObservationInterval = 30 * time.Second
+	}
 	return out
 }
 
@@ -320,6 +354,14 @@ func New(dyn dynamic.Interface, gitea Gitea, errs GiteaErrorClassifier, cfg Conf
 //
 // Watches Application CRs across all namespaces (the CRD is namespace-
 // scoped per products/catalyst/chart/crds/application.yaml).
+//
+// In addition to the Watch on Application CRs, a periodic re-list ticker
+// fires every `Cfg.HelmReleaseObservationInterval` (default 30s) so the
+// reconciler picks up downstream HelmRelease readiness flips. Without
+// this re-list, Application.status.phase would never transition off
+// `Provisioning` because nothing on the API server triggers a fresh
+// reconcile of the Application when its sibling HelmRelease's
+// status.conditions[Ready] flips True. qa-loop iter-11 Fix #45 Cluster-B.
 func (r *Reconciler) Run(ctx context.Context) error {
 	if r.Dynamic == nil {
 		return errors.New("controller: Dynamic client is required")
@@ -327,6 +369,9 @@ func (r *Reconciler) Run(ctx context.Context) error {
 	if err := r.initialList(ctx); err != nil {
 		return fmt.Errorf("initial list: %w", err)
 	}
+	// Periodic re-list ticker — observes HR status changes that don't
+	// trigger an Application Watch event.
+	go r.runPeriodicRelist(ctx)
 	return wait.PollUntilContextCancel(ctx, time.Second, true, func(ctx context.Context) (bool, error) {
 		if err := r.watchOnce(ctx); err != nil {
 			r.Log.Warn("application-controller: watch error; will retry", "err", err)
@@ -335,6 +380,34 @@ func (r *Reconciler) Run(ctx context.Context) error {
 	})
 }
 
+// runPeriodicRelist re-runs initialList every HelmReleaseObservationInterval
+// so that downstream HelmRelease.status.conditions[Ready] flips reach the
+// Application.status.phase. Watching the HR directly would also work but
+// is more complex (one watcher per app namespace, dynamic add/remove on
+// Application create/delete). The cheap re-list is correct + resilient
+// to API server restarts.
+//
+// qa-loop iter-11 Fix #45 Cluster-B.
+func (r *Reconciler) runPeriodicRelist(ctx context.Context) {
+	interval := r.Cfg.HelmReleaseObservationInterval
+	if interval <= 0 {
+		interval = 30 * time.Second
+	}
+	t := time.NewTicker(interval)
+	defer t.Stop()
+	for {
+		select {
+		case <-ctx.Done():
+			return
+		case <-t.C:
+			if err := r.initialList(ctx); err != nil {
+				r.Log.Warn("application-controller: periodic re-list error",
+					"err", err)
+			}
+		}
+	}
+}
+
 func (r *Reconciler) initialList(ctx context.Context) error {
 	list, err := r.Dynamic.Resource(ApplicationGVR).Namespace("").List(ctx, metav1.ListOptions{})
 	if err != nil {
@@ -615,27 +688,220 @@ func (r *Reconciler) Reconcile(ctx context.Context, app *unstructured.Unstructur
 			fmt.Sprintf("ensure host Flux bootstrap: %v", err))
 	}
 
-	// 10. Status update.
+	// 10. Observe the downstream HelmRelease so the Application's
+	// status.phase tracks the actual workload-install lifecycle, not
+	// just the controller-side commit step. qa-loop iter-11 Fix #45
+	// Cluster-B root cause: prior to this loop the controller hard-
+	// coded `Phase: PhaseProvisioning` on every reconcile pass and
+	// never re-observed the per-region HRs that Flux installs as
+	// work-product of the Kustomization. The Application CR sat at
+	// `Provisioning` indefinitely even after `kubectl get hr -n
+	// <appNs> <appName>` was Ready=True for hours — the operator
+	// UI couldn't pivot to the Ready dashboard, the matrix-asserted
+	// terminal phase never arrived, and TC-066 / TC-100 / TC-104 /
+	// TC-113 stayed FAIL.
+	//
+	// We poll the HR per region (cheap; in-cluster GET) and roll up
+	// the readiness signal. The roll-up rule, in precedence order:
+	//   * any region HR Ready=False → phase=Degraded
+	//   * every region HR Ready=True → phase=Ready
+	//   * otherwise (HR absent or Ready unknown) → phase=Provisioning
+	// This stays consistent with the CRD's enum (Pending |
+	// Provisioning | Ready | Degraded | Failed | Uninstalling) and
+	// matches the matrix-author assertion in TC-066's must_contain
+	// ("Ready").
+	hrPhase, hrReason, hrMessage := r.observeRegionHelmReleases(ctx, app, plan)
+	regionStatuses = mergeRegionReadiness(regionStatuses, hrPhase, plan, ctx, r, app)
+
+	// 11. Status update — phase derived from observed HR readiness,
+	// fall back to Provisioning when no signal is available yet.
 	giteaRepo := fmt.Sprintf("%s/%s/%s",
 		strings.TrimRight(r.Cfg.GiteaPublicURL, "/"),
 		envSpec.OrganizationRef, app.GetName())
+	finalPhase := hrPhase
+	finalReady := "True"
+	finalReason := ReasonReconciled
+	finalMessage := fmt.Sprintf("Application %s/%s reconciled into %d region(s)", app.GetNamespace(), app.GetName(), len(plan.Regions))
+	if finalPhase == "" {
+		finalPhase = PhaseProvisioning
+	}
+	switch finalPhase {
+	case PhaseDegraded:
+		finalReady = "False"
+		finalReason = hrReason
+		finalMessage = hrMessage
+	case PhaseProvisioning:
+		// Provisioning is "we did our part, Flux will apply" — Ready
+		// stays True because the Application's own contract (manifests
+		// committed + host Flux bootstrapped) IS done. The
+		// `phase=Provisioning` signal is what the UI uses to show a
+		// spinner; the Ready condition is what RBAC guards / fleet
+		// rollups consume.
+		finalReady = "True"
+		finalReason = ReasonReconciled
+	case PhaseReady:
+		finalReady = "True"
+		finalReason = ReasonReconciled
+		finalMessage = fmt.Sprintf("Application %s/%s installed across %d region(s); Ready=True from downstream HelmRelease(s)",
+			app.GetNamespace(), app.GetName(), len(plan.Regions))
+	}
 	su := statusUpdate{
-		Phase:         PhaseProvisioning, // Flux still has to apply
-		PrimaryRegion: plan.PrimaryRegion,
-		Regions:       regionStatuses,
-		GiteaRepo:     giteaRepo,
+		Phase:            finalPhase,
+		PrimaryRegion:    plan.PrimaryRegion,
+		Regions:          regionStatuses,
+		GiteaRepo:        giteaRepo,
 		Installed: map[string]interface{}{
 			"name":    spec.BlueprintName,
 			"version": spec.BlueprintVersion,
 			"digest":  bpDigest,
 		},
-		Reason:  ReasonReconciled,
-		Message: fmt.Sprintf("Application %s/%s reconciled into %d region(s)", app.GetNamespace(), app.GetName(), len(plan.Regions)),
-		Ready:   "True",
+		Reason:           finalReason,
+		Message:          finalMessage,
+		Ready:            finalReady,
+		LastReconciledAt: time.Now().UTC().Format(time.RFC3339),
 	}
 	return r.updateStatus(ctx, app, su)
 }
 
+// observeRegionHelmReleases polls the per-region HelmRelease CRs the
+// Sovereign's Flux installer materialised (named `app.GetName()` in
+// the Application's own namespace, per render.HelmReleaseName / the
+// chart's HelmRelease template). Returns the rolled-up phase string +
+// the reason+message of the WORST region (so a single-region Failed
+// surfaces in the UI verbatim instead of being averaged out).
+//
+// Idempotent + side-effect-free: only reads the API.
+//
+// qa-loop iter-11 Fix #45 Cluster-B.
+func (r *Reconciler) observeRegionHelmReleases(
+	ctx context.Context,
+	app *unstructured.Unstructured,
+	plan placement.Plan,
+) (phase, reason, message string) {
+	allReady := true
+	anyDegraded := false
+	worstReason := ""
+	worstMessage := ""
+	sawAny := false
+	for _, rp := range plan.Regions {
+		// HR lives in the Application's own namespace, named after the
+		// Application (matches render.HelmReleaseName + the chart's
+		// HelmRelease template's `metadata.name: {{ .AppName }}`).
+		hr, err := r.Dynamic.Resource(FluxHelmReleaseGVR).
+			Namespace(app.GetNamespace()).
+			Get(ctx, app.GetName(), metav1.GetOptions{})
+		if err != nil {
+			if apierrors.IsNotFound(err) {
+				// HR not yet materialised — Flux still pulling. Roll up
+				// to Provisioning, NOT Failed.
+				allReady = false
+				continue
+			}
+			r.Log.Warn("application-controller: GET HelmRelease failed",
+				"namespace", app.GetNamespace(),
+				"name", app.GetName(),
+				"region", rp.Name,
+				"err", err)
+			allReady = false
+			continue
+		}
+		sawAny = true
+		ready, hrReason, hrMsg := readReadyCondition(hr)
+		switch ready {
+		case "True":
+			// good — keep allReady
+		case "False":
+			anyDegraded = true
+			allReady = false
+			if worstReason == "" {
+				worstReason = "DownstreamHelmReleaseFailed"
+				worstMessage = fmt.Sprintf("region %s HelmRelease Ready=False: %s — %s", rp.Name, hrReason, hrMsg)
+			}
+		default:
+			// Unknown — Flux still working.
+			allReady = false
+		}
+	}
+	switch {
+	case anyDegraded:
+		return PhaseDegraded, worstReason, worstMessage
+	case allReady && sawAny:
+		return PhaseReady, "", ""
+	default:
+		return PhaseProvisioning, "", ""
+	}
+}
+
+// readReadyCondition extracts (status, reason, message) of the
+// `Ready` condition from a Flux HelmRelease (or any Kubernetes object
+// that exposes `status.conditions[].type=Ready`). Returns ("", "", "")
+// when the condition isn't yet present.
+func readReadyCondition(obj *unstructured.Unstructured) (status, reason, message string) {
+	conds, found, err := unstructured.NestedSlice(obj.Object, "status", "conditions")
+	if err != nil || !found {
+		return "", "", ""
+	}
+	for _, c := range conds {
+		cm, ok := c.(map[string]interface{})
+		if !ok {
+			continue
+		}
+		t, _ := cm["type"].(string)
+		if t != "Ready" {
+			continue
+		}
+		s, _ := cm["status"].(string)
+		rsn, _ := cm["reason"].(string)
+		msg, _ := cm["message"].(string)
+		return s, rsn, msg
+	}
+	return "", "", ""
+}
+
+// mergeRegionReadiness updates each region status entry's `ready` count
+// from 0 → replicas when the rolled-up phase = Ready. Without this the
+// per-region rollup that the UI consumes (TC-066's status response,
+// TC-068's Overview tab) keeps showing `ready: 0` even when the HR
+// reports Ready=True. Per-region HR readiness is the single signal
+// available to a Sovereign-scoped controller — fleet-wide replica
+// counts come from a future fleet-controller (out of scope for Fix #45).
+//
+// qa-loop iter-11 Fix #45 Cluster-B.
+func mergeRegionReadiness(
+	regions []map[string]interface{},
+	phase string,
+	plan placement.Plan,
+	ctx context.Context,
+	r *Reconciler,
+	app *unstructured.Unstructured,
+) []map[string]interface{} {
+	if phase != PhaseReady {
+		return regions
+	}
+	out := make([]map[string]interface{}, 0, len(regions))
+	now := time.Now().UTC().Format(time.RFC3339)
+	for _, rs := range regions {
+		copyMap := map[string]interface{}{}
+		for k, v := range rs {
+			copyMap[k] = v
+		}
+		// Bump replicas-ready for every region. No per-region re-check
+		// happens here: the caller only passes phase=Ready when every
+		// region's HR reported Ready=True, so the early return above is
+		// the only gate.
+		if replicas, ok := copyMap["replicas"].(int64); ok {
+			copyMap["ready"] = replicas
+		}
+		copyMap["lastTransitionTime"] = now
+		out = append(out, copyMap)
+	}
+	_ = plan
+	_ = ctx
+	_ = r
+	_ = app
+	return out
+}
+
 // ensureHostFluxBootstrap upserts (find-or-create) the host-cluster
 // Flux v1 GitRepository + per-region Kustomization CRs that pull the
 // per-Application manifests we committed to Gitea. Idempotent: a
@@ -1004,14 +1270,21 @@ func (r *Reconciler) fetchBlueprint(ctx context.Context, name string) (*unstruct
 // statusUpdate captures the desired Application.status changes for one
 // reconcile pass.
 type statusUpdate struct {
-	Phase         string
-	PrimaryRegion string
-	Regions       []map[string]interface{}
-	GiteaRepo     string
-	Installed     map[string]interface{}
-	Reason        string
-	Message       string
-	Ready         string // "True" | "False" | "Unknown"
+	Phase            string
+	PrimaryRegion    string
+	Regions          []map[string]interface{}
+	GiteaRepo        string
+	Installed        map[string]interface{}
+	Reason           string
+	Message          string
+	Ready            string // "True" | "False" | "Unknown"
+	// LastReconciledAt is the wall-clock RFC3339 timestamp of this
+	// reconcile pass — surfaced verbatim via
+	// `status.lastReconciledAt` so the UI's freshness chip + TC-113
+	// (`must_contain: lastReconciled`) have something stable to read.
+	// Empty value leaves the field untouched. qa-loop iter-11 Fix #45
+	// Cluster-B follow-up.
+	LastReconciledAt string
 }
 
 // updateStatus writes the status sub-resource via the dynamic client.
@@ -1055,6 +1328,9 @@ func (r *Reconciler) updateStatus(ctx context.Context, app *unstructured.Unstruc
 	if su.Installed != nil {
 		currentStatus["installedBlueprint"] = su.Installed
 	}
+	if su.LastReconciledAt != "" {
+		currentStatus["lastReconciledAt"] = su.LastReconciledAt
+	}
 
 	// Replace Ready condition; preserve unrelated conditions.
 	conditions := []interface{}{}
@@ -186,6 +186,9 @@ func newScheme() *runtime.Scheme {
 		{Group: "orgs.openova.io", Version: "v1", Kind: "Organization"},
 		{Group: "source.toolkit.fluxcd.io", Version: "v1", Kind: "GitRepository"},
 		{Group: "kustomize.toolkit.fluxcd.io", Version: "v1", Kind: "Kustomization"},
+		// qa-loop iter-11 Fix #45 Cluster-B — observation of downstream
+		// HelmRelease readiness for Application.status.phase rollup.
+		{Group: "helm.toolkit.fluxcd.io", Version: "v2", Kind: "HelmRelease"},
 	} {
 		s.AddKnownTypeWithName(gvk, &unstructured.Unstructured{})
 		listGVK := schema.GroupVersionKind{Group: gvk.Group, Version: gvk.Version, Kind: gvk.Kind + "List"}
@@ -203,6 +206,7 @@ func listKindMap() map[schema.GroupVersionResource]string {
 		BlueprintGVRv1alpha1: "BlueprintList",
 		FluxGitRepositoryGVR: "GitRepositoryList",
 		FluxKustomizationGVR: "KustomizationList",
+		FluxHelmReleaseGVR:   "HelmReleaseList",
 	}
 }
 
@@ -1050,3 +1054,146 @@ func TestReconcile_HelmReleaseTargetNamespaceIsAppNamespace(t *testing.T) {
 		t.Errorf("Kustomization namespace should be 'qa-omantel'; got:\n%s", ksStr)
 	}
 }
+
+// --- qa-loop iter-11 Fix #45 Cluster-B: Application.status.phase tracks
+// downstream HelmRelease.status.conditions[Ready] -----------------------
+//
+// The matrix-asserted contract (TC-066, TC-100, TC-104, TC-113):
+// once the per-region HelmRelease the controller writes to Gitea is
+// installed by Flux and reports `Ready=True`, the parent Application
+// CR's `status.phase` MUST flip from `Provisioning` to `Ready` within
+// 3 minutes. Prior to Fix #45 the controller hard-coded
+// `Phase: PhaseProvisioning` on every reconcile pass — the Application
+// sat at `Provisioning` indefinitely even after `kubectl get hr -n
+// <ns> <app>` was Ready=True for hours.
+//
+// This test seeds a fake HelmRelease in the Application's namespace
+// with status.conditions[Ready]=True and asserts the phase rolls up.
+func TestReconcile_PhaseFollowsDownstreamHelmReleaseReady(t *testing.T) {
+	bp := makeBlueprint("bp-wordpress", "1.2.3", nil, []string{"single-region"})
+	env := makeEnv("acme-prod", "acme", "prod")
+	org := makeOrg("acme")
+	app := makeApp("acme", "site", "acme-prod", "bp-wordpress", "1.2.3", "single-region",
+		[]string{"hetzner-fsn-rtz-prod"},
+		map[string]interface{}{"replicas": int64(1)})
+
+	// Pre-seed the downstream HelmRelease in the Application's
+	// namespace with status.conditions[Ready]=True (mirrors what Flux
+	// would write after a successful install).
+	hr := &unstructured.Unstructured{}
+	hr.SetAPIVersion("helm.toolkit.fluxcd.io/v2")
+	hr.SetKind("HelmRelease")
+	hr.SetNamespace("acme")
+	hr.SetName("site")
+	hr.Object["status"] = map[string]interface{}{
+		"conditions": []interface{}{
+			map[string]interface{}{
+				"type":    "Ready",
+				"status":  "True",
+				"reason":  "InstallSucceeded",
+				"message": "Helm install succeeded for release acme/site.v1 with chart bp-wordpress@1.2.3",
+			},
+		},
+	}
+
+	fg := newFakeGitea()
+	fg.orgsExist["acme"] = true
+	r := newReconciler(t, fg, app, env, org, bp, hr)
+
+	reconcileFromCluster(t, r, "acme", "site")
+
+	got := readApp(t, r, "acme", "site")
+	phase, _, message := readPhaseAndReason(t, got)
+	if phase != PhaseReady {
+		t.Errorf("phase = %q, want %q (msg=%q)", phase, PhaseReady, message)
+	}
+	// Per-region replicas-ready should bump from 0 → declared.
+	regions, _, _ := unstructured.NestedSlice(got.Object, "status", "regions")
+	if len(regions) != 1 {
+		t.Fatalf("regions = %d, want 1", len(regions))
+	}
+	rs := regions[0].(map[string]interface{})
+	ready, _ := rs["ready"].(int64)
+	replicas, _ := rs["replicas"].(int64)
+	if ready != replicas {
+		t.Errorf("region.ready=%d, region.replicas=%d — should match when phase=Ready", ready, replicas)
+	}
+	// status.lastReconciledAt should be populated for TC-113.
+	lr, _, _ := unstructured.NestedString(got.Object, "status", "lastReconciledAt")
+	if lr == "" {
+		t.Errorf("status.lastReconciledAt is empty — must be set on every reconcile pass")
+	}
+}
+
+// TestReconcile_PhaseDegradedOnDownstreamHelmReleaseFailure asserts the
+// inverse: a downstream HR Ready=False (e.g. helm-install rolled-back)
+// surfaces as Application.status.phase=Degraded, NOT Provisioning, NOT
+// Ready. The reason+message of the worst-region HR are lifted into the
+// Application's Ready condition so the operator UI can render the
+// failure verbatim.
+func TestReconcile_PhaseDegradedOnDownstreamHelmReleaseFailure(t *testing.T) {
+	bp := makeBlueprint("bp-wordpress", "1.2.3", nil, []string{"single-region"})
+	env := makeEnv("acme-prod", "acme", "prod")
+	org := makeOrg("acme")
+	app := makeApp("acme", "site", "acme-prod", "bp-wordpress", "1.2.3", "single-region",
+		[]string{"hetzner-fsn-rtz-prod"},
+		map[string]interface{}{"replicas": int64(1)})
+
+	hr := &unstructured.Unstructured{}
+	hr.SetAPIVersion("helm.toolkit.fluxcd.io/v2")
+	hr.SetKind("HelmRelease")
+	hr.SetNamespace("acme")
+	hr.SetName("site")
+	hr.Object["status"] = map[string]interface{}{
+		"conditions": []interface{}{
+			map[string]interface{}{
+				"type":    "Ready",
+				"status":  "False",
+				"reason":  "InstallFailed",
+				"message": "chart pull failed: 401 Unauthorized",
+			},
+		},
+	}
+	fg := newFakeGitea()
+	fg.orgsExist["acme"] = true
+	r := newReconciler(t, fg, app, env, org, bp, hr)
+
+	reconcileFromCluster(t, r, "acme", "site")
+
+	got := readApp(t, r, "acme", "site")
+	phase, reason, message := readPhaseAndReason(t, got)
+	if phase != PhaseDegraded {
+		t.Errorf("phase = %q, want %q", phase, PhaseDegraded)
+	}
+	if reason == "" {
+		t.Errorf("reason should be set when phase=Degraded; got empty")
+	}
+	if !strings.Contains(message, "InstallFailed") && !strings.Contains(message, "401 Unauthorized") {
+		t.Errorf("message should surface the downstream HR failure verbatim; got %q", message)
+	}
+}
+
+// TestReconcile_PhaseStaysProvisioningWhenHelmReleaseAbsent asserts the
+// no-signal case: no HR exists yet (Flux still pulling Gitea), the
+// Application stays at Provisioning. This is the existing happy-path
+// behaviour — the new HR-observation logic must be a strict superset.
+func TestReconcile_PhaseStaysProvisioningWhenHelmReleaseAbsent(t *testing.T) {
+	bp := makeBlueprint("bp-wordpress", "1.2.3", nil, []string{"single-region"})
+	env := makeEnv("acme-prod", "acme", "prod")
+	org := makeOrg("acme")
+	app := makeApp("acme", "site", "acme-prod", "bp-wordpress", "1.2.3", "single-region",
+		[]string{"hetzner-fsn-rtz-prod"},
+		map[string]interface{}{"replicas": int64(1)})
+	fg := newFakeGitea()
+	fg.orgsExist["acme"] = true
+	// NOTE: no HR seeded — fresh install, Flux hasn't pulled yet.
+	r := newReconciler(t, fg, app, env, org, bp)
+
+	reconcileFromCluster(t, r, "acme", "site")
+
+	got := readApp(t, r, "acme", "site")
+	phase, _, _ := readPhaseAndReason(t, got)
+	if phase != PhaseProvisioning {
+		t.Errorf("phase = %q, want %q (HR-absent must roll up to Provisioning, not Ready or Degraded)", phase, PhaseProvisioning)
+	}
+}
@@ -15,7 +15,18 @@ name: bp-guacamole
 # readOnlyRootFilesystem=true. Without it pods crash-looped with
 # `mkdir: cannot create directory '/home/guacamole/.guacamole':
 # Read-only file system`.
-version: 0.1.8
+# 0.1.9 (qa-loop iter-11 Fix #45 Cluster-A): pre-install/pre-upgrade
+# Helm hook (Job + per-release ServiceAccount/Role/RoleBinding +
+# cluster-scoped ClusterRole/ClusterRoleBinding for PV cleanup) that
+# detects when the existing `guacamole-recordings` PVC is bound to a
+# storageClass different from `.Values.guacamole.recordings.storageClass`
+# and deletes the PVC + bound PV so the chart-side PVC manifest can
+# recreate cleanly. Closes the live bp-guacamole HelmRelease wedge on
+# omantel iter-11 (`PersistentVolumeClaim ... is invalid: spec:
+# Forbidden: spec is immutable after creation`). Operator escape hatch:
+# `.Values.guacamole.recordings.allowMigration: false` suppresses the
+# hook for Sovereigns with long-lived recording state.
+version: 0.1.9
 appVersion: "1.5.5"
 description: |
   Catalyst-authored Blueprint chart for Apache Guacamole — a clientless
@@ -0,0 +1,158 @@
+{{- /*
+PVC storageClass-migration hook (qa-loop iter-11 Fix #45 Cluster-A).
+
+Background: the immutable-spec problem
+======================================
+PersistentVolumeClaim.spec is immutable after creation EXCEPT for
+`resources.requests.storage` (resize, when allowed by the StorageClass)
+and `volumeAttributesClassName`. Specifically `storageClassName` is
+strictly immutable. Once a PVC is bound to a PV under storageClass X,
+no `helm upgrade` that changes `.Values.guacamole.recordings.storageClass`
+to Y will ever succeed: the K8s apiserver rejects the patch with
+`PersistentVolumeClaim ... is invalid: spec: Forbidden: spec is
+immutable after creation except resources.requests.storage and
+volumeAttributesClassName for bound claims`.
+
+This is the live bp-guacamole HR failure we hit on omantel iter-11:
+PR #1259 left `.Values.guacamole.recordings.storageClass` at upstream default
+`hcloud-volumes`, the omantel cluster overlay set it to
+`seaweedfs-storage`, but the pre-existing PVC was bound to `local-path`
+(from a prior reconcile pass), and the upgrade locked into a permanent
+remediation loop.
+
+Why a hook (not a migration Job, not chart-rename)
+==================================================
+A regular Job is applied together with the other templates, which is
+too late: the helm-upgrade fails on the PVC patch before the Job ever
+lands. A chart-side rename of the PVC pattern (e.g. include a hash of
+the storage class) would churn through PVs every time the value changes,
+losing data unless we also added a backup-restore lifecycle. Per
+docs/INVIOLABLE-PRINCIPLES.md (no "for now" workarounds, no compromised
+quality), the right primitive is the Helm `pre-upgrade` hook: it runs
+BEFORE Helm applies the rendered PVC manifest, so it can strip finalizers,
+delete the offending PVC + PV, and let the chart's PVC create cleanly.
+
+Recording-data lifecycle
+========================
+`/recordings` holds Guacamole session capture files (RDP/VNC/SSH/exec
+playback). On a Sovereign without long-running sessions or before the
+recording-shipper is wired up, deleting the volume is data-safe. The
+hook is gated by `.Values.guacamole.recordings.allowMigration` so an
+operator with live recording state can disable the destructive path
+(default ON because the cost of leaving the upgrade wedged is much
+higher than the cost of regenerating an empty recordings directory;
+Guacamole creates per-connection subdirectories on demand).
+
+When the PVC's existing storageClass already matches the chart-desired
+one, the hook is a no-op. The check runs kubectl as the hook
+ServiceAccount, whose permissions come from a per-release Role.
+
+Pairs with:
+- templates/seaweedfs-pvc.yaml: the actual PVC the chart wants
+- templates/recordings-pvc-rbac.yaml: ServiceAccount + Role + Binding
+*/}}
+{{- $migrate := true -}}
+{{- if hasKey .Values.guacamole.recordings "allowMigration" -}}
+{{- $migrate = .Values.guacamole.recordings.allowMigration -}}
+{{- end -}}
+{{- if and .Values.guacamole.enabled $migrate -}}
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: {{ include "bp-guacamole.recordingsName" . }}-storageclass-migrate
+  namespace: {{ .Release.Namespace }}
+  labels:
+    {{- include "bp-guacamole.labels" . | nindent 4 }}
+    catalyst.openova.io/component: recordings-migrate
+  annotations:
+    # Run BEFORE templates land: pre-install + pre-upgrade so a fresh
+    # install (no PVC yet) is also a no-op safely (the kubectl-get in
+    # the script is forgiving). before-hook-creation makes Helm delete
+    # the prior Job manifest before re-applying so we're not blocked by
+    # the immutable Job.spec.template.
+    "helm.sh/hook": pre-install,pre-upgrade
+    "helm.sh/hook-weight": "-10"
+    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
+spec:
+  backoffLimit: 0
+  ttlSecondsAfterFinished: 300
+  template:
+    metadata:
+      labels:
+        {{- include "bp-guacamole.labels" . | nindent 8 }}
+        catalyst.openova.io/component: recordings-migrate
+    spec:
+      serviceAccountName: {{ include "bp-guacamole.recordingsName" . }}-migrator
+      restartPolicy: Never
+      {{- with .Values.guacamole.imagePullSecrets }}
+      imagePullSecrets:
+        {{- toYaml . | nindent 8 }}
+      {{- end }}
+      containers:
+        - name: migrate
+          # bitnami/kubectl is the canonical chart-side migration tool
+          # across Catalyst Blueprints (cf. bp-keycloak realm-config
+          # post-deploy Job pattern). The default below is pinned by tag only;
+          # set a digest reference for docs/INVIOLABLE-PRINCIPLES.md #4a (SHA pin).
+          image: {{ .Values.guacamole.recordings.migrationImage | default "bitnami/kubectl:1.30.4" | quote }}
+          imagePullPolicy: IfNotPresent
+          securityContext:
+            runAsNonRoot: true
+            runAsUser: 1001
+            runAsGroup: 1001
+            allowPrivilegeEscalation: false
+            readOnlyRootFilesystem: true
+            capabilities:
+              drop: [ALL]
+            seccompProfile:
+              type: RuntimeDefault
+          env:
+            - name: PVC_NAME
+              value: {{ include "bp-guacamole.recordingsName" . | quote }}
+            - name: PVC_NAMESPACE
+              value: {{ .Release.Namespace | quote }}
+            - name: DESIRED_STORAGECLASS
+              value: {{ .Values.guacamole.recordings.storageClass | quote }}
+          command: ["/bin/bash", "-c"]
+          args:
+            - |
+              set -euo pipefail
+              # Read existing PVC's storageClass; -o jsonpath emits empty
+              # string if PVC doesn't exist (kubectl returns 0 with empty
+              # output for that case via --ignore-not-found).
+              EXISTING_SC="$(kubectl get pvc "${PVC_NAME}" -n "${PVC_NAMESPACE}" \
+                --ignore-not-found \
+                -o jsonpath='{.spec.storageClassName}' 2>/dev/null || true)"
+              if [ -z "${EXISTING_SC}" ]; then
+                echo "PVC ${PVC_NAMESPACE}/${PVC_NAME} does not exist; fresh install, no migration needed."
+                exit 0
+              fi
+              if [ "${EXISTING_SC}" = "${DESIRED_STORAGECLASS}" ]; then
+                echo "PVC ${PVC_NAMESPACE}/${PVC_NAME} already on storageClass=${DESIRED_STORAGECLASS}; no migration."
+                exit 0
+              fi
+              echo "PVC storageClass mismatch: existing=${EXISTING_SC} desired=${DESIRED_STORAGECLASS}; deleting PVC + PV to allow recreation."
+              # Capture bound PV name before deleting PVC (Delete reclaim
+              # policy on most CSI drivers will auto-delete the PV when
+              # the PVC goes; Retain policies need explicit cleanup).
+              BOUND_PV="$(kubectl get pvc "${PVC_NAME}" -n "${PVC_NAMESPACE}" \
+                -o jsonpath='{.spec.volumeName}' 2>/dev/null || true)"
+              # Strip finalizers so the PVC actually deletes (kubernetes.io/pvc-protection
+              # blocks delete while a Pod still references it, and a pre-upgrade hook
+              # runs before the webapp Pod is replaced). `kubectl patch` has no
+              # --ignore-not-found flag, so `|| true` covers a vanished PVC.
+              kubectl patch pvc "${PVC_NAME}" -n "${PVC_NAMESPACE}" \
+                --type=merge -p '{"metadata":{"finalizers":[]}}' \
+                || true
+              kubectl delete pvc "${PVC_NAME}" -n "${PVC_NAMESPACE}" \
+                --ignore-not-found --wait=true --timeout=60s
+              if [ -n "${BOUND_PV}" ]; then
+                echo "Cleaning up PV ${BOUND_PV} (storageClass=${EXISTING_SC})."
+                kubectl patch pv "${BOUND_PV}" \
+                  --type=merge -p '{"metadata":{"finalizers":[]}}' \
+                  || true
+                kubectl delete pv "${BOUND_PV}" \
+                  --ignore-not-found --wait=true --timeout=60s
+              fi
+              echo "Migration complete; chart-side PVC will be recreated on this upgrade pass with storageClass=${DESIRED_STORAGECLASS}."
+{{- end }}
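The hook script's decision logic reduces to two early exits followed by the destructive path. The sketch below restates that branching in Go purely for illustration; `action` and `decide` are hypothetical names that do not exist in the chart, and the real hook remains the bash script above.

```go
package main

import "fmt"

// action is a hypothetical label for what the hook would do.
type action string

const (
	actionNone     action = "none"     // exit 0 without touching anything
	actionRecreate action = "recreate" // delete PVC + bound PV so the chart can recreate it
)

// decide mirrors the script's checks: an empty existing storageClass means
// the PVC was not found (fresh install); an equal one means nothing to do.
func decide(existingSC, desiredSC string) action {
	if existingSC == "" {
		return actionNone
	}
	if existingSC == desiredSC {
		return actionNone
	}
	return actionRecreate
}

func main() {
	fmt.Println(decide("", "seaweedfs-storage"))
	fmt.Println(decide("seaweedfs-storage", "seaweedfs-storage"))
	fmt.Println(decide("local-path", "seaweedfs-storage"))
}
```

Note that the `allowMigration` toggle is not part of this function: when it is false the Job is never rendered, so the decision is never reached.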
115
platform/guacamole/chart/templates/recordings-pvc-rbac.yaml
Normal file
@@ -0,0 +1,115 @@
+{{- /*
+RBAC for the storageClass-migration hook (qa-loop iter-11 Fix #45 Cluster-A).
+
+ServiceAccount + Role + RoleBinding scoped to the chart's namespace:
+the hook only ever touches the recordings PVC and (via cluster-level
+PV cleanup) the bound PV. PVs are cluster-scoped so we need a
+ClusterRole + ClusterRoleBinding for that one resource.
+
+Pairs with templates/recordings-pvc-migrate-hook.yaml.
+*/}}
+{{- $migrate := true -}}
+{{- if hasKey .Values.guacamole.recordings "allowMigration" -}}
+{{- $migrate = .Values.guacamole.recordings.allowMigration -}}
+{{- end -}}
+{{- if and .Values.guacamole.enabled $migrate -}}
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: {{ include "bp-guacamole.recordingsName" . }}-migrator
+  namespace: {{ .Release.Namespace }}
+  labels:
+    {{- include "bp-guacamole.labels" . | nindent 4 }}
+    catalyst.openova.io/component: recordings-migrate
+  annotations:
+    "helm.sh/hook": pre-install,pre-upgrade
+    "helm.sh/hook-weight": "-20"
+    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
+{{- with .Values.guacamole.imagePullSecrets }}
+imagePullSecrets:
+  {{- toYaml . | nindent 2 }}
+{{- end }}
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: Role
+metadata:
+  name: {{ include "bp-guacamole.recordingsName" . }}-migrator
+  namespace: {{ .Release.Namespace }}
+  labels:
+    {{- include "bp-guacamole.labels" . | nindent 4 }}
+    catalyst.openova.io/component: recordings-migrate
+  annotations:
+    "helm.sh/hook": pre-install,pre-upgrade
+    "helm.sh/hook-weight": "-20"
+    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
+rules:
+  # Read-then-decide: get is for the storageClass-mismatch check; the
+  # destructive verbs run only when the check fires.
+  - apiGroups: [""]
+    resources: [persistentvolumeclaims]
+    verbs: [get, list, patch, delete]
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: RoleBinding
+metadata:
+  name: {{ include "bp-guacamole.recordingsName" . }}-migrator
+  namespace: {{ .Release.Namespace }}
+  labels:
+    {{- include "bp-guacamole.labels" . | nindent 4 }}
+    catalyst.openova.io/component: recordings-migrate
+  annotations:
+    "helm.sh/hook": pre-install,pre-upgrade
+    "helm.sh/hook-weight": "-20"
+    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
+roleRef:
+  apiGroup: rbac.authorization.k8s.io
+  kind: Role
+  name: {{ include "bp-guacamole.recordingsName" . }}-migrator
+subjects:
+  - kind: ServiceAccount
+    name: {{ include "bp-guacamole.recordingsName" . }}-migrator
+    namespace: {{ .Release.Namespace }}
+---
+# PV is cluster-scoped, so it needs a ClusterRole. Scoping via
+# resourceNames is impossible because the PV name is the
+# dynamically-provisioned UUID (we don't know it at chart-render time).
+# Verbs are the minimum needed to clear the bound PV when the underlying
+# CSI Reclaim policy is Retain. Per docs/INVIOLABLE-PRINCIPLES.md #3
+# (least-privilege), this is the narrowest cluster-scoped grant we can
+# express; create is intentionally omitted.
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRole
+metadata:
+  name: {{ printf "%s-%s-migrator-pv" .Release.Namespace (include "bp-guacamole.recordingsName" .) | trunc 253 | trimSuffix "-" }}
+  labels:
+    {{- include "bp-guacamole.labels" . | nindent 4 }}
+    catalyst.openova.io/component: recordings-migrate
+  annotations:
+    "helm.sh/hook": pre-install,pre-upgrade
+    "helm.sh/hook-weight": "-20"
+    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
+rules:
+  - apiGroups: [""]
+    resources: [persistentvolumes]
+    verbs: [get, list, patch, delete]
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRoleBinding
+metadata:
+  name: {{ printf "%s-%s-migrator-pv" .Release.Namespace (include "bp-guacamole.recordingsName" .) | trunc 253 | trimSuffix "-" }}
+  labels:
+    {{- include "bp-guacamole.labels" . | nindent 4 }}
+    catalyst.openova.io/component: recordings-migrate
+  annotations:
+    "helm.sh/hook": pre-install,pre-upgrade
+    "helm.sh/hook-weight": "-20"
+    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
+roleRef:
+  apiGroup: rbac.authorization.k8s.io
+  kind: ClusterRole
+  name: {{ printf "%s-%s-migrator-pv" .Release.Namespace (include "bp-guacamole.recordingsName" .) | trunc 253 | trimSuffix "-" }}
+subjects:
+  - kind: ServiceAccount
+    name: {{ include "bp-guacamole.recordingsName" . }}-migrator
+    namespace: {{ .Release.Namespace }}
+{{- end }}
@@ -84,7 +84,12 @@ fi
 echo "PASS: empty image.tag fails fast"

 # ─────────────────────────────────────────────────────────────────────
-# 3. Full-ON: the canonical 9-resource bundle.
+# 3. Full-ON: the canonical 15-resource bundle.
+#
+# qa-loop iter-11 Fix #45 Cluster-A added the recordings storageClass-
+# migration pre-upgrade hook (1 Job + 1 ServiceAccount + 1 Role +
+# 1 RoleBinding + 1 ClusterRole + 1 ClusterRoleBinding = +6 resources
+# vs. the prior 9-doc bundle).
 # ─────────────────────────────────────────────────────────────────────
 render_on="$TMP/on.yaml"
 helm template bp-guacamole . \
@@ -95,11 +100,11 @@ helm template bp-guacamole . \
   --set guacamole.oidc.issuer=https://kc.test/realms/catalyst \
   > "$render_on"

-# Check each canonical kind appears exactly once. We assert 9 distinct
-# `name:` headers under `^kind:` lines that start with one of the
-# expected kinds. `Deployment` appears twice (guacd + webapp); Service
-# also appears twice; total = 9.
-expect_total=9
+# Check each canonical kind appears the expected number of times. The
+# 15-doc target: Deployment×2 (guacd + webapp), Service×2, HTTPRoute,
+# PVC, SealedSecret, NetworkPolicy, ConfigMap, Job, ServiceAccount,
+# Role, RoleBinding, ClusterRole, ClusterRoleBinding.
+expect_total=15
 got_total="$(grep -cE '^kind:' "$render_on")"
 if [[ "$got_total" != "$expect_total" ]]; then
   echo "FAIL: full-ON rendered $got_total resources, want $expect_total"
@@ -117,6 +122,12 @@ required_kinds=(
   SealedSecret
   NetworkPolicy
   ConfigMap
+  Job
+  ServiceAccount
+  Role
+  RoleBinding
+  ClusterRole
+  ClusterRoleBinding
 )
 for k in "${required_kinds[@]}"; do
   if ! grep -qE "^kind: ${k}$" "$render_on"; then
@@ -185,5 +196,38 @@ if ! awk '
 fi
 echo "PASS: realm-patch ConfigMap lands in keycloak namespace"

+# qa-loop iter-11 Fix #45 Cluster-A: recordings storageClass-migration
+# pre-upgrade hook is wired to the correct hook lifecycle (pre-install
+# AND pre-upgrade so a chart-overlay storageClass change at any point
+# in the Sovereign's lifetime is recoverable) and references the
+# desired storageClass via env var (so the in-Pod script can compare
+# against the live PVC's existing storageClass).
+if ! grep -q '"helm.sh/hook": pre-install,pre-upgrade' "$render_on"; then
+  echo "FAIL: recordings migration hook missing pre-install,pre-upgrade lifecycle"
+  exit 1
+fi
+if ! grep -q 'name: DESIRED_STORAGECLASS' "$render_on"; then
+  echo "FAIL: migration hook missing DESIRED_STORAGECLASS env"
+  exit 1
+fi
+echo "PASS: recordings storageClass-migration hook wired correctly"
+
+# Toggle: when allowMigration=false, the hook must NOT render (operator
+# escape hatch for Sovereigns with live recording state).
+no_mig="$TMP/no-migration.yaml"
+helm template bp-guacamole . \
+  --set guacamole.enabled=true \
+  --set guacamole.guacd.image.tag=1.5.5-r1 \
+  --set guacamole.webapp.image.tag=1.5.5-r1 \
+  --set guacamole.httproute.hostname=guacamole.test \
+  --set guacamole.oidc.issuer=https://kc.test/realms/catalyst \
+  --set guacamole.recordings.allowMigration=false \
+  > "$no_mig"
+if grep -q 'storageclass-migrate' "$no_mig"; then
+  echo "FAIL: allowMigration=false still rendered the migration Job"
+  exit 1
+fi
+echo "PASS: allowMigration=false suppresses the migration hook"
+
 echo ""
 echo "All render tests passed."
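The render test's resource count relies on `grep -cE '^kind:'`, which only matches `kind:` at column 0, so nested fields such as `roleRef.kind` are not counted. The following Go sketch illustrates the same counting rule under that assumption; `countTopLevelKinds` and `total` are hypothetical helpers, not part of the test script.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// countTopLevelKinds tallies lines that start with "kind:" at column 0,
// keyed by the kind name. Indented "kind:" lines are ignored.
func countTopLevelKinds(rendered string) map[string]int {
	counts := map[string]int{}
	sc := bufio.NewScanner(strings.NewReader(rendered))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "kind:") {
			name := strings.TrimSpace(strings.TrimPrefix(line, "kind:"))
			counts[name]++
		}
	}
	return counts
}

// total sums the per-kind counts, the analogue of `grep -c`.
func total(counts map[string]int) int {
	n := 0
	for _, c := range counts {
		n += c
	}
	return n
}

func main() {
	sample := "kind: RoleBinding\nroleRef:\n  kind: Role\n---\nkind: Role\n"
	c := countTopLevelKinds(sample)
	fmt.Println(c["Role"], c["RoleBinding"], total(c))
}
```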
@@ -85,6 +85,22 @@ guacamole:
     # Sovereigns). Override per-Sovereign for non-Hetzner clouds.
     storageClass: hcloud-volumes
     mountPath: /recordings
+    # qa-loop iter-11 Fix #45 Cluster-A: when an existing PVC is bound
+    # to a storageClass different from .storageClass above, the chart's
+    # pre-upgrade hook deletes the PVC + bound PV so the new chart-side
+    # PVC can be recreated cleanly. PersistentVolumeClaim.spec is
+    # K8s-immutable on storageClassName, so without this hook a per-Sovereign
+    # overlay flip (e.g. `local-path` → `seaweedfs-storage` after
+    # bp-seaweedfs lands) would wedge the bp-guacamole HelmRelease in
+    # `Failed to perform remediation: missing target release for rollback`.
+    # Default ON because session-recording state on omantel today is
+    # ephemeral; flip OFF on Sovereigns with long-lived recording data
+    # (operator should snapshot first then re-enable for the upgrade).
+    allowMigration: true
+    # Image used by the migration hook. Pinned by tag to bitnami/kubectl
+    # 1.30.4 (matches Sovereign k3s 1.30 client/server skew). Operators of
+    # air-gapped Sovereigns override this after re-mirroring the image.
+    migrationImage: bitnami/kubectl:1.30.4
   # ── Keycloak OIDC ──────────────────────────────────────────────
   oidc:
     # Issuer URL: render in per-Sovereign overlay as
@@ -912,6 +912,16 @@ func main() {
 		rg.Post("/api/v1/sovereigns/{id}/applications/preview", h.HandleApplicationPreview)
 		rg.Get("/api/v1/sovereigns/{id}/applications/{name}/status", h.HandleApplicationStatus)
 		rg.Get("/api/v1/sovereigns/{id}/applications/{name}/stream", h.HandleApplicationStream)
+		// qa-loop iter-11 Fix #45 Cluster-C: full Application detail
+		// (identity + spec + status) so the Sovereign Console SPA's
+		// AppDetail page can synthesise an ApplicationDescriptor on the
+		// fly when the Application isn't part of the wizard's
+		// `selectedComponents` (the typical chroot Sovereign case:
+		// Apps installed via `kubectl apply -f application.yaml` or
+		// the catalyst-api install endpoint, NOT via the wizard).
+		// Without this route the SPA fell into the "App not found"
+		// surface for every chroot-installed Application.
+		rg.Get("/api/v1/sovereigns/{id}/applications/{name}", h.HandleApplicationGet)
 		// qa-loop iter-9 Fix #43, Cluster-B (TC-104): canonical items
 		// envelope listing of installed Applications across all Org
 		// namespaces on the Sovereign cluster.
@@ -797,6 +797,138 @@ func isValidK8sName(s string) bool {
 	return true
 }

+// ── HTTP handler: get (GET /sovereigns/{id}/applications/{name}) ───
+
+// applicationDetailResponse is the body of GET
+// /sovereigns/{id}/applications/{name}. Lifts the same fields the
+// Sovereign Console's AppDetail page reads in one round-trip:
+// identity, blueprint+version, namespace, parameters, regions, phase,
+// conditions, primaryRegion, giteaRepo, lastReconciledAt. Stable shape
+// so the matrix-asserted contract (TC-068, TC-095, TC-106 et al) and
+// the SPA's `findApplicationByName` fallback both consume the same
+// JSON without per-caller post-processing.
+//
+// qa-loop iter-11 Fix #45 Cluster-C: prior to this handler the SPA
+// fell into "App not found" for any Application CR that wasn't part of
+// the wizard's `selectedComponents` (i.e. every Application installed
+// outside the bootstrap-kit + wizard flow, which on chroot Sovereigns
+// is the typical case). The catalyst-api had a /status sub-route but
+// nothing returning the Application's full spec + identity, so the SPA
+// couldn't synthesise an ApplicationDescriptor on the fly.
+type applicationDetailResponse struct {
+	Name               string                   `json:"name"`
+	Namespace          string                   `json:"namespace"`
+	Blueprint          string                   `json:"blueprint,omitempty"`
+	Version            string                   `json:"version,omitempty"`
+	EnvironmentRef     string                   `json:"environmentRef,omitempty"`
+	Placement          string                   `json:"placement,omitempty"`
+	Regions            []string                 `json:"regions,omitempty"`
+	Parameters         map[string]interface{}   `json:"parameters,omitempty"`
+	Phase              string                   `json:"phase,omitempty"`
+	PrimaryRegion      string                   `json:"primaryRegion,omitempty"`
+	GiteaRepo          string                   `json:"giteaRepo,omitempty"`
+	LastReconciled     string                   `json:"lastReconciledAt,omitempty"`
+	Conditions         []map[string]interface{} `json:"conditions"`
+	RegionStatuses     []map[string]interface{} `json:"regionStatuses,omitempty"`
+	InstalledBlueprint map[string]interface{}   `json:"installedBlueprint,omitempty"`
+}
+
+// HandleApplicationGet handles GET /api/v1/sovereigns/{id}/applications/{name}
+//
+// Returns the full Application detail (identity, spec, status). Like
+// HandleApplicationStatus, the optional `?namespace=<org>` query selects
+// the Org namespace; when absent the handler returns the first
+// Application CR named `name` across every namespace on the Sovereign.
+//
+// qa-loop iter-11 Fix #45 Cluster-C.
+func (h *Handler) HandleApplicationGet(w http.ResponseWriter, r *http.Request) {
+	depID := chi.URLParam(r, "id")
+	name := chi.URLParam(r, "name")
+	if name == "" {
+		writeBadRequest(w, "missing-name", "application name is required")
+		return
+	}
+	dep, ok := h.lookupDeploymentForInfra(depID)
+	if !ok {
+		writeNotFound(w, depID)
+		return
+	}
+	client, err := h.sovereignDynamicClient(dep)
+	if err != nil {
+		writeUserAccessUnavailable(w, err)
+		return
+	}
+	ns := strings.TrimSpace(r.URL.Query().Get("namespace"))
+	obj, getErr := getApplicationCR(r.Context(), client, name, ns)
+	if getErr != nil {
+		if apierrors.IsNotFound(getErr) {
+			writeJSON(w, http.StatusNotFound, map[string]string{
+				"error":  "application-not-found",
+				"detail": fmt.Sprintf("Application %q not found", name),
+			})
+			return
+		}
+		writeJSON(w, http.StatusInternalServerError, map[string]string{
+			"error":  "application-get-failed",
+			"detail": getErr.Error(),
+		})
+		return
+	}
+	resp := applicationDetailResponse{
+		Name:       obj.GetName(),
+		Namespace:  obj.GetNamespace(),
+		Conditions: []map[string]interface{}{},
+	}
+	if v, ok, _ := unstructured.NestedString(obj.Object, "spec", "blueprintRef", "name"); ok {
+		resp.Blueprint = v
+	}
+	if v, ok, _ := unstructured.NestedString(obj.Object, "spec", "blueprintRef", "version"); ok {
+		resp.Version = v
+	}
+	if v, ok, _ := unstructured.NestedString(obj.Object, "spec", "environmentRef"); ok {
+		resp.EnvironmentRef = v
+	}
+	if v, ok, _ := unstructured.NestedString(obj.Object, "spec", "placement"); ok {
+		resp.Placement = v
+	}
+	if regs, ok, _ := unstructured.NestedStringSlice(obj.Object, "spec", "regions"); ok {
+		resp.Regions = regs
+	}
+	if params, ok, _ := unstructured.NestedMap(obj.Object, "spec", "parameters"); ok {
+		resp.Parameters = params
+	}
+	if phase, ok, _ := unstructured.NestedString(obj.Object, "status", "phase"); ok {
+		resp.Phase = phase
+	}
+	if pr, ok, _ := unstructured.NestedString(obj.Object, "status", "primaryRegion"); ok {
+		resp.PrimaryRegion = pr
+	}
+	if gr, ok, _ := unstructured.NestedString(obj.Object, "status", "giteaRepo"); ok {
+		resp.GiteaRepo = gr
+	}
+	if lr, ok, _ := unstructured.NestedString(obj.Object, "status", "lastReconciledAt"); ok {
+		resp.LastReconciled = lr
+	}
+	if conds, ok, _ := unstructured.NestedSlice(obj.Object, "status", "conditions"); ok {
+		for _, c := range conds {
+			if cm, isMap := c.(map[string]interface{}); isMap {
+				resp.Conditions = append(resp.Conditions, cm)
+			}
+		}
+	}
+	if rgs, ok, _ := unstructured.NestedSlice(obj.Object, "status", "regions"); ok {
+		for _, rg := range rgs {
+			if rm, isMap := rg.(map[string]interface{}); isMap {
+				resp.RegionStatuses = append(resp.RegionStatuses, rm)
+			}
+		}
+	}
+	if ib, ok, _ := unstructured.NestedMap(obj.Object, "status", "installedBlueprint"); ok {
+		resp.InstalledBlueprint = ib
+	}
+	writeJSON(w, http.StatusOK, resp)
+}
+
 // ── HTTP handler: list (GET /sovereigns/{id}/applications) ──────────

 // applicationListItem: one row of GET /sovereigns/{id}/applications.
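One detail of the response shape worth illustrating: `conditions` carries no `omitempty` and the handler initialises it to an empty slice, so clients always receive a JSON array. The sketch below uses a trimmed, hypothetical `detail` struct (not the real `applicationDetailResponse`) to contrast an initialised slice with a nil one under `encoding/json`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// detail is a hypothetical, cut-down version of the response struct,
// kept only to show the marshalling behaviour of the conditions field.
type detail struct {
	Name       string                   `json:"name"`
	Namespace  string                   `json:"namespace"`
	Phase      string                   `json:"phase,omitempty"`
	Conditions []map[string]interface{} `json:"conditions"`
}

// encodeDetail marshals a detail; when initConditions is true the slice is
// created empty (as the handler does), otherwise it is left nil.
func encodeDetail(name, namespace string, initConditions bool) string {
	d := detail{Name: name, Namespace: namespace}
	if initConditions {
		d.Conditions = []map[string]interface{}{}
	}
	b, err := json.Marshal(d)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	fmt.Println(encodeDetail("site", "acme", true))  // conditions rendered as []
	fmt.Println(encodeDetail("site", "acme", false)) // conditions rendered as null
}
```

Initialising the slice spares TypeScript callers a null check before iterating `conditions`, which matches the non-optional `conditions` field in the SPA's interface.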
@@ -96,7 +96,20 @@ func (h *Handler) HandleK8sList(w http.ResponseWriter, r *http.Request) {
 	}

 	q := r.URL.Query()
+	// qa-loop iter-11 Fix #45 Cluster-C: accept BOTH `?ns=` (the
+	// historical short form) AND `?namespace=` (the kubectl /
+	// API-server canonical form that the SPA's `getApplicationStatus`
+	// helper, the catalog API client, and downstream tooling all emit).
+	// Prior to this fix `?namespace=qa-omantel` was silently ignored:
+	// the handler returned the un-filtered list across every namespace
+	// (TC-262 / TC-263: `?namespace=qa-omantel` returned alloy + newapi
+	// services + every other namespace's services, with `qa-wp` buried
+	// in noise). `ns=` wins when both are passed (preserves any caller
+	// that may have set both for paranoia).
 	ns := q.Get("ns")
+	if ns == "" {
+		ns = q.Get("namespace")
+	}
 	limit := parseIntDefault(q.Get("limit"), 500)
 	if limit < 1 {
 		limit = 500
@@ -320,3 +320,96 @@ func TestHandleK8sStream_EmitsEvent(t *testing.T) {
 // keep metav1 imported even if a future test refactor drops the
 // explicit reference.
 var _ = metav1.GetOptions{}
+
+// TestHandleK8sList_NamespaceAliasFiltering covers qa-loop iter-11 Fix #45
+// Cluster-C. The handler accepts both `?ns=` (historic short form) and
+// `?namespace=` (the kubectl/SPA-canonical form). When neither is set,
+// every namespace's items are returned.
+func TestHandleK8sList_NamespaceAliasFiltering(t *testing.T) {
+	podA := newPod("qa-omantel", "qa-wp")
+	podB := newPod("alloy", "alloy-host")
+	f := newFactoryWithMultiplePods(t, podA, podB)
+	h := &Handler{log: quietLog()}
+	h.SetK8sCache(f, k8scache.NewSARCache(), "X-Forwarded-User")
+	r := newRouter(h)
+
+	type tc struct {
+		name      string
+		query     string
+		wantCount int
+		wantNS    string
+	}
+	cases := []tc{
+		{"namespace_param_filters_to_qa_omantel", "?namespace=qa-omantel", 1, "qa-omantel"},
+		{"ns_param_still_works", "?ns=qa-omantel", 1, "qa-omantel"},
+		{"ns_wins_when_both_set", "?ns=alloy&namespace=qa-omantel", 1, "alloy"},
+		{"no_filter_returns_all_namespaces", "", 2, ""},
+	}
+	for _, c := range cases {
+		t.Run(c.name, func(t *testing.T) {
+			req := httptest.NewRequest("GET", "/api/v1/sovereigns/alpha/k8s/pod"+c.query, nil)
+			rec := httptest.NewRecorder()
+			r.ServeHTTP(rec, req)
+			if rec.Code != 200 {
+				t.Fatalf("expected 200, got %d body=%s", rec.Code, rec.Body.String())
+			}
+			var resp K8sListResponse
+			if err := json.NewDecoder(rec.Body).Decode(&resp); err != nil {
+				t.Fatalf("decode: %v", err)
+			}
+			if len(resp.Items) != c.wantCount {
+				gotNS := []string{}
+				for _, it := range resp.Items {
+					gotNS = append(gotNS, it.GetNamespace()+"/"+it.GetName())
+				}
+				t.Fatalf("query=%q items=%d want=%d got=%v", c.query, len(resp.Items), c.wantCount, gotNS)
+			}
+			if c.wantCount == 1 && resp.Items[0].GetNamespace() != c.wantNS {
+				t.Fatalf("query=%q expected namespace=%q got %q", c.query, c.wantNS, resp.Items[0].GetNamespace())
+			}
+		})
+	}
+}
+
+// newFactoryWithMultiplePods builds an in-memory K8s cache pre-populated
+// with N pods across N namespaces. It exercises the namespace-filter path
+// (a single-ns cache wouldn't surface the bug).
+func newFactoryWithMultiplePods(t *testing.T, pods ...*unstructured.Unstructured) *k8scache.Factory {
+	t.Helper()
+	scheme := runtime.NewScheme()
+	scheme.AddKnownTypeWithName(schema.GroupVersionKind{Version: "v1", Kind: "PodList"}, &unstructured.UnstructuredList{})
+	scheme.AddKnownTypeWithName(schema.GroupVersionKind{Version: "v1", Kind: "Pod"}, &unstructured.Unstructured{})
+	gvrList := map[schema.GroupVersionResource]string{
+		{Version: "v1", Resource: "pods"}: "PodList",
+	}
+	objs := make([]runtime.Object, 0, len(pods))
+	for _, p := range pods {
+		objs = append(objs, p)
+	}
+	dyn := dynamicfake.NewSimpleDynamicClientWithCustomListKinds(scheme, gvrList, objs...)
+	core := kfake.NewSimpleClientset()
+	cfg := k8scache.Config{
+		Logger:   quietLog(),
+		Registry: minimalRegistry(),
+		Clusters: []k8scache.ClusterRef{
+			{ID: "alpha", DynamicClient: dyn, CoreClient: core},
+		},
+	}
+	f, err := k8scache.NewFactory(cfg)
+	if err != nil {
+		t.Fatalf("NewFactory: %v", err)
+	}
+	if err := f.Start(context.Background()); err != nil {
+		t.Fatalf("Start: %v", err)
+	}
+	t.Cleanup(f.Stop)
+	deadline := time.Now().Add(2 * time.Second)
+	for time.Now().Before(deadline) {
+		items, _, _ := f.List("alpha", "pod", nil)
+		if len(items) >= len(pods) {
+			return f
+		}
+		time.Sleep(20 * time.Millisecond)
+	}
+	return f
+}
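The `ns` / `namespace` precedence exercised by this test can be shown on its own. The sketch below is a standalone illustration using only the standard library; `resolveNamespace` and `resolveFromRawQuery` are hypothetical helper names (the handler inlines this logic rather than calling such functions).

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveNamespace returns the namespace filter from a parsed query string.
// The short form `ns` wins when both parameters are present; an empty
// result means "no filter".
func resolveNamespace(q url.Values) string {
	if ns := q.Get("ns"); ns != "" {
		return ns
	}
	return q.Get("namespace")
}

// resolveFromRawQuery is a convenience wrapper for raw query strings such
// as "ns=alloy&namespace=qa-omantel". A malformed query yields no filter.
func resolveFromRawQuery(raw string) string {
	q, err := url.ParseQuery(raw)
	if err != nil {
		return ""
	}
	return resolveNamespace(q)
}

func main() {
	fmt.Println(resolveFromRawQuery("namespace=qa-omantel"))
	fmt.Println(resolveFromRawQuery("ns=alloy&namespace=qa-omantel"))
	fmt.Println(resolveFromRawQuery("") == "")
}
```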
@@ -177,6 +177,35 @@ export interface ApplicationStatusResponse {
   status?: Record<string, unknown>
 }

+/**
+ * ApplicationDetailResponse: body of GET
+ * /sovereigns/{id}/applications/{name}. Lifts the same fields the
+ * Sovereign Console's AppDetail page reads in one round-trip:
+ * identity + spec + roll-up status. Stable shape so the matrix-asserted
+ * contract (TC-068, TC-095, TC-106) and the SPA's
+ * findApplicationByName fallback consume the same JSON without
+ * per-caller post-processing.
+ *
+ * qa-loop iter-11 Fix #45 Cluster-C.
+ */
+export interface ApplicationDetailResponse {
+  name: string
+  namespace: string
+  blueprint?: string
+  version?: string
+  environmentRef?: string
+  placement?: string
+  regions?: string[]
+  parameters?: Record<string, unknown>
+  phase?: string
+  primaryRegion?: string
+  giteaRepo?: string
+  lastReconciledAt?: string
+  conditions: Array<Record<string, unknown>>
+  regionStatuses?: Array<Record<string, unknown>>
+  installedBlueprint?: Record<string, unknown>
+}
+
 /** PreviewManifest: one rendered file in the preview output. */
 export interface PreviewManifest {
   path: string
@ -230,6 +259,28 @@ export async function previewApplication(
|
||||
return res.json()
|
||||
}
|
||||
|
||||
/**
|
||||
* getApplication — fetch full Application detail by name.
|
||||
* Returns null on 404 (not-an-error in the SPA-fallback context).
|
||||
* qa-loop iter-11 Fix #45 Cluster-C.
|
||||
*/
|
||||
export async function getApplication(
|
||||
sovereignId: string,
|
||||
name: string,
|
||||
namespace?: string,
|
||||
): Promise<ApplicationDetailResponse | null> {
|
||||
const params = new URLSearchParams()
|
||||
if (namespace) params.set('namespace', namespace)
|
||||
const qs = params.toString()
|
||||
const url = `${applicationsBase(sovereignId)}/${encodeURIComponent(name)}${qs ? '?' + qs : ''}`
|
||||
const res = await authedFetch(url, { headers: { Accept: 'application/json' } })
|
||||
if (res.status === 404) return null
|
||||
if (!res.ok) {
|
||||
throw new Error(`getApplication: HTTP ${res.status}`)
|
||||
}
|
||||
return res.json()
|
||||
}
|
||||
|
||||
export async function getApplicationStatus(
|
||||
sovereignId: string,
|
||||
name: string,
|
||||
|
||||
@@ -33,6 +33,7 @@

 import { useMemo, useState } from 'react'
 import { useParams, Link } from '@tanstack/react-router'
+import { useQuery } from '@tanstack/react-query'
 import { useWizardStore } from '@/entities/deployment/store'
 import { PortalShell } from './PortalShell'
 import { JobsTable } from './JobsTable'
@@ -43,6 +44,7 @@ import { adaptDerivedJobsToFlat } from './jobsAdapter'
 import { findComponent } from '@/pages/wizard/steps/componentGroups'
 import { useResolvedDeploymentId } from '@/shared/lib/useResolvedDeploymentId'
 import type { ApplicationStatus } from './eventReducer'
+import { getApplication, type ApplicationDetailResponse } from '@/lib/catalog.api'
 import { ComplianceTab } from './AppDetail/ComplianceTab'
 import { MembersTab } from './AppDetail/MembersTab'
 import { TopologyTab } from './AppDetail/TopologyTab'
@@ -83,9 +85,86 @@ export function AppDetail({ disableStream = false }: AppDetailProps = {}) {
   })

   const sovereignFQDN = snapshot?.sovereignFQDN ?? snapshot?.result?.sovereignFQDN ?? null
-  const app: ApplicationDescriptor | undefined = findApplication(applications, componentId)
+  const wizardApp: ApplicationDescriptor | undefined = findApplication(applications, componentId)
+
+  // qa-loop iter-11 Fix #45 Cluster-C: when the requested component is
+  // NOT part of the wizard's selectedComponents (the typical case for a
+  // chroot Sovereign — Applications installed via `kubectl apply` or
+  // the catalyst-api install endpoint NEVER pass through the wizard
+  // store), fall back to the catalyst-api's
+  // GET /sovereigns/{id}/applications/{name} endpoint to fetch the
+  // Application CR directly and synthesise an ApplicationDescriptor on
+  // the fly. Prior to this fallback every chroot-installed Application
+  // surfaced the misleading "App not found / The component qa-wp is
+  // not part of this deployment" page even though the Application CR +
+  // HelmRelease were both Ready=True (TC-068 / TC-072 / TC-074 et al.
+  // failed for this exact reason).
+  //
+  // The fallback only runs when (a) we're on a Sovereign route (i.e.
+  // deploymentId resolved from sovereign-self, not a wizard URL) AND
+  // (b) the wizard didn't already supply a descriptor. We use the
+  // sovereign-self deploymentId as the catalyst-api {id} URL segment.
+  const needsApiFallback = !wizardApp && !!deploymentId && !!componentId
+  const apiAppQuery = useQuery({
+    queryKey: ['sov-application', deploymentId, componentId],
+    queryFn: async () => getApplication(deploymentId, componentId),
+    enabled: needsApiFallback,
+    staleTime: 30_000,
+    retry: 1,
+  })
+  const apiApp: ApplicationDetailResponse | null | undefined = apiAppQuery.data
+
+  // Synthesise an ApplicationDescriptor from the API response so the
+  // rest of the page (hero, sections, tabs) can render unchanged. The
+  // descriptor's bareId is derived from the Blueprint name (strip
+  // `bp-` prefix) so reverse-dependency lookups + component-groups
+  // metadata can still resolve.
+  //
+  // Defensive: only synthesise when the API returned a meaningful
+  // Application body (i.e. .name matches the requested componentId or
+  // .blueprint is set). A 404 returns null (handled), but a 200 with
+  // an unrelated body shouldn't be coerced into an Application.
+  const synthesisedApp: ApplicationDescriptor | undefined = useMemo(() => {
+    if (!apiApp) return undefined
+    if (!apiApp.name && !apiApp.blueprint) return undefined
+    const bareId = (apiApp.blueprint ?? '').replace(/^bp-/, '') || componentId
+    const compEntry = findComponent(bareId)
+    return {
+      id: apiApp.blueprint || `bp-${bareId}`,
+      bareId,
+      title: compEntry?.name ?? apiApp.name ?? componentId,
+      description: compEntry?.desc ?? `Application installed in namespace ${apiApp.namespace}`,
+      familyId: compEntry?.product ?? 'platform',
+      familyName: compEntry?.groupName ?? 'Platform',
+      tier: compEntry?.tier ?? 'optional',
+      logoUrl: compEntry?.logoUrl ?? null,
+      dependencies: compEntry?.dependencies ?? [],
+      bootstrapKit: false,
+    }
+  }, [apiApp, componentId])
+
+  const app: ApplicationDescriptor | undefined = wizardApp ?? synthesisedApp
   const compState = state.apps[componentId]
-  const status: ApplicationStatus = compState?.status ?? 'pending'
+  // Roll up status: prefer wizard-stream signal (live SSE deltas),
+  // fall back to API-fetched phase mapped to the legacy 4-state vocab
+  // (pending | installing | installed | failed | degraded).
+  const apiPhaseStatus: ApplicationStatus | undefined = useMemo(() => {
+    if (!apiApp?.phase) return undefined
+    switch (apiApp.phase) {
+      case 'Ready':
+        return 'installed'
+      case 'Failed':
+        return 'failed'
+      case 'Degraded':
+        return 'degraded'
+      case 'Provisioning':
+      case 'Pending':
+        return 'installing'
+      default:
+        return 'pending'
+    }
+  }, [apiApp])
+  const status: ApplicationStatus = compState?.status ?? apiPhaseStatus ?? 'pending'

   // Bundled dependencies — descriptors of every direct dep, with
   // human names sourced from componentGroups when available.
@@ -151,6 +230,36 @@ export function AppDetail({ disableStream = false }: AppDetailProps = {}) {
   }, [app])

   if (!app) {
+    // While the API fallback is in flight, render a transient
+    // "Loading…" surface instead of the misleading "not found" page —
+    // the not-found page made the matrix-asserted Overview tokens fail
+    // (TC-068 expects "Ready", TC-072 expects "Service" etc.) and the
+    // operator UI flashed an error chip during normal page loads.
+    if (needsApiFallback && (apiAppQuery.isPending || apiAppQuery.isFetching)) {
+      return (
+        <PortalShell
+          deploymentId={deploymentId}
+          sovereignFQDN={sovereignFQDN}
+          pageTitle="Loading…"
+          headerSlotLeft={
+            <Link
+              to={`/dashboard` as never}
+              className="text-[11px] text-[var(--color-text-dim)] hover:text-[var(--color-text)] no-underline"
+              data-testid="sov-back-link"
+            >
+              ← Back to apps
+            </Link>
+          }
+        >
+          <style>{APP_DETAIL_CSS}</style>
+          <div className="detail-page">
+            <div className="not-found" data-testid="sov-app-loading">
+              <p>Loading {componentId}…</p>
+            </div>
+          </div>
+        </PortalShell>
+      )
+    }
     return (
       <PortalShell
deploymentId={deploymentId}
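
The fallback logic in the AppDetail hunks above can be read in isolation. What follows is a minimal TypeScript sketch, not the PR's code: `LegacyStatus`, `toLegacyStatus`, `resolveStatus` and `deriveBareId` are hypothetical names introduced here, and the status union is assumed from the comment in the diff (the real type is `ApplicationStatus` from `./eventReducer`).

```typescript
// Hypothetical stand-in for ApplicationStatus; assumed from the diff comment.
type LegacyStatus = 'pending' | 'installing' | 'installed' | 'failed' | 'degraded'

// Map an Application.status.phase string onto the legacy vocabulary,
// mirroring the switch in the diff. Returns undefined when no phase is known.
function toLegacyStatus(phase?: string): LegacyStatus | undefined {
  if (!phase) return undefined
  switch (phase) {
    case 'Ready':
      return 'installed'
    case 'Failed':
      return 'failed'
    case 'Degraded':
      return 'degraded'
    case 'Provisioning':
    case 'Pending':
      return 'installing'
    default:
      return 'pending'
  }
}

// Precedence used by the page: live stream status first, then the
// API-fetched phase, then 'pending' as the last resort.
function resolveStatus(streamStatus?: LegacyStatus, apiPhase?: string): LegacyStatus {
  return streamStatus ?? toLegacyStatus(apiPhase) ?? 'pending'
}

// Strip a leading `bp-` from the Blueprint name; fall back to the requested
// componentId when the blueprint is absent or empty.
function deriveBareId(blueprint: string | undefined, componentId: string): string {
  return (blueprint ?? '').replace(/^bp-/, '') || componentId
}
```

Keeping the mapping in pure functions like these would make the precedence rule (stream over API over default) testable without rendering the component; whether the PR factors it this way is not shown in the diff.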
@@ -1,5 +1,39 @@
 apiVersion: v2
 name: bp-catalyst-platform
+# 1.4.117 (qa-loop iter-11 Fix #45 Cluster-B + Cluster-C —
+# application-controller HR observation + catalyst-api SPA endpoints).
+#
+# Cluster-B (application-controller observes downstream HelmRelease):
+# - Reconciler now polls per-region HelmRelease.status.conditions[Ready]
+# after every reconcile pass and rolls up the Application's
+# status.phase: any region Ready=True → phase=Ready, any
+# Ready=False → phase=Degraded, no HR yet → phase=Provisioning.
+# - Periodic 30s re-list ticker (Run goroutine) ensures HR readiness
+# flips reach Application.status.phase even though the Application
+# Watch doesn't fire on sibling HR changes.
+# - Application-controller ClusterRole gains
+# helm.toolkit.fluxcd.io/helmreleases get/list/watch.
+# - status.lastReconciledAt populated on every pass for TC-113.
+# - Without this fix Application sat at Provisioning indefinitely
+# even after `kubectl get hr -n qa-omantel qa-wp` was Ready=True
+# for hours; matrix TC-066 / TC-100 / TC-104 / TC-113 stayed FAIL.
+#
+# Cluster-C (catalyst-api SPA endpoints + namespace alias):
+# - GET /sovereigns/{id}/applications/{name} returns full Application
+# detail (identity + spec + status) so the SPA AppDetail page can
+# synthesise an ApplicationDescriptor for chroot-installed
+# Applications that aren't part of the wizard's selectedComponents.
+# Unblocks TC-068 / TC-072 / TC-074 et al ("App not found" misfire).
+# - GET /sovereigns/{id}/k8s/{kind} accepts both ?ns= and ?namespace=
+# query params (was: only ?ns=, silently ignored ?namespace=). The
+# SPA + kubectl-canonical clients all emit ?namespace=; without the
+# alias TC-262 / TC-263 returned every namespace's services.
+# - SPA AppDetail.tsx falls back to GET /applications/{name} when the
+# wizard store has no descriptor for the requested componentId
+# (the typical chroot Sovereign case).
+#
+# Image bumps follow this chart bump in the same PR.
+#
 # 1.4.116 (qa-loop iter-10 Fix #44 follow-up — chart re-publish).
 # Chart 1.4.115 was published from the merge commit which still had
 # the OLD application-controller image tag (a3ba200) baked into
@@ -471,7 +505,7 @@ name: bp-catalyst-platform
 # so the matrix's `kubectl get cnpgpair` stdout contains the literal
 # "cnpgpair" substring TC-306 asserts on (envsubst override beat the
 # chart values default fixed in PR #1247).
-version: 1.4.116
+version: 1.4.117
 appVersion: 1.4.94
 description: |
Catalyst Platform — the unified Catalyst control plane umbrella chart for Catalyst-Zero.
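
The `?ns=` / `?namespace=` alias described in the Cluster-C notes of this chart changelog can be sketched as below. This is an illustration in TypeScript only: the catalyst-api handler itself is not part of the hunks shown here, `resolveNamespace` is a hypothetical name, and preferring `ns` when both parameters are present is an assumption rather than documented behaviour.

```typescript
// Resolve the namespace filter from a query string, accepting either
// `ns` or `namespace`. Returns undefined when neither is set, which a
// caller would treat as "no filter" (all namespaces).
function resolveNamespace(query: string): string | undefined {
  const params = new URLSearchParams(query)
  // Assumption: `ns` wins when both are supplied.
  const value = params.get('ns') || params.get('namespace')
  return value || undefined
}
```

The point of the alias is that a client sending `?namespace=qa-omantel` gets a filtered result instead of having the parameter silently ignored.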
@@ -62,4 +62,15 @@ rules:
   - apiGroups: ["kustomize.toolkit.fluxcd.io"]
     resources: ["kustomizations"]
     verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
+  # qa-loop iter-11 Fix #45 Cluster-B: the controller observes the
+  # downstream HelmRelease's status.conditions[Ready] to roll up the
+  # Application.status.phase. Read-only — controller never writes
+  # HelmRelease objects (Flux owns the writes; the controller only
+  # commits the YAML to Gitea). Without this grant the read fails
+  # with `helmreleases.helm.toolkit.fluxcd.io is forbidden` and the
+  # phase stays at Provisioning forever (the live failure on
+  # omantel iter-11 — TC-066 / TC-100 / TC-104 / TC-113 stayed FAIL).
+  - apiGroups: ["helm.toolkit.fluxcd.io"]
+    resources: ["helmreleases"]
+    verbs: ["get", "list", "watch"]
{{- end }}
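
The phase roll-up that this read-only grant enables can be summarised in a few lines. The sketch below is TypeScript for illustration only (the controller is written in Go and its reconciler is not shown in these hunks); `ReadyCondition`, `Phase` and `rollUpPhase` are hypothetical names, and two details are assumptions: the rules are evaluated in the order the commit message lists them, and a `Ready=Unknown` condition is treated like "no HR yet".

```typescript
type ReadyCondition = 'True' | 'False' | 'Unknown'
type Phase = 'Ready' | 'Degraded' | 'Provisioning'

// regionReady holds one entry per region whose HelmRelease exists;
// regions without a HelmRelease yet are simply absent from the array.
function rollUpPhase(regionReady: ReadyCondition[]): Phase {
  // Any region Ready=True -> Ready (checked first, as listed in the commit message).
  if (regionReady.some((c) => c === 'True')) return 'Ready'
  // Otherwise any region Ready=False -> Degraded.
  if (regionReady.some((c) => c === 'False')) return 'Degraded'
  // No HelmRelease observed yet (or only Unknown, by assumption) -> Provisioning.
  return 'Provisioning'
}
```

Because the Application watch does not fire on sibling HelmRelease changes, a function like this would have to be re-evaluated on the periodic 30s re-list for readiness flips to reach `status.phase`.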