* fix(catalyst-api): rip out dangling sovereign_* route registrations + chart 1.4.56
PR #1050 deleted sovereign_more.go (which defined HandleSovereignUsers,
HandleSovereignCatalog, HandleSovereignSettings, HandleSovereignTopology)
but left four route registrations in cmd/api/main.go that still
referenced those handler methods. The catalyst-api build for the merged
revert (run 25439549879) failed with:
cmd/api/main.go:690:39: h.HandleSovereignUsers undefined
cmd/api/main.go:691:41: h.HandleSovereignCatalog undefined
cmd/api/main.go:692:42: h.HandleSovereignSettings undefined
cmd/api/main.go:693:42: h.HandleSovereignTopology undefined
That's why ghcr.io/openova-io/openova/catalyst-api:fdd3354 was never
published — only the UI image rolled. Result: omantel.biz catalyst-api
pod stuck in ImagePullBackOff.
Drop the four route registrations. Same baby, new address — the chroot
Sovereign uses the existing /api/v1/deployments/{depId}/* handlers via
the JWT-resolved deploymentId, not parallel-baby /api/v1/sovereign/*
endpoints.
Also revert two more parallel-baby fragments still on main:
- getHierarchicalInfrastructure mode-aware fetcher → single mother
URL (the chroot resolves deploymentId from the cookie and the
mother-side topology handler serves byte-identical data once
cutover-import has persisted the deployment record on the
Sovereign's local store)
- CatalogAdminPage.fetchApps mode-aware → /catalog/apps everywhere
Bump bp-catalyst-platform chart 1.4.55 → 1.4.56 and the cluster
Kustomization version pin to match.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(sovereignDynamicClient): in-cluster fallback when running ON the Sovereign
The chroot Sovereign Console at console.<sov-fqdn> is the SAME catalyst-api
binary as the mother. When that binary runs ON the Sovereign cluster
(catalyst-system namespace on the Sovereign itself), there is no
posted-back kubeconfig — the catalyst-api IS in the cluster it needs
to talk to, and rest.InClusterConfig() returns the right credentials.
Without this, every endpoint that needs the Sovereign-side dynamic
client returned 503 with "sovereign cluster kubeconfig not yet posted
back" — including ListUserAccess (/users page), CreateUserAccess,
infrastructure CRUD, etc. Caught on omantel.biz 2026-05-06: /users
rendered "list user-access: HTTP 503" because the Sovereign-side
catalyst-api was looking for a kubeconfig that doesn't exist on the
chroot side of the cutover boundary.
Detection: SOVEREIGN_FQDN env (set on every Sovereign-side catalyst-api
deployment by the chart) matches dep.Request.SovereignFQDN. On the
mother, SOVEREIGN_FQDN is unset → unchanged behavior. On the chroot,
SOVEREIGN_FQDN matches the only deployment served (its own) → use
in-cluster.
Same fallback applied to tryDynamicClientLocked (loaderInputFor's
best-effort live-source client) so /infrastructure/topology and the
/cloud graph render with live data on the chroot too.
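
A minimal sketch of the detection rule described above, not the actual sovereignDynamicClient code. The function name and the case-insensitive comparison are assumptions made for illustration; only the "env unset means mother, env matches means in-cluster" rule comes from this commit.

```go
package main

import "strings"

// useInClusterConfig reports whether the dynamic client should be built
// from in-cluster credentials instead of a posted-back kubeconfig.
// selfFQDN stands in for the SOVEREIGN_FQDN env value and depFQDN for
// dep.Request.SovereignFQDN (hypothetical parameter names).
func useInClusterConfig(selfFQDN, depFQDN string) bool {
	self := strings.TrimSpace(selfFQDN)
	if self == "" {
		// Mother: SOVEREIGN_FQDN is unset, behavior stays unchanged.
		return false
	}
	// Chroot: the only deployment served is its own.
	// Case-insensitive match is an assumption (DNS names ignore case).
	return strings.EqualFold(self, strings.TrimSpace(depFQDN))
}
```

A caller would pass os.Getenv("SOVEREIGN_FQDN") as the first argument and fall through to the existing kubeconfig path whenever the function returns false.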
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(user-access): empty list when CRD absent + RBAC for chroot
Two coupled fixes for the /users page on chroot Sovereign Console:
1. catalyst-api-cutover-driver ClusterRole: grant read/write on
useraccesses.access.openova.io. The Sovereign chroot's catalyst-api
uses the in-cluster ServiceAccount (per PR #1052). The list call
was returning 403 from the apiserver because the SA had no rule
covering this CRD.
2. ListUserAccess: return 200 with empty items when the CRD itself
is not installed (apierrors.IsNotFound). The access.openova.io
CRD ships via a separate blueprint that may not yet be installed
on a fresh Sovereign — the page should render its empty state,
not a 500 toast.
Caught live on omantel.biz 2026-05-06 after PR #1052 unblocked the
in-cluster client path: list call surfaced first as 403 (RBAC), then
as 500 "server could not find the requested resource" (CRD absent).
Both now resolve to a 200 + [].
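
A rough sketch of the "CRD absent → 200 + empty list" mapping, using only the standard library. errCRDNotInstalled is a stand-in for the apierrors.IsNotFound check, errForbidden is a stand-in for any other failure, and the {"items": [...]} response shape is an assumption.

```go
package main

import (
	"encoding/json"
	"errors"
	"net/http"
)

var (
	// Stand-in for an error that apierrors.IsNotFound would match.
	errCRDNotInstalled = errors.New("the server could not find the requested resource")
	// Stand-in for any other list failure.
	errForbidden = errors.New("forbidden")
)

// resolveList maps a list result to an HTTP status and a non-nil slice.
// A missing CRD is treated as "no items yet", not as a server error.
func resolveList(items []string, err error) (int, []string) {
	switch {
	case errors.Is(err, errCRDNotInstalled):
		return http.StatusOK, []string{}
	case err != nil:
		return http.StatusInternalServerError, nil
	case items == nil:
		return http.StatusOK, []string{}
	}
	return http.StatusOK, items
}

// listUserAccessHandler wraps resolveList in an http.HandlerFunc.
func listUserAccessHandler(list func() ([]string, error)) http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		status, items := resolveList(list())
		if status != http.StatusOK {
			http.Error(w, "list user-access failed", status)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(map[string][]string{"items": items})
	}
}
```

Returning an empty slice rather than nil keeps the JSON body at [] instead of null, so the page can render its empty state without a null check.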
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(chroot): byte-identical /jobs + /cloud — kill fixture fallback, lazy-seed jobs.Store from live cluster, single endpoint
Two parallel-baby paths still made the chroot diverge from the mother
on /cloud and /jobs/{jobId}. Both now ship one path that serves
byte-identical data on both surfaces.
1. CloudPage rendered fictional topology (Frankfurt, Helsinki,
omantel-primary, omantel-secondary, edge-lb, vpc-net-eu, …) when
the topology query errored — because it fell back to
`infrastructureTopologyFixture` from `src/test/fixtures/`. That is
a test-only file leaking into production via the production import
tree, in direct violation of INVIOLABLE-PRINCIPLES #1 (no
placeholder data — empty state when you don't know).
Fix: drop the fixture fallback. On error → null → empty-state
render. The mother shows the same empty state when its loader
returns nothing; byte-identical.
2. JobsTable + JobDetail rendered a flat green-grid because the chroot
was hitting `/api/v1/sovereign/jobs` which returns a minimal shape
(no dependsOn, no parentId, no exec records). Mother's
`/api/v1/deployments/{depId}/jobs` returns the rich shape from a
per-deployment jobs.Store, which on the chroot starts empty (the
mother's exportDeploymentToChild only ships the deployment record,
not the jobs.Store contents).
Fix: ship one URL on both surfaces — `/api/v1/deployments/{id}/jobs`.
Add `chrootSeedJobsStoreIfEmpty` that runs at handler-time when
SOVEREIGN_FQDN matches dep.Request.SovereignFQDN AND the per-
deployment jobs.Store has 0 records: do a one-shot HelmRelease
list via the in-cluster client (helmwatch.ListAndSnapshotHelmReleases
— exported here, mirrors Watcher.SnapshotComponents without
spinning up an informer), pass through snapshotsToSeeds +
Bridge.SeedJobsFromInformerList. Subsequent calls read directly
from the now-populated store and return rich Job records with
dependsOn / parentId / status — exactly like the mother.
useLiveJobsBackfill loses its mode-aware fetcher; the chroot UI
uses the same `/api/v1/deployments/{id}/jobs` URL as the mother.
3. HandleDeploymentImport now also loads the imported record into the
in-memory deployments map immediately, so `/deployments/{id}/*`
handlers don't need a pod restart's restoreFromStore to see the
chroot-imported deployment.
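
The lazy-seed step in item 2 can be sketched roughly as below. The jobsStore and jobRecord types are hypothetical simplifications, and fetch stands in for the one-shot HelmRelease list plus the snapshotsToSeeds / Bridge.SeedJobsFromInformerList pipeline; the real chrootSeedJobsStoreIfEmpty may differ.

```go
package main

import "sync"

// jobRecord is a simplified, hypothetical Job shape.
type jobRecord struct {
	Name      string
	DependsOn []string
}

// jobsStore is a hypothetical in-memory per-deployment store.
type jobsStore struct {
	mu   sync.Mutex
	jobs []jobRecord
}

// seedIfEmpty runs fetch only when the store holds zero records, so the
// live-cluster list happens at most once per populated store.
func (s *jobsStore) seedIfEmpty(fetch func() ([]jobRecord, error)) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if len(s.jobs) > 0 {
		return nil // already seeded; later calls read the store directly
	}
	seeds, err := fetch()
	if err != nil {
		return err
	}
	s.jobs = append(s.jobs, seeds...)
	return nil
}

// count returns the number of records currently in the store.
func (s *jobsStore) count() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.jobs)
}
```

In this sketch the lock is held across fetch, which serializes concurrent first requests; if fetch returns zero seeds the next call tries again.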
Bump bp-catalyst-platform 1.4.56 → 1.4.57 (chart + Kustomization).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(jobdetail): bare-jobName URL (Traefik leaves %3A undecoded, so canonical id 404s)
JobDetail navigation was 404ing on the chroot because the link builder
URL-encoded the canonical Job id ("69e73b3abe673840:install-keycloak")
and Traefik (or any upstream proxy that's RFC 3986 §3.3-strict) does
not decode `%3A` inside path segments. The catalyst-api router saw
the literal "%3A" and Store.GetJob's exact-match path missed.
Two coupled fixes:
1. useJobLinkBuilder strips the "<deploymentId>:" prefix before encoding,
producing /jobs/install-keycloak (Traefik-safe) instead of
/jobs/69e73b3abe673840%3Ainstall-keycloak. Store.GetJob already
accepts both bare jobName and canonical id (see store.go:781-789).
2. JobDetail.jobsById indexes by BOTH canonical id AND bare jobName so
the URL param resolves regardless of which format the link emitted.
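
The two fixes live in the UI, but the prefix-strip and the dual index are easy to illustrate in Go. This is only a sketch: bareJobName and indexJobs are hypothetical names, not the useJobLinkBuilder or JobDetail code.

```go
package main

import "strings"

// bareJobName strips the "<deploymentId>:" prefix from a canonical Job id.
// An id that is already bare is returned unchanged.
func bareJobName(deploymentID, jobID string) string {
	return strings.TrimPrefix(jobID, deploymentID+":")
}

// indexJobs maps BOTH the canonical id and the bare jobName to the
// canonical id, so a lookup succeeds whichever form the URL carried.
func indexJobs(deploymentID string, canonicalIDs []string) map[string]string {
	idx := make(map[string]string, 2*len(canonicalIDs))
	for _, id := range canonicalIDs {
		idx[id] = id
		idx[bareJobName(deploymentID, id)] = id
	}
	return idx
}
```

With deployment id 69e73b3abe673840, both "install-keycloak" and "69e73b3abe673840:install-keycloak" resolve to the same record, and the bare form contains no ":" that would need percent-encoding in a path segment.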
Bump chart 1.4.58 → 1.4.59.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(cloud): resolve deploymentId from cookie on chroot — was firing topology against undefined
CloudPage's topology query fired against /deployments/undefined/...
on the chroot (URL is /cloud, no deploymentId path segment), so the
page showed "Couldn't load architecture" with all node counts at 0/0.
Fix: same pattern as JobDetail — useResolvedDeploymentId() reads the
JWT cookie's deployment_id claim via /api/v1/sovereign/self, falling
back from URL params. Topology query also gates on `!!deploymentId`
so it doesn't waste a 404 round-trip during cookie resolution.
Bump chart 1.4.60 → 1.4.61.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(chroot): single chrome — no frame in frame, no mother handover banner
Two visible bleed-throughs from the mother's wizard UX onto the
chroot Sovereign Console at console.<sov-fqdn>:
1. **Two stacked headers + sidebar inside sidebar** ("frame in frame").
SovereignConsoleLayout rendered its own sidebar+header AND the page
inside rendered PortalShell which rendered ANOTHER header (its
sidebar was already skipped for chroot per a prior fix). User saw
two horizontal title bars stacked.
Resolution: SovereignConsoleLayout becomes auth-only on the chroot.
It runs the cookie/OIDC auth gate + RequiredActionsModal, then
renders <Outlet/> with NO chrome. PortalShell is now the single
chrome owner on both surfaces:
- Mother (/sovereign/provision/$id): renders Sidebar with
/provision/$id/X URLs + its header.
- Chroot (console.<sov-fqdn>): renders SovereignSidebar
with clean /X URLs + the same header.
One sidebar, one header, byte-identical to mother layout.
2. **"✓ Sovereign is ready — Redirecting to your Sovereign console"
banner on /apps.** This is the mother's wizard celebration that
tells the operator "you can now jump to your new Sovereign". On
the chroot the operator IS already on the Sovereign Console; the
banner bleeds through because the imported deployment record
carries the mother's handover-ready event in its history.
Resolution: AppsPage gates the banner, the toast, and the
auto-redirect timer on `!isSovereignMode`. Chroot stays clean.
Bump chart 1.4.62 → 1.4.63.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(chroot): wrap chroot-only pages in PortalShell + drop /catalog page
Three chroot-only pages bypassed PortalShell entirely. After
SovereignConsoleLayout went auth-only in #1057, they rendered
full-bleed with no sidebar / no header — visible look-and-feel break.
/settings/marketplace → MarketplaceSettings (wrapped in PortalShell)
/parent-domains → ParentDomainsPage (wrapped in PortalShell)
/catalog → CatalogAdminPage (deleted)
Drop /catalog entirely per founder direction: a separate page just
to flip a "publish to marketplace" boolean per app is the wrong
shape. The natural place for that toggle is on each /apps card
(future PR — needs HandleSovereignApps to join publish state from
the SME catalog microservice). Removed:
- /catalog route registration in router.tsx
- 'Catalog' entry in SovereignSidebar's FLAT_NAV
- CatalogAdminPage.tsx (525 lines)
- 'catalog' from ActiveSection union + deriveActiveSection regex
The publish-state PATCH endpoint at /catalog/admin/apps/{slug}/publish
on the SME catalog service is unaffected; it's exposed at
marketplace.<sov-fqdn>, not console.<sov-fqdn>, and the future
apps-card toggle will call it via the same path.
Bump chart 1.4.64 → 1.4.65.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(apps): publish chip on each card — replaces deleted /catalog page
Per founder direction: "if the catalog is just labeling an app to be
shown in marketplace, why don't we do it through the apps?" — drop
the standalone /catalog page (#1058), put the publish toggle on each
/apps card.
Backend (catalyst-api):
- New file sme_catalog_client.go — best-effort client for the
in-cluster SME catalog microservice at
http://catalog.sme.svc.cluster.local:8082. 30s response cache,
1.5s probe budget, returns nil on DNS NXDOMAIN (SME services tier
not deployed on this Sovereign — common when marketplace.enabled
is false).
- HandleSovereignApps decorates each app with `marketplacePublished`
*bool joined by slug from the SME catalog. nil ⇒ slug not in SME
catalog (bootstrap component, or marketplace not deployed) ⇒ FE
suppresses the chip.
- New handler HandleSovereignAppPublish at PATCH
/api/v1/sovereign/apps/{slug}/publish. Body {"published": bool}.
Proxies to PATCH /catalog/admin/apps/{slug}/publish on the SME
catalog. Surfaces upstream status verbatim. Invalidates the cache
so the next /apps poll reflects the change immediately.
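
A sketch of the cache-and-join behavior described above, under stated assumptions: the type and function names are hypothetical, the clock is injected as Unix seconds to keep the example testable, and failed fetches are not cached (the real sme_catalog_client.go may choose otherwise).

```go
package main

import (
	"sync"
	"time"
)

// publishCache holds the last slug → published map for ttlSeconds.
type publishCache struct {
	mu         sync.Mutex
	ttlSeconds int64
	now        func() int64
	fetchedAt  int64
	valid      bool
	bySlug     map[string]bool
}

func wallClock() int64 { return time.Now().Unix() }

// newPublishCache builds a cache; a nil clock falls back to wall time.
func newPublishCache(ttlSeconds int64, now func() int64) *publishCache {
	if now == nil {
		now = wallClock
	}
	return &publishCache{ttlSeconds: ttlSeconds, now: now}
}

// get returns the cached map while fresh, otherwise calls fetch.
// A fetch error yields nil (best-effort: catalog not reachable).
func (c *publishCache) get(fetch func() (map[string]bool, error)) map[string]bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.valid && c.now()-c.fetchedAt < c.ttlSeconds {
		return c.bySlug
	}
	m, err := fetch()
	if err != nil {
		return nil
	}
	c.bySlug, c.fetchedAt, c.valid = m, c.now(), true
	return m
}

// invalidate forces the next get to refetch (used after a publish PATCH).
func (c *publishCache) invalidate() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.valid = false
}

// published returns nil when the slug is absent, so the caller can tell
// "not in the SME catalog" apart from "unpublished".
func published(bySlug map[string]bool, slug string) *bool {
	v, ok := bySlug[slug]
	if !ok {
		return nil
	}
	return &v
}
```

The *bool return is what lets the UI distinguish three states: published, unpublished, and no chip at all.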
Frontend (AppsPage):
- liveAppsQuery returns { statusById, publishedBySlug } instead of
the bare status map.
- Each AppCard with a non-null marketplacePublished renders a
PUBLISHED / UNPUBLISHED chip alongside the status chip. Click →
PATCH → optimistic refetch via React Query.
- Bootstrap components and apps not in the SME catalog have nil →
no chip (correct: nothing to toggle).
- On Sovereigns with marketplace.enabled=false, no card renders a
publish chip (SME catalog unreachable → nil for every slug).
Bump chart 1.4.66 → 1.4.67.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(chart,ci): auto-bump literal catalyst-{api,ui} SHAs so all Sovereigns + contabo get fresh code
Audit triggered by the founder asking whether PRs #1051..#1059 reach
NEW Sovereigns or only omantel, via my manual `kubectl set image`
patches. Answer: nothing reached anyone except omantel, and only via
those manual patches.
Both contabo AND every fresh Sovereign would install :2122fb8 — the
SHA frozen at PR #1040's last manual chart-touch on May 6 morning.
Root cause:
- chart/templates/api-deployment.yaml + ui-deployment.yaml carry
LITERAL image refs ("ghcr.io/openova-io/openova/catalyst-api:2122fb8"),
not Helm-templated `{{ .Values.images.catalystApi.tag }}`.
- catalyst-build CI's deploy step bumped values.yaml's catalystApi.tag
on every push — but no template reads from it. Dead code.
- contabo's catalyst-platform Flux Kustomization at
./products/catalyst/chart/templates applies these as raw manifests.
- Sovereigns Helm-install the same chart; Helm passes the literal
through unchanged.
- Both ended up frozen at whatever literal was committed at the last
manual chart-touching PR.
Fix:
1. CI's deploy step now bumps both the literal SHAs in the two
template files AND the unused-but-kept-for-SME-services
values.yaml. Sed-patches the literal directly so contabo's Kustomize
path keeps working.
2. The commit step adds the two templates to the staged set alongside
values.yaml, so every "deploy: update catalyst images to <sha>"
commit propagates to contabo (10-min reconcile) AND Sovereigns
(next OCI chart publish via blueprint-release).
3. Bump bp-catalyst-platform 1.4.68 → 1.4.69 so the new chart with
the latest literal (currently :8361df4) gets republished and
pinned in clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml.
Why drop the "freeze contabo" intent of the previous comment:
The previous comment said contabo auto-roll on every PR was bad
because PR #975's image broke contabo (k8scache startup loop).
The right fix there is to fix the bug in the code, not to freeze contabo.
Freezing masked real divergence — the reason the founder caught
this is that manual omantel patches were the only thing keeping
omantel current while contabo + every other fresh Sovereign quietly
ran 9 PRs behind.
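
The CI step itself uses sed; the same literal-tag rewrite is sketched here in Go for illustration. The regular expression and function name are assumptions, not the pipeline's actual pattern.

```go
package main

import "regexp"

// imageRef matches the literal catalyst-api / catalyst-ui image refs and
// captures everything before the tag. The 7-40 hex-char tag length is an
// assumption covering short and full git SHAs.
var imageRef = regexp.MustCompile(
	`(ghcr\.io/openova-io/openova/catalyst-(?:api|ui)):[0-9a-f]{7,40}`)

// bumpImageSHAs rewrites every matching image tag in a manifest to sha
// and leaves all other text untouched.
func bumpImageSHAs(manifest, sha string) string {
	return imageRef.ReplaceAllString(manifest, "${1}:"+sha)
}
```

Because the rewrite targets the literal in the template file, both the raw-manifest Kustomize path and the Helm install path pick up the new tag without any template change.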
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(k8scache): chroot Sovereign self-registers via in-cluster config — completes the real-time data plane
Founder asked: "make the real-time k8s information propagation
development reused — find the reverted prior work and implement the
final working one."
History:
- PR #358 (May 1) shipped the full informer + SSE data plane:
internal/k8scache/{factory,kinds,sar,redact,snapshot,hydrate,metrics}
+ handler/k8s.go (HandleK8sList, HandleK8sStream, HandleK8sSync) +
UI hook lib/useK8sStream.ts + widget useK8sCacheStream.
- PR #978 (May 5) wired ArchitectureGraphPage to useK8sCacheStream
with kinds=namespace,node,pv,pod,deployment,...,server.hcloud,
volume.hcloud and `&initialState=1` for live cloud-graph deltas.
- PR #981 hotfix dropped the synchronous discovery probe in
factory.go:AddCluster (it was calling
core.Discovery().ServerResourcesForGroupVersion(gv) with NO context
timeout — on a kubeconfig pointing at a decommissioned otech the
call hung the catalyst-api startup for minutes per dead cluster).
After #981 the discovery-probe surgery was clean — no follow-up
broke. The data plane code stayed in the codebase. The remaining
gap was operational, not architectural:
On a chroot Sovereign Console (post-cutover, console.<sov-fqdn>),
the catalyst-api boots without a posted-back kubeconfig in
/var/lib/catalyst/kubeconfigs/. LoadClustersFromDir returns []
→ factory has zero clusters → every
/api/v1/sovereigns/{depId}/k8s/* request 404s with
"sovereign \"...\" not registered". Confirmed live on omantel.biz
today via the architecture graph's in-flight call.
Fix in this PR:
1. **k8scache.FactoryFromEnv chroot self-register**: when SOVEREIGN_FQDN
env is set (chroot mode), build a ClusterRef with id resolved from
CATALYST_SELF_DEPLOYMENT_ID env (orchestrator-stamped) or by
scanning /var/lib/catalyst/deployments/*.json for a record matching
the FQDN (mirrors HandleSovereignSelf's store-fallback path for
consistency). DynamicClient + CoreClient built from
rest.InClusterConfig(). Append to the cluster list. Mother behavior
unchanged — SOVEREIGN_FQDN unset → branch is a no-op.
2. **ClusterRole catalyst-api-cutover-driver**: grant cluster-wide
get/list/watch on every kind in the k8scache registry (pods,
deployments, statefulsets, daemonsets, replicasets, services,
endpointslices, ingresses, configmaps, secrets, persistentvolumes,
persistentvolumeclaims, hcloud.crossplane.io managed resources,
vclusters), plus authorization.k8s.io/subjectaccessreviews so the
per-event SAR gating in the SSE handler doesn't 403 silently.
3. Bump chart 1.4.70 → 1.4.71.
The discovery-probe failure mode that triggered the original revert
(synchronous ServerResourcesForGroupVersion blocking startup) does
NOT recur here — InClusterConfig() returns immediately, NewForConfig
is lazy, and the first network call happens inside the informer
goroutine after Start, off the boot critical path. Mother-side
LoadClustersFromDir behavior is untouched (no probe, just kubeconfig
file parsing as it has been since #981).
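
The id-resolution order in fix 1 can be sketched as a pure function. deploymentRecord and its fields are hypothetical stand-ins for the JSON records under /var/lib/catalyst/deployments/, and the real FactoryFromEnv code also builds the clients, which is omitted here.

```go
package main

// deploymentRecord is a hypothetical, trimmed-down persisted record.
type deploymentRecord struct {
	ID            string
	SovereignFQDN string
}

// resolveSelfDeploymentID picks the cluster id for chroot self-registration.
// envID stands in for CATALYST_SELF_DEPLOYMENT_ID and selfFQDN for
// SOVEREIGN_FQDN. The bool is false when nothing should be registered.
func resolveSelfDeploymentID(envID, selfFQDN string, records []deploymentRecord) (string, bool) {
	if selfFQDN == "" {
		// Mother: SOVEREIGN_FQDN unset, the whole branch is a no-op.
		return "", false
	}
	if envID != "" {
		// Orchestrator-stamped id wins.
		return envID, true
	}
	// Fallback: first stored record whose FQDN matches our own.
	for _, r := range records {
		if r.SovereignFQDN == selfFQDN {
			return r.ID, true
		}
	}
	return "", false
}
```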
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(cloud): + More popover escapes overflow clip + graph centers via gravity force
Two cloud-page bugs caught live on omantel.biz:
(1) /cloud?view=list&kind=clusters → +More popover non-functional.
The popover renders at its anchor coords but pointer events pass
through to the toolbar below it. Diagnosis:
.cloud-page-toolbar > [data-testid="cloud-kind-chips"] {
overflow-x: auto;
}
Per the CSS Overflow spec, when one overflow axis is neither visible
nor clip, a visible value on the OTHER axis computes to auto. So
overflow-x:auto on the chips strip silently sets overflow-y:auto,
which clips the absolutely-positioned popover that hangs DOWN from
the +More button.
Fix: render the popover via React.createPortal to document.body
so it's outside any overflow ancestor. Position via fixed
coordinates computed from the +More button's
getBoundingClientRect, recomputed on resize/scroll. Click-outside
dismissal updated to check both wrapper AND portaled popover.
(2) /cloud?view=graph → bubbles drift to canvas edges, leaving the
centre empty until enough nodes (e.g. worker nodes) are added
to anchor things via link tension.
Two coupled root causes:
a) `forceCenter` only adjusts the centroid — it shifts ALL
nodes uniformly so their average sits at (cx, cy). It does
NOT pull individual nodes inward. With small node counts
and high charge repulsion (-160 for ≤50 nodes), nothing
opposes outward drift.
b) `makeForceBound` was a HARD clamp: `if (n.x < minX) n.x =
minX`. Nodes that hit the wall get arrested with their
velocity preserved on the perpendicular axis but no inward
impulse → they slide along the wall and stack at corners.
The simulation never relaxes back to the centre.
Fix:
a) Add forceX(cx) + forceY(cy) with `centerGravity` strength
per node-count tier (0.08 for ≤50, scaling down with
larger graphs where link tension is sufficient). This pulls
every individual node toward the centre proportional to its
offset.
b) Replace the hard clamp with a damped bounce: when a node
hits the boundary, reverse its velocity component (×0.4
damping) instead of zeroing it. The node gets an inward
impulse, so the simulation actually relaxes.
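
The real change is in the UI's force simulation; the per-tick arithmetic of the two fixes is sketched below in Go on a single axis. The node type and function names are hypothetical, and the gravity step follows the usual positioning-force form (velocity += offset × strength × alpha), which is an assumption about how the library applies it.

```go
package main

// node is a hypothetical one-axis simulation node.
type node struct {
	X, VX float64
}

// applyGravity nudges a node's velocity toward cx in proportion to its
// offset, so every individual node is pulled inward.
func applyGravity(n *node, cx, strength, alpha float64) {
	n.VX += (cx - n.X) * strength * alpha
}

// bounce keeps a node inside [minX, maxX]. On contact it reverses the
// outward velocity and scales it by damping, giving an inward impulse
// instead of leaving the node pinned to the wall.
func bounce(n *node, minX, maxX, damping float64) {
	if n.X < minX {
		n.X = minX
		if n.VX < 0 {
			n.VX = -n.VX * damping
		}
	} else if n.X > maxX {
		n.X = maxX
		if n.VX > 0 {
			n.VX = -n.VX * damping
		}
	}
}
```

With damping 0.4, a node that crosses the left wall moving at -10 leaves it moving at roughly +4; with strength 0.08, a node 50 units right of centre gets a pull of roughly -4 per unit alpha.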
Bump chart 1.4.72 → 1.4.73.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>