4c83d98765
16 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
22851d980d | deploy: bump bp-newapi upstream v0.13.2 chart 1.4.8 | ||
|
|
4abd156fee
|
feat(newapi): real /admin/tokens/sandbox mint impl (was stub from #1619) (#1638)
Replaces the Wave 1b stub that echoed the inbound PAT verbatim with a
real HS256 mint flow the sandbox-controller can call when it rolls out
a fresh Sandbox Pod.
Handler (platform/newapi/internal/handler/sandbox_token.go):
- Caller auth: shared admin-secret bearer (env NEWAPI_ADMIN_SECRET),
constant-time compared. 401 on mismatch / missing bearer.
- Request body: {org_id, user_id, sandbox_id, allowed_channels[]}.
De-duplicates + scrubs empty channel names so a controller bug
sending [""] can't mint a token that NewAPI silently treats as
"no restriction".
- Mints HS256 JWT signed with NEWAPI_TOKEN_SIGNING_KEY. Claim shape:
{sub: sandbox_id, org: org_id, user: user_id, channels: [...],
iat, exp: iat+7d, typ: "sandbox"}.
- Returns {token, expires_at}.
- Refuses with 503 when SigningKey or AdminSecret is unset
(visible chart-wiring gap, not a forgeable-token leak).
- Removes the previous Claims/jwt.Parse PAT-validation path that
came with the stub — caller is the controller, not an operator.
- NewHandlerFromEnv() factory loads + validates env at process
start so catalyst-api can fail loudly instead of shipping the
endpoint silently.
Unit tests (sandbox_token_test.go) — 11 cases:
- happy path (mint + claim shape + signature round-trip)
- de-dup + empty-channel scrub
- admin-secret mismatch / missing bearer → 401
- missing org_id / user_id / sandbox_id / empty channels → 400
- non-POST → 405
- unset env → 503
- mintSandboxToken empty-secret guard + round-trip
- response does not echo admin secret or signing key
Chart wiring (platform/newapi/chart):
- New Secret template sandbox-token-signing-key-secret.yaml
auto-renders with Helm `lookup` + helm.sh/resource-policy: keep
(same load-bearing pattern as credentials-secret.yaml #943 and
gitea admin-secret.yaml #830 Bug 2). 64-char alphanumeric values
for both SIGNING_KEY and ADMIN_SECRET; persistence across
reconciles is required because a reconcile-time rotation would
silently invalidate every per-Sandbox token across the Sovereign
AND break the sandbox-controller's auth path until its Pod
restarts.
- values.yaml block sandboxTokenSigningKey.{existingSecret,
autoProvision, autoSecretName} matching the `credentials`
convention (operator override > auto-provision > skip-render).
- No Chart.yaml bump — chart value addition only.
Verification:
- go build ./platform/newapi/internal/handler/... — clean
- go test ./platform/newapi/internal/handler/... — 11/11 PASS
- helm template platform/newapi/chart — Secret renders
How sandbox-controller will use it:
1. Read NEWAPI_ADMIN_SECRET from mounted Secret newapi-token-signing-key.
2. POST /admin/tokens/sandbox with bearer + body
{org_id: <Sandbox.spec.owner.orgRef.slug>,
user_id: <Sandbox.spec.owner.email>,
sandbox_id: <Sandbox.metadata.uid>,
allowed_channels: ["qwen3.6-bankdhofar"]}.
3. Write returned token into Secret/sandbox-<uid>-newapi-token.
4. Mount that Secret into the Sandbox Pod as LLM_GATEWAY_TOKEN.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
255eb3bf17
|
feat(sandbox+auth+newapi): Wave 1b — newapi proxy + BYOS + org-scoped JWT (#1619)
Three coordinated deliverables for Sandbox Wave 1b — scaffolding +
design + the ONE prerequisite (long-lived org-scoped JWT) the rest of
Sandbox depends on.
Deliverable 1 — newapi proxy contract:
- products/sandbox/docs/newapi-proxy-contract.md: agent-pod env
(LLM_GATEWAY_URL / OPENAI_BASE_URL alias), provider selection
(?provider=qwen; default Qwen via omtd.bankdhofar.com), per-Sandbox
token issuance via /admin/tokens/sandbox bridge, lifecycle +
rotation, auth model.
- platform/newapi/internal/handler/sandbox_token.go: bridge handler
stub. Validates the inbound PAT (typ=pat + aud=newapi + org_id
cross-check vs request body), then echoes a NewAPI-shaped response
so the contract is testable without the upstream NewAPI admin
API. Wave 4 wires the actual upstream calls.
Deliverable 2 — Claude Code BYOS OAuth:
- products/sandbox/docs/claude-code-byos.md: UX (Connect Claude Max →
OAuth → refresh token Secret/catalyst-system/sandbox-byos-claude-
code-<user-uid>), Pod env injection (ANTHROPIC_API_KEY bypassing
newapi), per-session toggle, revocation paths, chart wiring.
- products/catalyst/bootstrap/api/internal/handler/byos_claude_code.go:
POST /start, GET /callback, DELETE, GET /status — four endpoints
behind RequireSession. Honest 503 + 501 surface so the popup
flow exercises end-to-end against the placeholder client_id;
Wave 4 flips it live.
Deliverable 3 — Long-lived org-scoped JWT (THE prerequisite):
- platform/keycloak/chart/templates/configmap-sovereign-realm.yaml +
configmap-tenant-realm.yaml: add `org` protocolMapper emitting
user attribute `org` as claim `org_id`; add `org` to default
client scopes for ALL clients.
- core/services/auth/handlers/handlers.go: include typ=session in
JWTs + document the cross-service claim contract.
- core/services/auth/handlers/pat.go: NEW POST /auth/pat with
admin-configurable TTL (default 7d, max 90d), audience claim,
capabilities pass-through, typ=pat discriminator.
- core/services/auth/handlers/routes.go + main.go: wire /auth/pat
behind JWTAuth middleware.
- core/services/shared/auth/claims.go: single Claims struct +
HasCapability/HasGroup helpers + ContextKey for cross-service
consumers (sandbox-controller, newapi bridge, MCP server).
- products/catalyst/bootstrap/api/internal/auth/session.go: align
Org JSON tag with new `org_id` claim; UnmarshalJSON accepts BOTH
legacy `org` and new `org_id` so a rolling chart upgrade does
not regress org-scoped queries.
Out of scope (Wave 4 wires):
- Sandbox CRD + controller (writes Secret, mounts Pod env).
- Actual outbound HTTP to Anthropic /oauth/token + KMS encrypt.
- Actual outbound HTTP to NewAPI admin API.
- Per-Sandbox capability projection from Keycloak groups.
- PAT revocation lookup (jti store) + /auth/pats list.
- Settings UI card + session-toolbar routing toggle.
Build verification (go vet + go build clean):
- core/services/auth/...
- core/services/shared/...
- platform/newapi/internal/handler/...
- products/catalyst/bootstrap/api/...
Founder TODO (single knob to flip BYOS live, Wave 4):
Register an Anthropic OAuth client at
https://console.anthropic.com/settings/oauth (public PKCE,
redirect=https://console.<sov-fqdn>/api/v1/sandbox/byos/claude-code/callback)
and paste the client_id into clusters/<sovereign>/bootstrap-kit/
sandbox.yaml. Today every BYOS endpoint returns 503 with a clear
message pointing at claude-code-byos.md §8.
Refs: products/sandbox/docs/architecture.md §6 (THE prerequisite).
Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
|
||
|
|
0520760543 | deploy: bump bp-newapi upstream v0.13.2 chart 1.4.7 | ||
|
|
74d23ab3dc
|
fix(charts): explicit harbor.openova.io/proxy-dockerhub prefix on all chart-hook images (#163) (#1367)
Per CLAUDE.md MIRROR-EVERYTHING inviolable rule: every chart-hook image reference (pre/post-install Jobs, helper Pods) must use the explicit Harbor proxy-cache form. Fix #158's bitnami → bitnamilegacy swap was a band-aid; the architecturally correct fix is to defeat upstream-deletion blast radius entirely by routing through Harbor.
The node-level containerd mirror in infra/hetzner/cloudinit-control-plane.tftpl (line 706) already redirects docker.io/* → harbor.openova.io/proxy-dockerhub/* implicitly, but implicit routing:
- Hides the routing from SBOM scans
- Bypasses the Kyverno harbor-proxy-pull ClusterPolicy
- Means a chart audit (`grep docker.io`) misses a real dependency
- Was the proximate cause of prov #27 wedging when Bitnami deleted docker.io/bitnami/kubectl:1.30.4 (Fix #158 had to chase the deletion mid-flight instead of being insulated by Harbor cache)
19 chart-hook image: refs + 5 chart values.yaml repository: defaults now carry the explicit harbor.openova.io/proxy-dockerhub prefix. Application/subchart images (keycloak, postgresql, mongodb in keycloak+litmus subcharts) are intentionally out of scope for this PR; those go through the node-level containerd mirror still.
Affected blueprints + chart version bumps:
- bp-cert-manager 1.2.1 -> 1.2.2
- bp-external-secrets-stores 1.0.4 -> 1.0.5
- bp-crossplane-claims 1.1.4 -> 1.1.5
- bp-flux 1.2.1 -> 1.2.2
- bp-guacamole 0.1.16 -> 0.1.17
- bp-self-sovereign-cutover 0.1.28 -> 0.1.29
- bp-k8s-ws-proxy 0.1.9 -> 0.1.10
- bp-harbor 1.2.15 -> 1.2.16
- bp-gitea 1.2.5 -> 1.2.6
- bp-newapi 1.4.5 -> 1.4.6
- bp-wordpress-tenant 0.2.0 -> 0.2.1
- catalyst-platform 1.4.138 -> 1.4.139
Co-authored-by: e3mrah <1234567+e3mrah@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
db26306f92 | deploy: bump bp-newapi upstream v0.13.2 chart 1.4.5 | ||
|
|
54e65aa4b1
|
fix(bp-newapi): pre-install gate on external-secrets webhook readiness (Fix #138) (#1347)
prov #20: bp-newapi 1.4.2 HR FAILED with the chart's templates/external-secret.yaml apply rejected by the apiserver:
Internal error occurred: failed calling webhook "validate.externalsecret.external-secrets.io": ... no endpoints available for service "external-secrets-webhook"
bp-external-secrets reaches HR Ready=True the moment its Deployments report Ready, but Pod Ready != webhook EndpointSlice reachable: the apiserver-side EndpointSlice for the webhook Service has not been observed by the validating admission controller's lookup yet. Flux dependsOn satisfies the dependency graph but does NOT close this race.
Same root-cause class as Fix #137 (bp-external-secrets-stores) but a DIFFERENT chart and DIFFERENT validation endpoint (ExternalSecret vs ClusterSecretStore).
Canonical seam (Inviolable Principle #16): the chart that CONSUMES the webhook owns the readiness gate. NOT the upstream external-secrets chart (Fix #137 territory) and NOT a Flux HR-level dependsOn (which checks the wrong layer).
Adds platform/newapi/chart/templates/000-external-secrets-webhook-readiness-job.yaml, a pre-install/pre-upgrade Helm hook that polls the webhook (default external-secrets-webhook.external-secrets-system.svc:443/validate-external-secrets-io-v1beta1-externalsecret) until it returns a structured HTTP response (200/400/405/415/422). 60s wall budget, 2s interval, no RBAC required (curl-only Pod, HTTPS to ClusterIP).
Templated end-to-end via .Values.externalSecretsWebhookGate.* per Inviolable Principle #4: operator may override service, namespace, port, path, timeout, interval, or disable the gate entirely from a per-Sovereign overlay.
Capability-gated on the external-secrets.io/v1beta1 CRD AND on the existing catalystIntegration.externalSecret.enabled chain, so a Sovereign that disables catalyst-integration pays no probe overhead.
Chart 1.4.2 -> 1.4.4 (1.4.3 was a deploy-only image-tag bump). HR template clusters/_template/bootstrap-kit/80-newapi.yaml repinned.
## Claimed TCs
Infra-only fix; no UI behaviour change. Unblocks bp-newapi reaching HR Ready=True on every fresh provision, which is a hard prerequisite for:
- ADR-0003 §3.2 Catalyst signup hook (alice -> per-user NewAPI key)
- alice signup gate 5 (LLM) end-to-end
- Any TC that exercises /v1/* customer API or admin.<sovereign-fqdn>
Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
a9861f9491 | deploy: bump bp-newapi upstream v0.13.2 chart 1.4.3 | ||
|
|
f668d791ab
|
fix(bp-newapi): publish newapi-mirror image + repoint chart to existing tag (qa-loop bounded-cycle audit prov #7 Gap F) (#1315)
Root cause from live diagnosis (omantel.biz prov #7, kubectl --context=omantel):
The bp-newapi chart at platform/newapi/chart/values.yaml referenced `ghcr.io/openova-io/openova/newapi-mirror:v0.4.5` since its first commit (44d0200a, 2026-05-01). However:
1. NO CI workflow ever built that image. There is no `build-bp-newapi*.yaml` (or similar) under .github/workflows/. The GHCR package `ghcr.io/openova-io/openova/newapi-mirror` does not exist (404 from /orgs/openova-io/packages/container/...).
2. The tag `v0.4.5` is fictitious: neither upstream Calcium-Ion/new-api (`docker.io/calciumion/new-api`) nor the alternate ancestor (`justsong/one-api`) ever published a `v0.4.5`. The lowest stable Calcium-Ion tag is `v0.6.0.9`; the highest stable v0.x is `v0.13.2` (upstream publish 2026-04-27).
Result: every fresh Sovereign's NewAPI Pod ImagePullBackOff'd 403 Forbidden on the never-existed image, blocking alice signup gate 5 (LLM) and surfacing in the bounded-cycle audit as Gap F.
Fix (mirrors bp-guacamole CI pattern in .github/workflows/build-bp-guacamole.yaml):
- NEW .github/workflows/build-bp-newapi.yaml: push to platform/newapi/chart/** triggers a Job that pulls `docker.io/calciumion/new-api:<UPSTREAM_VER>`, captures the upstream repo digest, re-tags as `ghcr.io/openova-io/openova/newapi-mirror:<UPSTREAM_VER>` + `:latest`, pushes both, then bumps values.yaml + Chart.yaml + dispatches blueprint-release.
- platform/newapi/chart/values.yaml: newapi.image.tag bumped from `v0.4.5` (fictitious) to `v0.13.2` (latest stable Calcium-Ion/new-api on Docker Hub). Comment block expanded with full rationale + link to the new build workflow + bump-in-lockstep instructions.
- platform/newapi/chart/Chart.yaml: version 1.4.1 → 1.4.2, appVersion `0.4.5` → `0.13.2` (Helm convention: appVersion = upstream version without the `v` prefix). Inline changelog records the audit-prov-7 Gap F lineage.
- clusters/_template/bootstrap-kit/80-newapi.yaml: pinned chart version 1.4.1 → 1.4.2 with the same changelog inline.
Verified locally:
- `helm template smoke platform/newapi/chart --set database.existingSecret=fake --set credentials.existingSecret=fake --set auth.adminUI.mode=masterKey` renders `image: "ghcr.io/openova-io/openova/newapi-mirror:v0.13.2"` and `app.kubernetes.io/version: "0.13.2"`.
The v1.0.0-rc.x upstream line is gated on schema migration stabilisation; the channel-seed Job uses the legacy admin-API request shape, so do NOT auto-roll past v0.13.x without re-running the channel-seed integration smoke against NewAPI's `/api/channel/`.
Pairs with the Gap C re-investigation memo (no chart fix needed; PR #1309 only gated `defaultCompositionRef`, not the XRD itself; the useraccesses.access.openova.io CRD is present on omantel prov #7).
DO NOT MERGE: this PR is for qa-loop bounded-cycle Wave 5 Fix #80 (Gap F) review.
Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
2ff50f0591
|
fix(bp-newapi+services-build): imagePullSecrets on Pod, sed bumps values.yaml smeTag (#955)
Two SME-blocker bugs caught live on otech113 (alice signup gate 5 fails on fresh Sovereign):
#952: bp-newapi 1.4.0 Pod has no imagePullSecrets, so kubelet pulls PRIVATE ghcr.io/openova-io/openova/{newapi-mirror,services-metering-sidecar} anonymously and gets 403 Forbidden. Fix:
- Templatize spec.imagePullSecrets on Deployment + channel-seed Job.
- Default values.yaml `imagePullSecrets: [{name: ghcr-pull}]`.
- Add `newapi` to flux-system/ghcr-pull's reflector reflection-{allowed,auto}-namespaces in cloudinit-control-plane.tftpl so bp-reflector mirrors the source Secret into the namespace automatically on every fresh Sovereign.
- Bump bp-newapi 1.4.0 -> 1.4.1, update _template overlay.
#953: services-build.yaml's image-rewrite loop only matched the hardcoded `image: ghcr.io/.../services-<svc>:<sha>` form. 7 of 8 sme-services templates use `image: "{{ ... }}/services-<svc>:{{ .Values.images.smeTag }}"`. Each services-build run bumped only auth.yaml while reporting "update sme service images to ${SHA}", leaving the live Pod on stale bytes (PR #951's #941 fix never reached services-catalog despite the merge + chart bump chain). Fix:
- After the hardcoded loop, also bump `images.smeTag` in products/catalyst/chart/values.yaml with a strict regex match (`^ smeTag: "<sha>"$`); refuse to auto-bump if the line shape changes (defends against silent drift if a contributor renames the field).
- Mirror the change into the retry-path `rewrite()` function so a reset-to-origin/main retry does not recreate the original bug.
Tests:
- platform/newapi/chart/tests/imagepullsecrets-render.sh: 4 cases asserting the Deployment and channel-seed Job carry the default ghcr-pull reference, that an empty override suppresses the block, and that custom secret names propagate (Inviolable Principle #4).
- tests/integration/services-build-rewrite.sh: 3 cases reproducing the workflow's rewrite logic on a sandboxed copy of the live chart, asserting both auth.yaml's hardcoded line AND values.yaml's smeTag get bumped, that helm-render of the catalyst chart with the bumped values produces all 8 SME-service Deployments at the new SHA, and that an idempotent re-bump to a second SHA also lands cleanly.
Refs: #952 #953 (umbrella #915, alice signup gate 5).
Co-authored-by: hatiyildiz <143030955+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
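The "strict match or refuse" rule for the `smeTag` bump can be illustrated in Go. The workflow itself is described as using sed, so this is a swapped-in technique rather than the workflow's code, and the exact pattern (indentation, hex SHA of 7 to 40 characters) and the name `bumpSmeTag` are assumptions made for the sketch.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// smeTagLine is a hypothetical strict pattern: indentation, the key, and a
// double-quoted hex SHA, with nothing else on the line.
var smeTagLine = regexp.MustCompile(`(?m)^([ \t]+smeTag: )"[0-9a-f]{7,40}"$`)

// shaShape guards the replacement value so it cannot smuggle in other text.
var shaShape = regexp.MustCompile(`^[0-9a-f]{7,40}$`)

// bumpSmeTag rewrites the single smeTag line, and refuses to touch the file
// when the line is missing, duplicated, or no longer has the expected shape.
func bumpSmeTag(values, sha string) (string, error) {
	if !shaShape.MatchString(sha) {
		return "", errors.New("new tag is not a hex SHA")
	}
	if n := len(smeTagLine.FindAllStringIndex(values, -1)); n != 1 {
		return "", fmt.Errorf("expected exactly one smeTag line, found %d; refusing to auto-bump", n)
	}
	return smeTagLine.ReplaceAllString(values, `${1}"`+sha+`"`), nil
}

func main() {
	out, err := bumpSmeTag("images:\n  smeTag: \"0123abc\"\n", "89abcdef")
	fmt.Printf("%q %v\n", out, err)

	// An unquoted value no longer matches the strict shape, so the bump is refused.
	_, err = bumpSmeTag("images:\n  smeTag: 0123abc\n", "89abcdef")
	fmt.Println(err)
}
```

Failing loudly on a shape change is the point of the design: a silent no-op is what let earlier runs report success while leaving Pods on stale images.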
||
|
|
689276889c
|
fix(bp-catalyst-platform+bp-newapi): unblock alice signup gates 2-6 on Sovereigns (#915) (#951)
Six coupled chart + orchestrator fixes that unblock alice marketplace
signup → tenant ready → SaaS integrations → LLM → ledger on a freshly
franchised Sovereign. C5-final got Gate 1 GREEN on otech113 (2026-05-05)
but every downstream gate failed because the SME bundle hardcoded
contabo-only assumptions.
Bumps:
- bp-catalyst-platform 1.4.21 → 1.4.22
- bp-newapi 1.3.0 → 1.4.0
- bootstrap-kit slot 13 + 80 pins updated in lockstep
Issues addressed (single consolidated PR — smaller PRs would race
against alice signup retries):
- #934 (auth SMTP empty → "failed to send email"): sme-secrets.yaml
now reads SMTP_* from `catalyst-system/sovereign-smtp-credentials`
(the same A5-seeded source #883/#905 the chart 1.4.20 catalyst-
openova-kc-credentials Secret already uses) with source-wins
precedence. Both canonical (smtp-host/port/from/user/pass) AND
legacy (host/port/from/user/password) source-Secret key shapes
accepted. Empty source falls back to chart-level defaults so the
contabo path stays clean.
- #940 (provisioning service GITHUB_TOKEN placeholder + hardcoded
upstream github.com): chart values
.Values.smeServices.provisioning.{githubToken,git.{apiURL,owner,
repo,branch}} make every GitHub-API coordinate operator-overridable
with topology-aware defaults (Sovereign ⇒ in-cluster Gitea REST
API + `openova` org; contabo ⇒ api.github.com + `openova-io` org).
Provisioning binary's startup gate validates the GITHUB_TOKEN does
NOT contain placeholder substrings (<placeholder>, PLACEHOLDER,
REPLACE_ME, ...) and crashes the Pod into Pending if it does — the
operator sees the misconfig immediately instead of after alice
signups have failed silently in service logs. GitHub client now
accepts a custom API URL via NewClientWithAPIURL so Gitea's GitHub-
compatible /api/v1 surface drops in without re-implementing the
client.
- #941 (catalog "27 apps COMING SOON"): added `openclaw` and
`stalwart-mail` to migrateAppDeployable's deployable map at
core/services/catalog/handlers/seed.go. Both blueprints (bp-openclaw,
bp-stalwart-{sovereign,tenant}) ship with visibility=listed in the
embedded blueprints.json AND have working SME-tenant overlay
templates in sme_tenant_gitops.go, but the catalog handler silently
filtered them out because they were missing here. Map extracted to
DeployableAppSlugs() exported function so unit tests can assert
membership without invoking a Mongo store.
- #942 (REDPANDA_BROKERS hardcoded to talentmesh): configmap.yaml
selects broker default at render time based on global.sovereignFQDN
— Sovereign ⇒ NATS JetStream Service per ADR-0001 (the only local
bus on Sovereigns); contabo ⇒ legacy Redpanda Service in talentmesh.
Operator MAY override either default via
.Values.smeServices.eventBus.brokers without forking the chart.
The ConfigMap key name stays REDPANDA_BROKERS for back-compat with
existing SME service Go env wiring; new EVENT_BUS_PROTOCOL key
surfaces the protocol hint for services that want to switch wire
format independently.
- #943 (bp-newapi silently skips Deployment): NEW
templates/cnpg-cluster.yaml auto-provisions a CNPG-backed Postgres
Cluster + Helm-`lookup`-persistent DSN Secret when
.Values.cnpg.enabled (DEFAULT true). NEW templates/credentials-
secret.yaml auto-generates SESSION_SECRET + CRYPTO_SECRET (each
64-char randAlphaNum, persistent across reconciles via Helm
`lookup`) when .Values.credentials.autoProvision (DEFAULT true).
deployment.yaml gate now resolves Secret names from the chart-
emitted defaults when the operator hasn't supplied an override.
Capabilities-gated on postgresql.cnpg.io/v1 so a cold install
before bp-cnpg is Ready surfaces as "no Cluster yet" rather than
a hard install error.
- #944 (CRITICAL — cross-cluster pollution): provisioning.yaml
templates GIT_BASE_PATH from
.Values.smeServices.provisioning.gitBasePath with a topology-aware
default `clusters/<sovereignFQDN>/sme-tenants` on Sovereigns. NEW
`core/services/provisioning/gitguard` package validates at startup
AND on every commit code path that the path begins with
`clusters/<self-FQDN>/` — refusing to commit to any other cluster's
tree. Defence in depth so a runtime env mutation (kubectl exec,
ConfigMap update without Pod restart, hostile sidecar) cannot
bypass the check. Pre-#944 every alice tenant overlay landed in
upstream openova/openova `clusters/contabo-mkt/tenants/<id>/`
which contabo Flux would then install on the contabo cluster —
C5-final caught + reverted the alice2 incident at commit
|
||
|
|
9447d88dfd
|
feat(bp-newapi): auto-seed channel #1 = Qwen3.6 @ BankDhofar (#915) (#919)
Per epic #915 (SME tenant integration DoD: alice → OpenClaw → NewAPI → Qwen3.6@BankDhofar end-to-end), bp-newapi must come up with channel #1 = Qwen3.6 hosted at BankDhofar (https://llm-api.omtd.bankdhofar.com, model qwen3-coder / alias qwen3.6) already wired to its admin API, so the FIRST customer request from an SME's OpenClaw → NewAPI hits a real upstream LLM rather than a 404 / "no channel found" error.
Until now the chart's channels.yaml ConfigMap was a documentation surface only; the upstream NewAPI binary persists channel state to its Postgres `channels` table via its admin API at /api/channel/. This patch bridges that gap.
Discovery:
- Canonical BankDhofar relay reference exists in openova-private/clusters/contabo-mkt/apps/axon/helmrelease.yaml (axon.vllm.baseUrl=https://llm-api.omtd.bankdhofar.com, defaultModel=qwen3-coder, secret=axon-vllm-secret).
- K8s secret confirmed live (axon/axon-vllm-secret, key AXON_VLLM_API_KEY).
- Architecture: bp-newapi is per-Sovereign (one NewAPI per OTECH); SME tenants share it via OpenClaw's newapi.baseURL = https://newapi.<OTECHFQDN>. Channel seeding therefore happens at the Sovereign-level chart install, NOT per-tenant.
Changes:
1. platform/newapi/chart/values.yaml
- New `defaultChannels.qwenBankDhofar` block (enabled=false by default; per-Sovereign overlay flips it true with the canonical endpoint + commercial-contract attestation).
- New `channelSeed` block configuring the post-install Helm hook Job (image, resources, backoff, deadline, hook delete policy).
2. platform/newapi/chart/templates/_helpers.tpl
- effectiveChannels helper composes qwenBankDhofar BEFORE operator-supplied .Values.channels and BEFORE defaultChannels.vllm so it lands as channel #1 in NewAPI's row-insertion order (NewAPI's router resolves `model` lookups in row order).
- New channelSeedJobName helper (shared by Job + RBAC + ConfigMap).
3. platform/newapi/chart/templates/channel-seed-job.yaml (NEW)
- post-install/post-upgrade Helm hook Job that:
* Mounts the operator-supplied master-key Secret (auth.adminUI.masterKeySecret) for one-time admin API auth.
* Mounts the per-channel upstream API key Secret (defaultChannels.qwenBankDhofar.existingSecret).
* Polls /api/status until 200 (handles NewAPI startup window).
* For each default channel: GET /api/channel/?keyword=<name>; if a row whose `name` exactly matches exists, SKIP. Otherwise POST /api/channel/ with the channel definition. Idempotent: re-runs after upgrades are no-ops once channels exist.
* Bounded RBAC (Role+RoleBinding only on the named Secrets).
* Skip-render gates: channelSeed.enabled, defaultChannels.* enabled, masterKeySecret supplied. helm template with default values renders no Job (CI smoke clean).
4. clusters/_template/bootstrap-kit/80-newapi.yaml
- Bumped chart version 1.2.0 → 1.3.0.
- Added defaultChannels.qwenBankDhofar block to the per-Sovereign overlay shape (still enabled=false in the template; operator supplies endpoint + attestation + Secrets per Sovereign).
5. platform/newapi/chart/Chart.yaml
- Bumped 1.2.0 → 1.3.0 with changelog comment.
6. products/catalyst/bootstrap/api/internal/handler/sme_tenant_gitops.go
- bp-openclaw per-tenant overlay now emits `newapi.defaultModel: qwen3.6` so OpenClaw's UI surfaces the friendlier alias by default. (Both qwen3.6 and qwen3-coder route to the same channel via the chart's `models` list.)
Verification:
- helm lint . PASS (1 chart linted, 0 failed)
- helm template (defaults) PASS (no Job rendered)
- helm template (qwen enabled) PASS (Job + RBAC + ConfigMap + channels.yaml all render with channel #1 first)
- helm template (endpoint empty) FAIL with helpful message (configurability gate)
- go build ./... PASS
- go test ./internal/handler/... PASS for SME tenant overlay tests (TestRenderSMETenantOverlay_*)
- Pre-existing AuthHandover panic is unrelated to this change
Per docs/INVIOLABLE-PRINCIPLES.md #4 (never hardcode), every knob is configurable via the per-Sovereign bootstrap-kit overlay. The endpoint default is empty so a fresh `helm template` does not silently wire customers to a third-party host.
Co-authored-by: alierenbaysal <alierenbaysal@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
7de05bab9d
|
fix(bootstrap-kit,bp-newapi): bump slot pins (gitea 1.2.4, catalyst-platform 1.4.2) + gate Traefik Middleware on Cilium Sovereigns (bp-newapi 1.2.0) (#842)
Three issues blocking the otech103 verification proof on a freshly merged main, all uncovered while live-driving the Day-2 Independence cutover:
1. clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml pinned 1.4.0 and missed the bumps from PR #839 (1.4.1, RBAC dual-mode render) and PR #841 (1.4.2, POWERDNS env literal). Bumping the slot pin to 1.4.2 lands those fixes on every fresh provision.
2. clusters/_template/bootstrap-kit/10-gitea.yaml pinned 1.2.3 and missed the bump from PR #832 (1.2.4, gitea-admin-secret canonical Secret for cutover Step-1 to mount). Bumping to 1.2.4 unblocks bp-self-sovereign-cutover Step-1 (gitea-mirror Job).
3. platform/newapi/chart/templates/ingress.yaml hard-rendered a traefik.io/v1alpha1 Middleware resource. On a Cilium Gateway Sovereign that CRD does not exist; bp-newapi 1.1.0 install failed with 'no matches for kind Middleware'. Gating the Middleware behind .Values.ingress.middleware.enabled (default false) lets the chart install on Cilium Sovereigns; contabo / Traefik clusters can still flip it on per-overlay. Bumping to 1.2.0 (additive feature, default-off, no breaking change). Slot 80-newapi pin bumped lockstep.
Verified live state on otech103.omani.works (deployment id 12dff5098e33053e):
- bp-newapi 1.1.0 HR: Status=False 'Helm install failed: ... no matches for kind Middleware in version traefik.io/v1alpha1'
- bp-catalyst-platform HR pinned at 1.4.0 (lacks RBAC for cutover-driver)
- bp-gitea HR pinned at 1.2.3 (lacks gitea-admin-secret)
After this PR merges + Flux reconciles otech103, all three HRs upgrade in place and the cutover proof can be driven to completion.
Co-authored-by: Hatice Yildiz <hatice.yildiz@openova.io>
|
||
|
|
9645a9044a
|
feat(metering): NewAPI NATS publisher + sme-billing subscriber + POST /metering/record (#798) (#818)
* feat(metering): NewAPI NATS publisher + sme-billing subscriber + POST /metering/record (#798)
Per #795 [Q-mine-3] (NATS not RedPanda) + [Q-mine-4] (one ledger), add the SME-2 metering integration end-to-end.
NewAPI is consumed as the upstream image `ghcr.io/openova-io/openova/newapi-mirror` (a pinned mirror, not a fork); the metering envelope is produced by a Go sidecar that observes the OpenAI-style `usage.total_tokens` field on every 2xx /v1/* response. This avoids forking the upstream binary while still producing the canonical envelope shape on `catalyst.usage.recorded`.
A) NewAPI metering sidecar: core/services/metering-sidecar/
- Transparent reverse proxy in front of NewAPI on its own port; the bp-newapi Service routes the cluster-fronting port to the sidecar, which forwards to NewAPI on the pod's loopback.
- Observes successful /v1/* JSON responses, parses `usage.{prompt_tokens,completion_tokens,total_tokens}`, computes amount_micro_omr = -tokens * priceMicroOMRPerToken, and publishes one envelope on `catalyst.usage.recorded` per completed request.
- Failed (non-2xx), non-JSON, and admin-path requests are NOT billed.
- Customer-facing latency is NEVER blocked on metering: the response body is restored before publish; on NATS unreachable the envelope is persisted to disk and retried by a background drain loop.
- 14 unit tests (proxy + publisher + safeFilename guards).
B) sme-billing NATS subscriber: core/services/billing/handlers/metering_consumer.go
- JetStream durable consumer `sme-billing-metering` on stream `CATALYST_USAGE` (provisioned by sme-billing on startup).
- Idempotent on metadata.request_id via a UNIQUE partial index on credit_ledger.external_ref; redelivery from the broker collapses to a single ledger row.
- Customer auto-create on cold start (the rbac sme.user.created envelope may land AFTER the first metered request; we don't strand usage waiting for it).
- 11 unit tests covering happy-path, idempotency, malformed-payload poison-pill, missing-request-id, non-negative amount guard, resolver error → Nak, derive-micro-OMR-from-OMR, DB-error → Nak.
C) HTTP handler POST /billing/metering/record: handlers/metering.go
- Synchronous validate → INSERT credit_ledger → return {ledger_entry_id, balance_after_omr, balance_after_micro_omr, duplicate}. Same payload + idempotency guard as the NATS path.
- Auth: superadmin OR sovereign-admin (operator-admin model; end-user LLM traffic flows through the sidecar, never this URL).
- 8 unit tests covering happy-path, idempotency, role gating, malformed-JSON, positive-amount rejection, customer-not-found.
D) Schema: core/services/billing/store/store.go
- ALTER TABLE credit_ledger ADD COLUMN amount_micro_omr BIGINT (1 OMR = 1,000,000 micro-OMR; -0.000234 OMR = -234 micro-OMR exact integer; preserves precision at metering rates).
- ADD COLUMN external_ref TEXT + UNIQUE partial index for idempotency dedup.
- ADD COLUMN metadata JSONB for the raw envelope.
- GetCreditBalance projects both amount_omr (legacy) and amount_micro_omr (new) into the integer-OMR view.
- GetCreditBalanceMicroOMR returns canonical precision.
- RecordUsage method: ON CONFLICT DO UPDATE … RETURNING (xmax<>0) distinguishes fresh insert from duplicate without a follow-up SELECT.
E) Wiring
- core/services/shared/events/nats.go: minimal NATS JetStream publisher + subscriber surface; legacy RedPanda producer/consumer in events.go untouched per [Q-mine-3].
- core/services/billing/main.go: NATS_URL env; subscriber wired in parallel with the existing RedPanda tenant-events consumer.
- middleware/jwt.go: exported test helper WithClaims so handler tests can construct an authenticated context without minting a real signed token.
- .github/workflows/services-build.yaml: metering-sidecar added to the build matrix; deploy job skips it (image consumed by the bp-newapi chart, not products/catalyst sme-services).
F) bp-newapi chart (1.0.0 → 1.1.0)
- meteringSidecar block in values.yaml: image, port, NATS URL, priceMicroOMRPerToken (default 156 = 0.000156 OMR/token), spool dir, header names, resources, securityContext (read-only-rootfs).
- deployment.yaml renders the sidecar container + emptyDir spool volume when meteringSidecar.enabled (default true).
- service.yaml routes the cluster-fronting :3000 to the sidecar when enabled, exposes a separate :3001 → NewAPI direct port for bp-catalyst-platform admin-API traffic (ADR-0003 §3.2).
- networkpolicy.yaml allows the sidecar's port + nats-system egress for JetStream publish.
Tests: 33 new (14 sidecar + 11 subscriber + 8 HTTP handler), all green. Helm template renders cleanly with sidecar enabled and disabled.
Closes #798
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(billing/store): cast SUM to BIGINT so lib/pq scans into int64 (#798)
Postgres returns `SUM(int) + SUM(bigint)/integer` as `numeric`, which lib/pq presents as a `[]uint8` decimal string ("50.000000000000000000000000") that does NOT scan directly into Go int64. The integration test TestVoucherLifecycle_IssueRedeemAndCreditApplied caught this in CI on the post-redeem balance read.
Wrap the SUM expressions in CAST(... AS BIGINT) so the column type is unambiguously bigint and Scan target stays uniform across pre-#798 rows (amount_omr only) and post-#798 rows (amount_micro_omr present).
Affects:
- GetCreditBalance
- GetCreditBalanceMicroOMR
- RecordUsage's running-balance read
Test mocks updated to match the new SQL prefix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
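The billing rule in the metering commit above (bill only 2xx /v1/* JSON responses, amount_micro_omr = -tokens * priceMicroOMRPerToken, 1 OMR = 1,000,000 micro-OMR) can be sketched in Go roughly as follows. This is a minimal illustration, not the sidecar's actual code: the names `billableAmount` and `formatOMR` are hypothetical, and treating a missing or zero `usage.total_tokens` as unbilled is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// usage mirrors the OpenAI-style usage object named in the commit message.
type usage struct {
	PromptTokens     int64 `json:"prompt_tokens"`
	CompletionTokens int64 `json:"completion_tokens"`
	TotalTokens      int64 `json:"total_tokens"`
}

const microPerOMR = 1_000_000 // 1 OMR = 1,000,000 micro-OMR

// billableAmount (hypothetical name) returns the negative ledger amount in
// micro-OMR, or ok=false when the request must not be billed: non-/v1/ paths,
// non-2xx statuses, and non-JSON bodies. Skipping bodies without a positive
// total_tokens is an assumption of this sketch.
func billableAmount(path string, status int, body []byte, priceMicroOMRPerToken int64) (amount int64, ok bool) {
	if !strings.HasPrefix(path, "/v1/") || status < 200 || status > 299 {
		return 0, false
	}
	var resp struct {
		Usage *usage `json:"usage"`
	}
	if err := json.Unmarshal(body, &resp); err != nil || resp.Usage == nil || resp.Usage.TotalTokens <= 0 {
		return 0, false
	}
	return -resp.Usage.TotalTokens * priceMicroOMRPerToken, true
}

// formatOMR (hypothetical name) renders an integer micro-OMR amount as a
// decimal OMR string without going through floating point.
func formatOMR(micro int64) string {
	sign := ""
	if micro < 0 {
		sign, micro = "-", -micro
	}
	return fmt.Sprintf("%s%d.%06d", sign, micro/microPerOMR, micro%microPerOMR)
}

func main() {
	body := []byte(`{"usage":{"prompt_tokens":1,"completion_tokens":2,"total_tokens":3}}`)
	amount, ok := billableAmount("/v1/chat/completions", 200, body, 156)
	fmt.Println(amount, ok, formatOMR(amount)) // prints "-468 true -0.000468"
}
```

Keeping the amount as an integer count of micro-OMR is what lets the ledger store values such as -234 micro-OMR exactly, which is the precision argument the schema change makes.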
||
|
|
20b3c5258a
|
feat(bp-newapi): chart maturation + first-otech deploy + Qwen vLLM channel (#799) (#812)
* feat(bp-newapi): chart maturation: ExternalSecret + first-otech vLLM channel + skip-render gates (#799)
Maturation work for the SME-3 turnkey-experience epic (#795). Aligns the bp-newapi scratch chart with ADR-0003 (RBAC ↔ NewAPI user-create hook contract) and gets it past the blueprint-release CI smoke render that has blocked publication since PR #396 (run 25213444992 failed at default-values render of v1.0.0).
Changes
-------
- templates/external-secret.yaml (NEW). Renders the `catalyst-newapi-admin-token` ExternalSecret consumed by unified-rbac (ADR-0003 §3.2 + §6) for issuing per-user keys against `http://newapi.newapi.svc/api/v1/admin/users`. Sourced from OpenBao via the `vault-region1` ClusterSecretStore (canonical default shipped by bp-external-secrets-stores). Capabilities-gated on `external-secrets.io/v1beta1` so cold installs without ESO don't fail-render. Operator supplies the per-Sovereign OpenBao path via `catalystIntegration.externalSecret.remoteRef.key`; canonical convention is `sovereign/<sovereign-fqdn>/newapi/admin-token` with property `ADMIN_API_TOKEN`. Per Inviolable Principle #4 every knob is operator-overridable in the cluster overlay.
- values.yaml. Adds `catalystIntegration.externalSecret.{enabled, refreshInterval, secretStoreRef.{kind,name}, remoteRef.{key,property}}` block (default enabled=true, key="" so a misconfigured overlay fails loudly at render rather than silently skipping). Adds `defaultChannels.vllm` block: first-otech shorthand that composes a vLLM-typed channel into the rendered channels list when enabled. Default endpoint is empty per Inviolable Principle #4; the `clusters/<sovereign>/bootstrap-kit/80-newapi.yaml` overlay supplies the per-Sovereign URL (canonical first-otech reference = `https://llm-api.omtd.bankdhofar.com` model `qwen3-coder`, the same upstream Axon uses on the OpenOva marketing deployment).
- templates/_helpers.tpl. New `bp-newapi.effectiveChannels` helper composes `.Values.channels` with `defaultChannels.vllm` (when enabled). The `assertChannelAttestation` helper now operates on the effective list so attestation gates apply to defaultChannels composition too. `defaultChannels.vllm.enabled=true` with empty endpoint fails-fast at render with a guided error message.
- templates/configmap.yaml. Channels rendering switches to the effectiveChannels helper. OIDC block now skip-renders gracefully when `auth.adminUI.keycloak.issuer` is unset (smoke-render path) instead of `required`-failing; the per-Sovereign overlay sets the issuer.
- templates/deployment.yaml. Skip-render gate on Deployment when `database.existingSecret`, `credentials.existingSecret`, or (when Keycloak mode is selected) the OIDC client secret is missing. Removes the four `required` calls that were failing CI smoke render. Service, ServiceAccount, ConfigMap, NetworkPolicy still render so the smoke test gets a non-empty output proving structural soundness; the actual Deployment defers until the per-Sovereign overlay wires the secrets.
- templates/ingress.yaml. Same skip-render pattern: when either `ingress.host` or `ingress.adminHost` is empty, the entire ingress block is silently skipped. Matches the bp-keycloak / bp-openbao / bp-external-dns HTTPRoute templates.
- Chart.yaml. version 1.0.0 → 1.1.0 (minor bump: additive features; no breaking changes to existing operator overrides).
Verification
------------
`helm template` smoke render on default values now succeeds with 4 resources (NetworkPolicy / ServiceAccount / ConfigMap / Service); 168 lines, well above the CI 5-line minimum. With a full per-Sovereign overlay (hosts + secrets + Keycloak issuer + ESO Capabilities + Traefik Capabilities + defaultChannels.vllm.endpoint), 8 resources render including Deployment, both Ingresses, the Traefik allowlist Middleware, and the ExternalSecret. The composed qwen channel writes through to `channels.yaml` with the expected endpoint + models + attestation.
Refs
----
ADR-0003 §3.2 + §6: admin-token contract
Issue #795 (epic): locked decisions
Issue #796: hook contract spec (sequential blocker, merged)
Inviolable Principles #1, #3, #4
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(bootstrap-kit): slot 80, bp-newapi default install (#799)
Adds the canonical install slot for bp-newapi to every fresh Sovereign's bootstrap-kit. Sequenced after the W2.K1 dependency wave so NewAPI's ExternalSecret + Postgres DSN dependencies resolve on first reconcile.
The HelmRelease declares `dependsOn: [bp-openbao, bp-keycloak, bp-cnpg]`:
- bp-openbao(08): admin-token ExternalSecret backend
- bp-keycloak(09): OIDC issuer for ops-staff admin UI at admin.<fqdn>
- bp-cnpg(16): Postgres backing for users/credits/channels/audit
Per-Sovereign overlays inherit the slot's defaults and override:
- ingress.host  api.${SOVEREIGN_FQDN}
- ingress.adminHost  admin.${SOVEREIGN_FQDN}
- auth.adminUI.keycloak.issuer
- database.existingSecret  (Crossplane-claimed)
- credentials.existingSecret
- catalystIntegration.externalSecret.remoteRef.key  sovereign/${FQDN}/newapi/admin-token
- defaultChannels.vllm.enabled  true (first-otech)
- defaultChannels.vllm.endpoint  (operator-supplied)
The `_template/` slot keeps `defaultChannels.vllm.enabled: false` so a fresh Sovereign does not silently wire customers to a third-party endpoint; the canonical first-otech reference (Qwen3 Coder via `https://llm-api.omtd.bankdhofar.com`, same relay Axon uses on the OpenOva marketing deployment) is documented in-line for operators adopting the same upstream.
Refs: #795 (epic), ADR-0003
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(bootstrap-deps): register bp-newapi slot 80 in expected DAG (#799)
Fixes the dependency-graph-audit drift detection caught at PR #812 CI: the audit script enumerates HelmReleases in clusters/_template/bootstrap-kit/ and compares to scripts/expected-bootstrap-deps.yaml; an HR present on disk but absent from the expected DAG is treated as drift.
Adds the canonical entry for bp-newapi at slot 80 with the same depends_on set declared on the HelmRelease itself ([bp-openbao, bp-keycloak, bp-cnpg]).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(bp-newapi): align blueprint.yaml spec.version with Chart.yaml (#799)
The TestBootstrapKit_BlueprintCardsHaveRequiredFields static-validation gate asserts Chart.yaml version == blueprint.yaml spec.version. The chart was bumped to 1.1.0 in c63ecd8c; bumping the blueprint metadata to match.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Hatice Yildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
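The chart logic described above lives in Helm templates, but the composition rule (append the `defaultChannels.vllm` channel when enabled, fail fast on an empty endpoint, then apply the attestation gate to the combined list) can be modelled in Go for illustration. Everything in this sketch is hypothetical: the `Channel` and `VLLMDefault` types, their fields, and the error wording are assumptions, not the chart's schema; only the three provenance kinds (in-cluster, commercial-contract, byok) come from the original bp-newapi commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// Channel is a hypothetical, simplified stand-in for one channels-list entry.
type Channel struct {
	Name        string
	Endpoint    string
	Attestation string
	Enabled     bool
}

// VLLMDefault is a hypothetical stand-in for the defaultChannels.vllm block.
type VLLMDefault struct {
	Enabled     bool
	Endpoint    string
	Attestation string
}

// Provenance kinds listed in the original bp-newapi commit message.
var allowedAttestation = map[string]bool{
	"in-cluster":          true,
	"commercial-contract": true,
	"byok":                true,
}

// effectiveChannels composes the explicit channels with the optional vLLM
// default, then applies the attestation gate to the combined list.
func effectiveChannels(channels []Channel, vllm VLLMDefault) ([]Channel, error) {
	out := append([]Channel(nil), channels...) // copy; never mutate the input
	if vllm.Enabled {
		if vllm.Endpoint == "" {
			return nil, errors.New("defaultChannels.vllm.enabled=true but endpoint is empty")
		}
		out = append(out, Channel{
			Name:        "vllm",
			Endpoint:    vllm.Endpoint,
			Attestation: vllm.Attestation,
			Enabled:     true,
		})
	}
	for _, c := range out {
		if c.Enabled && !allowedAttestation[c.Attestation] {
			return nil, fmt.Errorf("channel %q lacks a recognized attestation", c.Name)
		}
	}
	return out, nil
}

func main() {
	// Enabled with an empty endpoint: rejected before anything is rendered.
	_, err := effectiveChannels(nil, VLLMDefault{Enabled: true})
	fmt.Println(err)

	// Enabled with a placeholder endpoint and provenance: one channel results.
	chs, err := effectiveChannels(nil, VLLMDefault{
		Enabled:     true,
		Endpoint:    "http://vllm.example.internal",
		Attestation: "in-cluster",
	})
	fmt.Println(len(chs), err)
}
```

Running the gate after composition, rather than on the explicit list alone, mirrors the point made in the commit: a channel introduced through the shorthand is subject to the same provenance check as any other.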
||
|
|
eb92e0496b
|
feat(platform): add bp-newapi — multi-tenant LLM marketplace gateway (#394) (#396)
Catalyst Blueprint wrapping the upstream NewAPI
(github.com/Calcium-Ion/new-api, MIT) for Sovereign operators whose
business model is reselling LLM access to their own customers.
Backend-only mode: the OpenAI-compatible API at api.<host>/v1/* is
customer-facing; the upstream's portal UI is disabled at ingress;
Catalyst replaces it as the customer surface; NewAPI's admin UI at
admin.<host> is exposed only to ops staff (IdP-gated).
Compliance posture enforced at the blueprint layer:
- Channel attestation gate (refuses to render if any enabled channel
lacks verifiable provenance — in-cluster, commercial-contract, or
byok)
- Geographic AUP enforcement (sanctioned-region block on commercial-provider channels; US/EU export-control baseline)
- BYOK isolation (request-scoped, never aggregated)
- Reseller disclosure required
- Audit log on bp-cnpg (metadata-only by default)
ACME placeholder used throughout the README; replace with operator
identity in per-Sovereign overlays at clusters/<sovereign>/bootstrap-kit/.
Files:
- platform/newapi/README.md (design doc + setup checklist)
- platform/newapi/blueprint.yaml (Catalyst Blueprint CR)
- platform/newapi/chart/{Chart.yaml,values.yaml}
- platform/newapi/chart/templates/{_helpers.tpl,deployment.yaml,service.yaml,ingress.yaml,configmap.yaml,serviceaccount.yaml,networkpolicy.yaml}
Closes design portion of #394.
Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|