16 Commits

Author SHA1 Message Date
github-actions[bot]
22851d980d deploy: bump bp-newapi upstream v0.13.2 chart 1.4.8 2026-05-18 07:03:09 +00:00
e3mrah
4abd156fee
feat(newapi): real /admin/tokens/sandbox mint impl (was stub from #1619) (#1638)
Replaces the Wave 1b stub that echoed the inbound PAT verbatim with a
real HS256 mint flow the sandbox-controller can call when it rolls out
a fresh Sandbox Pod.

Handler (platform/newapi/internal/handler/sandbox_token.go):
  - Caller auth: shared admin-secret bearer (env NEWAPI_ADMIN_SECRET),
    constant-time compared. 401 on mismatch / missing bearer.
  - Request body: {org_id, user_id, sandbox_id, allowed_channels[]}.
    De-duplicates + scrubs empty channel names so a controller bug
    sending [""] can't mint a token that NewAPI silently treats as
    "no restriction".
  - Mints HS256 JWT signed with NEWAPI_TOKEN_SIGNING_KEY. Claim shape:
    {sub: sandbox_id, org: org_id, user: user_id, channels: [...],
     iat, exp: iat+7d, typ: "sandbox"}.
  - Returns {token, expires_at}.
  - Refuses with 503 when SigningKey or AdminSecret is unset, so a
    chart-wiring gap is visible instead of tokens being signed with
    an empty (forgeable) key.
  - Removes the previous Claims/jwt.Parse PAT-validation path that
    came with the stub — caller is the controller, not an operator.
  - NewHandlerFromEnv() factory loads + validates env at process
    start so catalyst-api can fail loudly instead of shipping the
    endpoint silently.
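
The handler behaviour listed above can be sketched with the Go standard library alone. This is a minimal illustration, not the code in sandbox_token.go: the names sandboxClaims, bearerMatches, cleanChannels and mintHS256 are hypothetical, and the real handler may use a JWT library instead of hand-rolled HS256 signing.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
	"time"
)

// sandboxClaims mirrors the claim shape listed in the commit message.
type sandboxClaims struct {
	Sub      string   `json:"sub"`
	Org      string   `json:"org"`
	User     string   `json:"user"`
	Channels []string `json:"channels"`
	Iat      int64    `json:"iat"`
	Exp      int64    `json:"exp"`
	Typ      string   `json:"typ"`
}

// bearerMatches compares the presented bearer with the admin secret in
// constant time. An unset secret never matches.
func bearerMatches(authHeader, adminSecret string) bool {
	const prefix = "Bearer "
	if adminSecret == "" || !strings.HasPrefix(authHeader, prefix) {
		return false
	}
	got := strings.TrimPrefix(authHeader, prefix)
	return subtle.ConstantTimeCompare([]byte(got), []byte(adminSecret)) == 1
}

// cleanChannels drops empty names and duplicates, keeping first-seen order.
func cleanChannels(in []string) []string {
	seen := make(map[string]bool, len(in))
	out := make([]string, 0, len(in))
	for _, c := range in {
		if c == "" || seen[c] {
			continue
		}
		seen[c] = true
		out = append(out, c)
	}
	return out
}

// mintHS256 signs the claims as a compact JWT and refuses an empty key.
func mintHS256(key []byte, c sandboxClaims) (string, error) {
	if len(key) == 0 {
		return "", errors.New("signing key is unset")
	}
	payload, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	enc := base64.RawURLEncoding
	signingInput := enc.EncodeToString([]byte(`{"alg":"HS256","typ":"JWT"}`)) +
		"." + enc.EncodeToString(payload)
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(signingInput))
	return signingInput + "." + enc.EncodeToString(mac.Sum(nil)), nil
}

func main() {
	now := time.Now()
	claims := sandboxClaims{
		Sub: "sandbox-uid", Org: "org-1", User: "user@example.test", // placeholder values
		Channels: cleanChannels([]string{"", "chan-a", "chan-a"}),
		Iat:      now.Unix(),
		Exp:      now.Add(7 * 24 * time.Hour).Unix(), // iat+7d
		Typ:      "sandbox",
	}
	token, err := mintHS256([]byte("example-signing-key"), claims)
	fmt.Println(token != "", err, bearerMatches("Bearer s3cret", "s3cret"))
}
```

A request whose channel list is empty after cleanChannels would be rejected with 400 per the test list; that check is left out of the sketch.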

Unit tests (sandbox_token_test.go) — 11 cases:
  - happy path (mint + claim shape + signature round-trip)
  - de-dup + empty-channel scrub
  - admin-secret mismatch / missing bearer → 401
  - missing org_id / user_id / sandbox_id / empty channels → 400
  - non-POST → 405
  - unset env → 503
  - mintSandboxToken empty-secret guard + round-trip
  - response does not echo admin secret or signing key

Chart wiring (platform/newapi/chart):
  - New Secret template sandbox-token-signing-key-secret.yaml
    auto-renders with Helm `lookup` + helm.sh/resource-policy: keep
    (same load-bearing pattern as credentials-secret.yaml #943 and
    gitea admin-secret.yaml #830 Bug 2). 64-char alphanumeric values
    for both SIGNING_KEY and ADMIN_SECRET; persistence across
    reconciles is required because a reconcile-time rotation would
    silently invalidate every per-Sandbox token across the Sovereign
    AND break the sandbox-controller's auth path until its Pod
    restarts.
  - values.yaml block sandboxTokenSigningKey.{existingSecret,
    autoProvision, autoSecretName} matching the `credentials`
    convention (operator override > auto-provision > skip-render).
  - No Chart.yaml bump — chart value addition only.

Verification:
  - go build ./platform/newapi/internal/handler/... — clean
  - go test ./platform/newapi/internal/handler/... — 11/11 PASS
  - helm template platform/newapi/chart — Secret renders

How sandbox-controller will use it:
  1. Read NEWAPI_ADMIN_SECRET from mounted Secret newapi-token-signing-key.
  2. POST /admin/tokens/sandbox with bearer + body
     {org_id: <Sandbox.spec.owner.orgRef.slug>,
      user_id: <Sandbox.spec.owner.email>,
      sandbox_id: <Sandbox.metadata.uid>,
      allowed_channels: ["qwen3.6-bankdhofar"]}.
  3. Write returned token into Secret/sandbox-<uid>-newapi-token.
  4. Mount that Secret into the Sandbox Pod as LLM_GATEWAY_TOKEN.
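
Steps 2 and 3 above might look roughly like this on the controller side. It is a sketch under assumptions: newMintRequest and tokenSecretName are hypothetical helper names, the base URL is a placeholder, and the request is only built, not sent.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// mintRequest is the request body described in the commit message.
type mintRequest struct {
	OrgID           string   `json:"org_id"`
	UserID          string   `json:"user_id"`
	SandboxID       string   `json:"sandbox_id"`
	AllowedChannels []string `json:"allowed_channels"`
}

// newMintRequest builds the POST /admin/tokens/sandbox call with the
// shared admin secret as a bearer token.
func newMintRequest(baseURL, adminSecret string, body mintRequest) (*http.Request, error) {
	raw, err := json.Marshal(body)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/admin/tokens/sandbox", bytes.NewReader(raw))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+adminSecret)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

// tokenSecretName follows the Secret/sandbox-<uid>-newapi-token naming.
func tokenSecretName(sandboxUID string) string {
	return "sandbox-" + sandboxUID + "-newapi-token"
}

func main() {
	req, err := newMintRequest("http://newapi.example.test", "admin-secret", mintRequest{
		OrgID:           "org-slug",
		UserID:          "owner@example.test",
		SandboxID:       "uid-123",
		AllowedChannels: []string{"qwen3.6-bankdhofar"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String(), tokenSecretName("uid-123"))
}
```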

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 11:02:40 +04:00
e3mrah
255eb3bf17
feat(sandbox+auth+newapi): Wave 1b — newapi proxy + BYOS + org-scoped JWT (#1619)
Three coordinated deliverables for Sandbox Wave 1b — scaffolding +
design + the ONE prerequisite (long-lived org-scoped JWT) the rest of
Sandbox depends on.

Deliverable 1 — newapi proxy contract:
  - products/sandbox/docs/newapi-proxy-contract.md: agent-pod env
    (LLM_GATEWAY_URL / OPENAI_BASE_URL alias), provider selection
    (?provider=qwen; default Qwen via omtd.bankdhofar.com), per-Sandbox
    token issuance via /admin/tokens/sandbox bridge, lifecycle +
    rotation, auth model.
  - platform/newapi/internal/handler/sandbox_token.go: bridge handler
    stub. Validates the inbound PAT (typ=pat + aud=newapi + org_id
    cross-check vs request body), then echoes a NewAPI-shaped response
    so the contract is testable without the upstream NewAPI admin
    API. Wave 4 wires the actual upstream calls.

Deliverable 2 — Claude Code BYOS OAuth:
  - products/sandbox/docs/claude-code-byos.md: UX (Connect Claude Max →
    OAuth → refresh token
    Secret/catalyst-system/sandbox-byos-claude-code-<user-uid>), Pod env
    injection (ANTHROPIC_API_KEY bypassing newapi), per-session toggle,
    revocation paths, chart wiring.
  - products/catalyst/bootstrap/api/internal/handler/byos_claude_code.go:
    four endpoints behind RequireSession (POST /start, GET /callback,
    DELETE, GET /status). They return explicit 503 and 501 responses
    so the popup flow can be exercised end-to-end against the
    placeholder client_id; Wave 4 flips it live.

Deliverable 3 — Long-lived org-scoped JWT (THE prerequisite):
  - platform/keycloak/chart/templates/configmap-sovereign-realm.yaml +
    configmap-tenant-realm.yaml: add `org` protocolMapper emitting
    user attribute `org` as claim `org_id`; add `org` to default
    client scopes for ALL clients.
  - core/services/auth/handlers/handlers.go: include typ=session in
    JWTs + document the cross-service claim contract.
  - core/services/auth/handlers/pat.go: NEW POST /auth/pat with
    admin-configurable TTL (default 7d, max 90d), audience claim,
    capabilities pass-through, typ=pat discriminator.
  - core/services/auth/handlers/routes.go + main.go: wire /auth/pat
    behind JWTAuth middleware.
  - core/services/shared/auth/claims.go: single Claims struct +
    HasCapability/HasGroup helpers + ContextKey for cross-service
    consumers (sandbox-controller, newapi bridge, MCP server).
  - products/catalyst/bootstrap/api/internal/auth/session.go: align
    Org JSON tag with new `org_id` claim; UnmarshalJSON accepts BOTH
    legacy `org` and new `org_id` so a rolling chart upgrade does
    not regress org-scoped queries.
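
The dual-key decoding described for session.go could be written along these lines. This is a sketch, not the actual struct: sessionClaims and parseClaims are hypothetical names, and letting `org_id` win when both keys are present is an assumption the commit message does not state.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sessionClaims carries the org under the new `org_id` JSON tag.
type sessionClaims struct {
	Org string `json:"org_id"`
	Typ string `json:"typ,omitempty"`
}

// UnmarshalJSON accepts both the legacy `org` key and the new `org_id`
// key so tokens issued before and after the chart upgrade both decode.
func (c *sessionClaims) UnmarshalJSON(data []byte) error {
	var raw struct {
		OrgID     string `json:"org_id"`
		LegacyOrg string `json:"org"`
		Typ       string `json:"typ"`
	}
	if err := json.Unmarshal(data, &raw); err != nil {
		return err
	}
	c.Org = raw.OrgID
	if c.Org == "" {
		c.Org = raw.LegacyOrg // fall back to the legacy claim name
	}
	c.Typ = raw.Typ
	return nil
}

// parseClaims is a small helper for the demonstration below.
func parseClaims(payload string) (sessionClaims, error) {
	var c sessionClaims
	err := json.Unmarshal([]byte(payload), &c)
	return c, err
}

func main() {
	legacy, _ := parseClaims(`{"org":"acme","typ":"session"}`)
	current, _ := parseClaims(`{"org_id":"acme","typ":"pat"}`)
	fmt.Println(legacy.Org, current.Org)
}
```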

Out of scope (Wave 4 wires):
  - Sandbox CRD + controller (writes Secret, mounts Pod env).
  - Actual outbound HTTP to Anthropic /oauth/token + KMS encrypt.
  - Actual outbound HTTP to NewAPI admin API.
  - Per-Sandbox capability projection from Keycloak groups.
  - PAT revocation lookup (jti store) + /auth/pats list.
  - Settings UI card + session-toolbar routing toggle.

Build verification (go vet + go build clean):
  - core/services/auth/...
  - core/services/shared/...
  - platform/newapi/internal/handler/...
  - products/catalyst/bootstrap/api/...

Founder TODO (single knob to flip BYOS live, Wave 4):
  Register an Anthropic OAuth client at
  https://console.anthropic.com/settings/oauth (public PKCE,
  redirect=https://console.<sov-fqdn>/api/v1/sandbox/byos/claude-code/callback)
  and paste the client_id into clusters/<sovereign>/bootstrap-kit/
  sandbox.yaml. Today every BYOS endpoint returns 503 with a clear
  message pointing at claude-code-byos.md §8.

Refs: products/sandbox/docs/architecture.md §6 (THE prerequisite).

Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
2026-05-18 08:43:11 +04:00
github-actions[bot]
0520760543 deploy: bump bp-newapi upstream v0.13.2 chart 1.4.7 2026-05-11 07:32:46 +00:00
e3mrah
74d23ab3dc
fix(charts): explicit harbor.openova.io/proxy-dockerhub prefix on all chart-hook images (#163) (#1367)
Per CLAUDE.md MIRROR-EVERYTHING inviolable rule: every chart-hook
image reference (pre/post-install Jobs, helper Pods) must use the
explicit Harbor proxy-cache form. Fix #158's bitnami → bitnamilegacy
swap was a band-aid; the architecturally correct fix is to defeat
upstream-deletion blast radius entirely by routing through Harbor.

The node-level containerd mirror in
infra/hetzner/cloudinit-control-plane.tftpl (line 706) already
redirects docker.io/* → harbor.openova.io/proxy-dockerhub/*
implicitly, but implicit routing:
  - Hides the routing from SBOM scans
  - Bypasses the Kyverno harbor-proxy-pull ClusterPolicy
  - Means a chart audit (`grep docker.io`) misses a real dependency
  - Was the proximate cause of prov #27 wedging when Bitnami deleted
    docker.io/bitnami/kubectl:1.30.4 (Fix #158 had to chase the
    deletion mid-flight instead of being insulated by Harbor cache)

19 chart-hook image: refs + 5 chart values.yaml repository: defaults
now carry the explicit harbor.openova.io/proxy-dockerhub prefix.
Application/subchart images (keycloak, postgresql, mongodb in
keycloak+litmus subcharts) are intentionally out of scope for this
PR — those go through the node-level containerd mirror still.

Affected blueprints + chart version bumps:
  bp-cert-manager            1.2.1  -> 1.2.2
  bp-external-secrets-stores 1.0.4  -> 1.0.5
  bp-crossplane-claims       1.1.4  -> 1.1.5
  bp-flux                    1.2.1  -> 1.2.2
  bp-guacamole               0.1.16 -> 0.1.17
  bp-self-sovereign-cutover  0.1.28 -> 0.1.29
  bp-k8s-ws-proxy            0.1.9  -> 0.1.10
  bp-harbor                  1.2.15 -> 1.2.16
  bp-gitea                   1.2.5  -> 1.2.6
  bp-newapi                  1.4.5  -> 1.4.6
  bp-wordpress-tenant        0.2.0  -> 0.2.1
  catalyst-platform          1.4.138 -> 1.4.139

Co-authored-by: e3mrah <1234567+e3mrah@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 11:32:21 +04:00
github-actions[bot]
db26306f92 deploy: bump bp-newapi upstream v0.13.2 chart 1.4.5 2026-05-11 01:03:36 +00:00
e3mrah
54e65aa4b1
fix(bp-newapi): pre-install gate on external-secrets webhook readiness (Fix #138) (#1347)
prov #20: bp-newapi 1.4.2 HR FAILED with the chart's
templates/external-secret.yaml apply rejected by the apiserver:

  Internal error occurred: failed calling webhook
  "validate.externalsecret.external-secrets.io": ...
  no endpoints available for service "external-secrets-webhook"

bp-external-secrets reaches HR Ready=True the moment its Deployments
report Ready, but Pod Ready != webhook EndpointSlice reachable: the
apiserver-side EndpointSlice for the webhook Service has not been
observed by the validating admission controller's lookup yet. Flux
dependsOn satisfies the dependency graph but does NOT close this race.

Same root-cause class as Fix #137 (bp-external-secrets-stores) but a
DIFFERENT chart and DIFFERENT validation endpoint (ExternalSecret vs
ClusterSecretStore).

Canonical seam (Inviolable Principle #16): the chart that CONSUMES the
webhook owns the readiness gate. NOT the upstream external-secrets
chart (Fix #137 territory) and NOT a Flux HR-level dependsOn (which
checks the wrong layer).

Adds
platform/newapi/chart/templates/000-external-secrets-webhook-readiness-job.yaml,
a pre-install/pre-upgrade Helm hook that polls the webhook (default
external-secrets-webhook.external-secrets-system.svc:443/validate-external-secrets-io-v1beta1-externalsecret)
until it returns a structured HTTP response (200/400/405/415/422).
60s wall budget, 2s interval, no RBAC required (curl-only Pod, HTTPS
to ClusterIP).
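
The gate's polling logic, expressed in Go for illustration only (the chart itself runs a curl-only Pod). The names isStructured and waitForWebhook are hypothetical, and the probe is injected as a function so the loop can be exercised without a cluster.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// isStructured reports whether the status shows that the webhook itself
// answered, as opposed to "no endpoints available" or a connection error.
func isStructured(status int) bool {
	switch status {
	case 200, 400, 405, 415, 422:
		return true
	}
	return false
}

// waitForWebhook probes until a structured response arrives or the wall
// budget is spent. The hook described above uses a 60s budget and a 2s interval.
func waitForWebhook(ctx context.Context, probe func(context.Context) (int, error), interval, budget time.Duration) error {
	ctx, cancel := context.WithTimeout(ctx, budget)
	defer cancel()
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if status, err := probe(ctx); err == nil && isStructured(status) {
			return nil
		}
		select {
		case <-ctx.Done():
			return errors.New("webhook not reachable within budget")
		case <-ticker.C:
		}
	}
}

func main() {
	calls := 0
	// Fake probe: fails twice, then answers 405 like a webhook that
	// rejects the HTTP method but is clearly serving.
	probe := func(context.Context) (int, error) {
		calls++
		if calls < 3 {
			return 0, errors.New("no endpoints available")
		}
		return 405, nil
	}
	err := waitForWebhook(context.Background(), probe, 10*time.Millisecond, time.Second)
	fmt.Println(calls, err)
}
```

Treating 4xx statuses as success is the point of the design: any structured HTTP answer proves the admission endpoint is reachable, which is all the gate needs to know.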

Templated end-to-end via .Values.externalSecretsWebhookGate.* per
Inviolable Principle #4 — operator may override service, namespace,
port, path, timeout, interval, or disable the gate entirely from a
per-Sovereign overlay.

Capability-gated on the external-secrets.io/v1beta1 CRD AND on the
existing catalystIntegration.externalSecret.enabled chain, so a
Sovereign that disables catalyst-integration pays no probe overhead.

Chart 1.4.2 -> 1.4.4 (1.4.3 was a deploy-only image-tag bump).
HR template clusters/_template/bootstrap-kit/80-newapi.yaml repinned.

## Claimed TCs
Infra-only fix; no UI behaviour change. Unblocks bp-newapi reaching
HR Ready=True on every fresh provision, which is a hard prerequisite
for:
- ADR-0003 §3.2 Catalyst signup hook (alice -> per-user NewAPI key)
- alice signup gate 5 (LLM) end-to-end
- Any TC that exercises /v1/* customer API or admin.<sovereign-fqdn>

Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 05:03:08 +04:00
github-actions[bot]
a9861f9491 deploy: bump bp-newapi upstream v0.13.2 chart 1.4.3 2026-05-10 17:21:18 +00:00
e3mrah
f668d791ab
fix(bp-newapi): publish newapi-mirror image + repoint chart to existing tag (qa-loop bounded-cycle audit prov #7 Gap F) (#1315)
Root cause from live diagnosis (omantel.biz prov #7, kubectl --context=omantel):

The bp-newapi chart at platform/newapi/chart/values.yaml referenced
`ghcr.io/openova-io/openova/newapi-mirror:v0.4.5` since its first commit
(44d0200a, 2026-05-01). However:

1. NO CI workflow ever built that image. There is no
   `build-bp-newapi*.yaml` (or similar) under .github/workflows/. The
   GHCR package `ghcr.io/openova-io/openova/newapi-mirror` does not
   exist (404 from /orgs/openova-io/packages/container/...).

2. The tag `v0.4.5` is fictitious — neither upstream Calcium-Ion/new-api
   (`docker.io/calciumion/new-api`) nor the alternate ancestor
   (`justsong/one-api`) ever published a `v0.4.5`. The lowest stable
   Calcium-Ion tag is `v0.6.0.9`; the highest stable v0.x is `v0.13.2`
   (upstream publish 2026-04-27).

Result: on every fresh Sovereign the NewAPI Pod went into
ImagePullBackOff (403 Forbidden) on the nonexistent image, blocking
alice signup gate 5 (LLM) and surfacing in the bounded-cycle audit as
Gap F.

Fix (mirrors bp-guacamole CI pattern in
.github/workflows/build-bp-guacamole.yaml):

- NEW .github/workflows/build-bp-newapi.yaml — push to
  platform/newapi/chart/** triggers a Job that pulls
  `docker.io/calciumion/new-api:<UPSTREAM_VER>`, captures the upstream
  repo digest, re-tags as `ghcr.io/openova-io/openova/newapi-mirror:
  <UPSTREAM_VER>` + `:latest`, pushes both, then bumps values.yaml +
  Chart.yaml + dispatches blueprint-release.

- platform/newapi/chart/values.yaml — newapi.image.tag bumped from
  `v0.4.5` (fictitious) to `v0.13.2` (latest stable Calcium-Ion/new-api
  on Docker Hub). Comment block expanded with full rationale + link to
  the new build workflow + bump-in-lockstep instructions.

- platform/newapi/chart/Chart.yaml — version 1.4.1 → 1.4.2, appVersion
  `0.4.5` → `0.13.2` (Helm convention: appVersion = upstream version
  without the `v` prefix). Inline changelog records the audit-prov-7
  Gap F lineage.

- clusters/_template/bootstrap-kit/80-newapi.yaml — pinned chart
  version 1.4.1 → 1.4.2 with the same changelog inline.

Verified locally:
- `helm template smoke platform/newapi/chart --set
  database.existingSecret=fake --set credentials.existingSecret=fake
  --set auth.adminUI.mode=masterKey` renders
  `image: "ghcr.io/openova-io/openova/newapi-mirror:v0.13.2"` and
  `app.kubernetes.io/version: "0.13.2"`.

The v1.0.0-rc.x upstream line is gated on schema migration
stabilisation; the channel-seed Job uses the legacy admin-API request
shape, so do NOT auto-roll past v0.13.x without re-running the
channel-seed integration smoke against NewAPI's `/api/channel/`.

Pairs with the Gap C re-investigation memo (no chart fix needed; PR
#1309 only gated `defaultCompositionRef`, not the XRD itself; the
useraccesses.access.openova.io CRD is present on omantel prov #7).

DO NOT MERGE — this PR is for qa-loop bounded-cycle Wave 5 Fix #80
(Gap F) review.

Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 21:20:49 +04:00
e3mrah
2ff50f0591
fix(bp-newapi+services-build): imagePullSecrets on Pod, sed bumps values.yaml smeTag (#955)
Two SME-blocker bugs caught live on otech113 (alice signup gate 5 fails on
fresh Sovereign):

#952 — bp-newapi 1.4.0 Pod has no imagePullSecrets, so kubelet pulls
PRIVATE ghcr.io/openova-io/openova/{newapi-mirror,services-metering-sidecar}
anonymously and gets 403 Forbidden. Fix:

- Templatize spec.imagePullSecrets on Deployment + channel-seed Job.
- Default values.yaml `imagePullSecrets: [{name: ghcr-pull}]`.
- Add `newapi` to flux-system/ghcr-pull's reflector
  reflection-{allowed,auto}-namespaces in cloudinit-control-plane.tftpl
  so bp-reflector mirrors the source Secret into the namespace
  automatically on every fresh Sovereign.
- Bump bp-newapi 1.4.0 -> 1.4.1, update _template overlay.

#953 — services-build.yaml's image-rewrite loop only matched the
hardcoded `image: ghcr.io/.../services-<svc>:<sha>` form. 7 of 8
sme-services templates use `image: "{{ ... }}/services-<svc>:{{
.Values.images.smeTag }}"`. Each services-build run bumped only
auth.yaml while reporting "update sme service images to ${SHA}",
leaving the live Pod on stale bytes (PR #951's #941 fix never reached
services-catalog despite the merge + chart bump chain). Fix:

- After the hardcoded loop, also bump `images.smeTag` in
  products/catalyst/chart/values.yaml with a strict regex match
  (`^  smeTag: "<sha>"$`); refuse to auto-bump if the line shape
  changes (defends against silent drift if a contributor renames the
  field).
- Mirror the change into the retry-path `rewrite()` function so a
  reset-to-origin/main retry does not recreate the original bug.
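
The strict-match rule for the smeTag bump can be sketched in Go (the workflow itself uses sed). bumpSmeTag is a hypothetical name, and the hex character class standing in for `<sha>` is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
)

// smeTagLine matches exactly two spaces, the key, and a quoted hex SHA
// on a line of its own.
var smeTagLine = regexp.MustCompile(`(?m)^  smeTag: "[0-9a-f]+"$`)

// bumpSmeTag rewrites the single smeTag line and refuses to touch the
// file when the line shape is missing or ambiguous.
func bumpSmeTag(values, sha string) (string, error) {
	matches := smeTagLine.FindAllStringIndex(values, -1)
	if len(matches) != 1 {
		return "", fmt.Errorf("expected exactly one smeTag line, found %d", len(matches))
	}
	return smeTagLine.ReplaceAllLiteralString(values, `  smeTag: "`+sha+`"`), nil
}

func main() {
	values := "images:\n  smeTag: \"abc123\"\n"
	out, err := bumpSmeTag(values, "def456")
	fmt.Printf("%q %v\n", out, err)

	// A renamed field no longer matches, so the bump is refused.
	_, err = bumpSmeTag("images:\n  tag: \"abc123\"\n", "def456")
	fmt.Println(err)
}
```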

Tests:

- platform/newapi/chart/tests/imagepullsecrets-render.sh — 4 cases
  asserting the Deployment and channel-seed Job carry the default
  ghcr-pull reference, that an empty override suppresses the block,
  and that custom secret names propagate (Inviolable Principle #4).
- tests/integration/services-build-rewrite.sh — 3 cases reproducing
  the workflow's rewrite logic on a sandboxed copy of the live
  chart, asserting both auth.yaml's hardcoded line AND values.yaml's
  smeTag get bumped, that helm-render of the catalyst chart with
  the bumped values produces all 8 SME-service Deployments at the
  new SHA, and that an idempotent re-bump to a second SHA also lands
  cleanly.

Refs: #952 #953 (umbrella #915 — alice signup gate 5).

Co-authored-by: hatiyildiz <143030955+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 15:47:37 +04:00
e3mrah
689276889c
fix(bp-catalyst-platform+bp-newapi): unblock alice signup gates 2-6 on Sovereigns (#915) (#951)
Six coupled chart + orchestrator fixes that unblock alice marketplace
signup → tenant ready → SaaS integrations → LLM → ledger on a freshly
franchised Sovereign. C5-final got Gate 1 GREEN on otech113 (2026-05-05)
but every downstream gate failed because the SME bundle hardcoded
contabo-only assumptions.

Bumps:
  - bp-catalyst-platform 1.4.21 → 1.4.22
  - bp-newapi             1.3.0 → 1.4.0
  - bootstrap-kit slot 13 + 80 pins updated in lockstep

Issues addressed (single consolidated PR — smaller PRs would race
against alice signup retries):

  - #934 (auth SMTP empty → "failed to send email"): sme-secrets.yaml
    now reads SMTP_* from `catalyst-system/sovereign-smtp-credentials`
    (the same A5-seeded source #883/#905 the chart 1.4.20 catalyst-
    openova-kc-credentials Secret already uses) with source-wins
    precedence. Both canonical (smtp-host/port/from/user/pass) AND
    legacy (host/port/from/user/password) source-Secret key shapes
    accepted. Empty source falls back to chart-level defaults so the
    contabo path stays clean.

  - #940 (provisioning service GITHUB_TOKEN placeholder + hardcoded
    upstream github.com): chart values
    .Values.smeServices.provisioning.{githubToken,git.{apiURL,owner,
    repo,branch}} make every GitHub-API coordinate operator-overridable
    with topology-aware defaults (Sovereign ⇒ in-cluster Gitea REST
    API + `openova` org; contabo ⇒ api.github.com + `openova-io` org).
    Provisioning binary's startup gate validates the GITHUB_TOKEN does
    NOT contain placeholder substrings (<placeholder>, PLACEHOLDER,
    REPLACE_ME, ...) and crashes the Pod into Pending if it does — the
    operator sees the misconfig immediately instead of after alice
    signups have failed silently in service logs. GitHub client now
    accepts a custom API URL via NewClientWithAPIURL so Gitea's GitHub-
    compatible /api/v1 surface drops in without re-implementing the
    client.

  - #941 (catalog "27 apps COMING SOON"): added `openclaw` and
    `stalwart-mail` to migrateAppDeployable's deployable map at
    core/services/catalog/handlers/seed.go. Both blueprints (bp-openclaw,
    bp-stalwart-{sovereign,tenant}) ship with visibility=listed in the
    embedded blueprints.json AND have working SME-tenant overlay
    templates in sme_tenant_gitops.go, but the catalog handler silently
    filtered them out because they were missing here. Map extracted to
    DeployableAppSlugs() exported function so unit tests can assert
    membership without invoking a Mongo store.

  - #942 (REDPANDA_BROKERS hardcoded to talentmesh): configmap.yaml
    selects broker default at render time based on global.sovereignFQDN
    — Sovereign ⇒ NATS JetStream Service per ADR-0001 (the only local
    bus on Sovereigns); contabo ⇒ legacy Redpanda Service in talentmesh.
    Operator MAY override either default via
    .Values.smeServices.eventBus.brokers without forking the chart.
    The ConfigMap key name stays REDPANDA_BROKERS for back-compat with
    existing SME service Go env wiring; new EVENT_BUS_PROTOCOL key
    surfaces the protocol hint for services that want to switch wire
    format independently.

  - #943 (bp-newapi silently skips Deployment): NEW
    templates/cnpg-cluster.yaml auto-provisions a CNPG-backed Postgres
    Cluster + Helm-`lookup`-persistent DSN Secret when
    .Values.cnpg.enabled (DEFAULT true). NEW
    templates/credentials-secret.yaml auto-generates SESSION_SECRET +
    CRYPTO_SECRET (each 64-char randAlphaNum, persistent across
    reconciles via Helm `lookup`) when
    .Values.credentials.autoProvision (DEFAULT true).
    deployment.yaml gate now resolves Secret names from the chart-
    emitted defaults when the operator hasn't supplied an override.
    Capabilities-gated on postgresql.cnpg.io/v1 so a cold install
    before bp-cnpg is Ready surfaces as "no Cluster yet" rather than
    a hard install error.

  - #944 (CRITICAL — cross-cluster pollution): provisioning.yaml
    templates GIT_BASE_PATH from
    .Values.smeServices.provisioning.gitBasePath with a topology-aware
    default `clusters/<sovereignFQDN>/sme-tenants` on Sovereigns. NEW
    `core/services/provisioning/gitguard` package validates at startup
    AND on every commit code path that the path begins with
    `clusters/<self-FQDN>/` — refusing to commit to any other cluster's
    tree. Defence in depth so a runtime env mutation (kubectl exec,
    ConfigMap update without Pod restart, hostile sidecar) cannot
    bypass the check. Pre-#944 every alice tenant overlay landed in
    upstream openova/openova `clusters/contabo-mkt/tenants/<id>/`
    which contabo Flux would then install on the contabo cluster —
    C5-final caught + reverted the alice2 incident at commit 5715db04.
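
The two startup guards from #940 and #944 can be illustrated as below. This is a sketch under assumptions, not the gitguard package: checkCommitPath and containsPlaceholder are hypothetical names, the marker list covers only the three substrings the message spells out (its list is elided), and sov.example.test is a placeholder FQDN.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// placeholderMarkers lists only the substrings named in the commit message.
var placeholderMarkers = []string{"<placeholder>", "PLACEHOLDER", "REPLACE_ME"}

// containsPlaceholder reports whether a token still carries a template marker.
func containsPlaceholder(token string) bool {
	for _, m := range placeholderMarkers {
		if strings.Contains(token, m) {
			return true
		}
	}
	return false
}

// checkCommitPath accepts only paths under clusters/<selfFQDN>/. The path
// is cleaned first so "../" traversal cannot escape, and the trailing
// slash in the prefix blocks prefix collisions such as "<selfFQDN>-evil".
func checkCommitPath(selfFQDN, p string) error {
	if selfFQDN == "" {
		return fmt.Errorf("self FQDN is empty")
	}
	prefix := "clusters/" + selfFQDN + "/"
	clean := path.Clean(p)
	if !strings.HasPrefix(clean+"/", prefix) {
		return fmt.Errorf("path %q is outside %q", p, prefix)
	}
	return nil
}

func main() {
	self := "sov.example.test"
	for _, p := range []string{
		"clusters/sov.example.test/sme-tenants/t1",
		"clusters/sov.example.test/../contabo-mkt/tenants/t1",
		"clusters/sov.example.test-evil/sme-tenants/t1",
	} {
		fmt.Println(p, "->", checkCommitPath(self, p))
	}
	fmt.Println(containsPlaceholder("ghp_REPLACE_ME"))
}
```

Calling the same check at startup and again on every commit path is what gives the defence in depth described above: a mutated environment variable is caught at the moment of use.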

Tests:
  - core/services/provisioning/gitguard: 22 cases covering Sovereign
    + contabo + traversal + prefix-collision + placeholder token
  - core/services/catalog/handlers: openclaw/stalwart-mail in
    deployable map + stable-shape lock against accidental deletes
  - helm-template smoke pass: bp-newapi (default values renders
    Deployment + auto-provisioned Secrets); bp-catalyst-platform
    (Sovereign render shows GIT_BASE_PATH=clusters/otech113.../sme-
    tenants, REDPANDA_BROKERS=nats-jetstream..., GITHUB_OWNER=openova,
    GITHUB_API_URL=http://gitea-http...)

Closes #934 #940 #941 #942 #943 #944
Refs umbrella #915

Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 15:27:23 +04:00
e3mrah
9447d88dfd
feat(bp-newapi): auto-seed channel #1 = Qwen3.6 @ BankDhofar (#915) (#919)
Per epic #915 (SME tenant integration DoD: alice → OpenClaw → NewAPI →
Qwen3.6@BankDhofar end-to-end), bp-newapi must come up with channel
#1 = Qwen3.6 hosted at BankDhofar
(https://llm-api.omtd.bankdhofar.com, model qwen3-coder / alias
qwen3.6) already wired to its admin API, so the FIRST customer
request from an SME's OpenClaw → NewAPI hits a real upstream LLM
rather than a 404 / "no channel found" error.

Until now the chart's channels.yaml ConfigMap was a documentation
surface only; the upstream NewAPI binary persists channel state to
its Postgres `channels` table via its admin API at /api/channel/.
This patch bridges that gap.

Discovery:
  - Canonical BankDhofar relay reference exists in
    openova-private/clusters/contabo-mkt/apps/axon/helmrelease.yaml
    (axon.vllm.baseUrl=https://llm-api.omtd.bankdhofar.com,
    defaultModel=qwen3-coder, secret=axon-vllm-secret).
  - K8s secret confirmed live (axon/axon-vllm-secret, key
    AXON_VLLM_API_KEY).
  - Architecture: bp-newapi is per-Sovereign (one NewAPI per OTECH);
    SME tenants share it via OpenClaw's newapi.baseURL =
    https://newapi.<OTECHFQDN>. Channel seeding therefore happens
    at the Sovereign-level chart install, NOT per-tenant.

Changes:
  1. platform/newapi/chart/values.yaml
     - New `defaultChannels.qwenBankDhofar` block (enabled=false by
       default; per-Sovereign overlay flips it true with the
       canonical endpoint + commercial-contract attestation).
     - New `channelSeed` block configuring the post-install Helm
       hook Job (image, resources, backoff, deadline, hook delete
       policy).

  2. platform/newapi/chart/templates/_helpers.tpl
     - effectiveChannels helper composes qwenBankDhofar BEFORE
       operator-supplied .Values.channels and BEFORE defaultChannels.vllm
       so it lands as channel #1 in NewAPI's row-insertion order
       (NewAPI's router resolves `model` lookups in row order).
     - New channelSeedJobName helper (shared by Job + RBAC + ConfigMap).

  3. platform/newapi/chart/templates/channel-seed-job.yaml (NEW)
     - post-install/post-upgrade Helm hook Job that:
       * Mounts the operator-supplied master-key Secret
         (auth.adminUI.masterKeySecret) for one-time admin API auth.
       * Mounts the per-channel upstream API key Secret
         (defaultChannels.qwenBankDhofar.existingSecret).
       * Polls /api/status until 200 (handles NewAPI startup window).
       * For each default channel: GET /api/channel/?keyword=<name>;
         if a row whose `name` exactly matches exists, SKIP. Otherwise
         POST /api/channel/ with the channel definition. Idempotent —
         re-runs after upgrades are no-ops once channels exist.
       * Bounded RBAC (Role+RoleBinding only on the named Secrets).
       * Skip-render gates: channelSeed.enabled, defaultChannels.*
         enabled, masterKeySecret supplied. helm template with default
         values renders no Job (CI smoke clean).

  4. clusters/_template/bootstrap-kit/80-newapi.yaml
     - Bumped chart version 1.2.0 → 1.3.0.
     - Added defaultChannels.qwenBankDhofar block to the per-Sovereign
       overlay shape (still enabled=false in the template — operator
       supplies endpoint + attestation + Secrets per Sovereign).

  5. platform/newapi/chart/Chart.yaml
     - Bumped 1.2.0 → 1.3.0 with changelog comment.

  6. products/catalyst/bootstrap/api/internal/handler/sme_tenant_gitops.go
     - bp-openclaw per-tenant overlay now emits `newapi.defaultModel:
       qwen3.6` so OpenClaw's UI surfaces the friendlier alias by
       default. (Both qwen3.6 and qwen3-coder route to the same
       channel via the chart's `models` list.)
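
The idempotent seeding rule from change 3 (search by keyword, skip on an exact name match, otherwise create) is sketched below against an in-memory stand-in for the admin API. The adminAPI interface, memAPI type and channel names are hypothetical; the real Job talks to /api/channel/ over HTTP.

```go
package main

import (
	"fmt"
	"strings"
)

// channel is a reduced stand-in for a NewAPI channel row.
type channel struct {
	Name  string
	Model string
}

// adminAPI abstracts the two admin calls the seed step needs.
type adminAPI interface {
	Search(keyword string) ([]channel, error)
	Create(c channel) error
}

// seedChannels creates each wanted channel unless a row with exactly the
// same name already exists. It returns how many rows it created.
func seedChannels(api adminAPI, want []channel) (int, error) {
	created := 0
	for _, c := range want {
		found, err := api.Search(c.Name)
		if err != nil {
			return created, err
		}
		exists := false
		for _, f := range found {
			if f.Name == c.Name { // keyword search is fuzzy; require exact match
				exists = true
				break
			}
		}
		if exists {
			continue
		}
		if err := api.Create(c); err != nil {
			return created, err
		}
		created++
	}
	return created, nil
}

// memAPI is an in-memory fake whose Search does substring matching.
type memAPI struct{ rows []channel }

func (m *memAPI) Search(keyword string) ([]channel, error) {
	var out []channel
	for _, r := range m.rows {
		if strings.Contains(r.Name, keyword) {
			out = append(out, r)
		}
	}
	return out, nil
}

func (m *memAPI) Create(c channel) error {
	m.rows = append(m.rows, c)
	return nil
}

func main() {
	api := &memAPI{rows: []channel{{Name: "qwen-example-backup"}}}
	want := []channel{{Name: "qwen-example", Model: "qwen3-coder"}}
	first, _ := seedChannels(api, want)
	second, _ := seedChannels(api, want)
	fmt.Println(first, second) // prints "1 0"
}
```

The exact-name comparison matters because a keyword search can return similarly named rows; without it, a near match would wrongly suppress the seed.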

Verification:
  - helm lint .                    PASS (1 chart linted, 0 failed)
  - helm template (defaults)       PASS (no Job rendered)
  - helm template (qwen enabled)   PASS (Job + RBAC + ConfigMap +
                                          channels.yaml all render
                                          with channel #1 first)
  - helm template (endpoint empty) FAIL with helpful message
                                   (configurability gate)
  - go build ./...                 PASS
  - go test ./internal/handler/... PASS for SME tenant overlay tests
                                   (TestRenderSMETenantOverlay_*)
  - Pre-existing AuthHandover panic is unrelated to this change

Per docs/INVIOLABLE-PRINCIPLES.md #4 (never hardcode), every knob is
configurable via the per-Sovereign bootstrap-kit overlay. The
endpoint default is empty so a fresh `helm template` does not
silently wire customers to a third-party host.

Co-authored-by: alierenbaysal <alierenbaysal@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 13:32:00 +04:00
e3mrah
7de05bab9d
fix(bootstrap-kit,bp-newapi): bump slot pins (gitea 1.2.4, catalyst-platform 1.4.2) + gate Traefik Middleware on Cilium Sovereigns (bp-newapi 1.2.0) (#842)
Three issues blocking the otech103 verification proof on a freshly merged main, all uncovered while live-driving the Day-2 Independence cutover:

1. clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml pinned 1.4.0 — missed the bumps from PR #839 (1.4.1, RBAC dual-mode render) and PR #841 (1.4.2, POWERDNS env literal). Bumping the slot pin to 1.4.2 lands those fixes on every fresh provision.

2. clusters/_template/bootstrap-kit/10-gitea.yaml pinned 1.2.3 — missed the bump from PR #832 (1.2.4, gitea-admin-secret canonical Secret for cutover Step-1 to mount). Bumping to 1.2.4 unblocks bp-self-sovereign-cutover Step-1 (gitea-mirror Job).

3. platform/newapi/chart/templates/ingress.yaml hard-rendered a traefik.io/v1alpha1 Middleware resource. On a Cilium Gateway Sovereign that CRD does not exist; bp-newapi 1.1.0 install failed with 'no matches for kind Middleware'. Gating the Middleware behind .Values.ingress.middleware.enabled (default false) lets the chart install on Cilium Sovereigns; contabo / Traefik clusters can still flip it on per-overlay. Bumping to 1.2.0 (additive feature, default-off, no breaking change). Slot 80-newapi pin bumped lockstep.

Verified live state on otech103.omani.works (deployment id 12dff5098e33053e):
- bp-newapi 1.1.0 HR: Status=False 'Helm install failed: ... no matches for kind Middleware in version traefik.io/v1alpha1'
- bp-catalyst-platform HR pinned at 1.4.0 (lacks RBAC for cutover-driver)
- bp-gitea HR pinned at 1.2.3 (lacks gitea-admin-secret)

After this PR merges + Flux reconciles otech103, all three HRs upgrade in place and the cutover proof can be driven to completion.

Co-authored-by: Hatice Yildiz <hatice.yildiz@openova.io>
2026-05-05 00:22:55 +04:00
e3mrah
9645a9044a
feat(metering): NewAPI NATS publisher + sme-billing subscriber + POST /metering/record (#798) (#818)
Per #795 [Q-mine-3] (NATS not RedPanda) + [Q-mine-4] (one ledger), add
the SME-2 metering integration end-to-end. NewAPI is consumed as the
upstream image `ghcr.io/openova-io/openova/newapi-mirror` (a pinned
mirror, not a fork) — the metering envelope is produced by a Go sidecar
that observes the OpenAI-style `usage.total_tokens` field on every
2xx /v1/* response. This avoids forking the upstream binary while still
producing the canonical envelope shape on `catalyst.usage.recorded`.

A) NewAPI metering sidecar — core/services/metering-sidecar/
   - Transparent reverse proxy in front of NewAPI on its own port; the
     bp-newapi Service routes the cluster-fronting port to the sidecar,
     which forwards to NewAPI on the pod's loopback.
   - Observes successful /v1/* JSON responses, parses
     `usage.{prompt_tokens,completion_tokens,total_tokens}`, computes
     amount_micro_omr = -tokens * priceMicroOMRPerToken, and publishes
     one envelope on `catalyst.usage.recorded` per completed request.
   - Failed (non-2xx), non-JSON, and admin-path requests are NOT billed.
   - Customer-facing latency is NEVER blocked on metering: the response
     body is restored before publish; on NATS unreachable the envelope
     is persisted to disk and retried by a background drain loop.
   - 14 unit tests (proxy + publisher + safeFilename guards).
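
A minimal sketch of the billing rule described above, not the sidecar's actual code. The function name `amountMicroOMR`, the struct names, and the choice to skip responses with a non-positive `total_tokens` are assumptions made for illustration; admin-path filtering and the NATS publish are left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// usage mirrors the OpenAI-style usage object named in the commit message.
type usage struct {
	PromptTokens     int64 `json:"prompt_tokens"`
	CompletionTokens int64 `json:"completion_tokens"`
	TotalTokens      int64 `json:"total_tokens"`
}

// completionResponse keeps only the field the sketch needs.
type completionResponse struct {
	Usage *usage `json:"usage"`
}

// amountMicroOMR (hypothetical helper) returns the negative ledger amount
// for one response. ok is false when the response should not be billed:
// non-2xx status, non-JSON body, or no usable usage.total_tokens.
func amountMicroOMR(status int, body []byte, pricePerToken int64) (amount int64, ok bool) {
	if status < 200 || status > 299 {
		return 0, false
	}
	var resp completionResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return 0, false
	}
	if resp.Usage == nil || resp.Usage.TotalTokens <= 0 {
		return 0, false // assumption: nothing to bill without a token count
	}
	return -resp.Usage.TotalTokens * pricePerToken, true
}

func main() {
	body := []byte(`{"usage":{"prompt_tokens":10,"completion_tokens":5,"total_tokens":15}}`)
	amount, ok := amountMicroOMR(200, body, 156)
	fmt.Println(amount, ok) // -2340 true
}
```

With the chart's default price of 156 micro-OMR per token, a 15-token response debits 2340 micro-OMR.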

B) sme-billing NATS subscriber — core/services/billing/handlers/
   metering_consumer.go
   - JetStream durable consumer `sme-billing-metering` on stream
     `CATALYST_USAGE` (provisioned by sme-billing on startup).
   - Idempotent on metadata.request_id via a UNIQUE partial index on
     credit_ledger.external_ref; redelivery from the broker collapses
     to a single ledger row.
   - Customer auto-create on cold start (the rbac sme.user.created
     envelope may land AFTER the first metered request; we don't strand
     usage waiting for it).
   - 11 unit tests covering happy-path, idempotency, malformed-payload
     poison-pill, missing-request-id, non-negative amount guard,
     resolver error → Nak, derive-micro-OMR-from-OMR, DB-error → Nak.
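
The idempotency contract above can be sketched with an in-memory stand-in. This is an illustration only: the real path relies on the UNIQUE partial index on credit_ledger.external_ref, and the `ledger` type, its `record` method, and the error texts are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// ledger is an in-memory stand-in for credit_ledger, keyed by external_ref.
type ledger struct {
	byRef   map[string]int64 // request_id -> amount_micro_omr
	balance int64
}

func newLedger() *ledger { return &ledger{byRef: make(map[string]int64)} }

// record applies a usage debit at most once per requestID.
// duplicate is true when the same requestID was already recorded.
func (l *ledger) record(requestID string, amountMicroOMR int64) (balance int64, duplicate bool, err error) {
	if requestID == "" {
		return l.balance, false, errors.New("missing request_id")
	}
	if amountMicroOMR >= 0 {
		// assumption: usage entries are always debits, so zero is rejected too
		return l.balance, false, errors.New("usage amount must be negative")
	}
	if _, seen := l.byRef[requestID]; seen {
		return l.balance, true, nil // redelivery collapses to the existing row
	}
	l.byRef[requestID] = amountMicroOMR
	l.balance += amountMicroOMR
	return l.balance, false, nil
}

func main() {
	l := newLedger()
	fmt.Println(l.record("req-1", -2340)) // -2340 false <nil>
	fmt.Println(l.record("req-1", -2340)) // -2340 true <nil>
}
```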

C) HTTP handler POST /billing/metering/record — handlers/metering.go
   - Synchronous validate → INSERT credit_ledger → return
     {ledger_entry_id, balance_after_omr, balance_after_micro_omr,
     duplicate}. Same payload + idempotency guard as the NATS path.
   - Auth: superadmin OR sovereign-admin (operator-admin model;
     end-user LLM traffic flows through the sidecar, never this URL).
   - 8 unit tests covering happy-path, idempotency, role gating,
     malformed-JSON, positive-amount rejection, customer-not-found.

D) Schema — core/services/billing/store/store.go
   - ALTER TABLE credit_ledger ADD COLUMN amount_micro_omr BIGINT
     (1 OMR = 1,000,000 micro-OMR; -0.000234 OMR = -234 micro-OMR
     exact integer — preserves precision at metering rates).
   - ADD COLUMN external_ref TEXT + UNIQUE partial index for
     idempotency dedup.
   - ADD COLUMN metadata JSONB for the raw envelope.
   - GetCreditBalance projects both amount_omr (legacy) and
     amount_micro_omr (new) into the integer-OMR view.
   - GetCreditBalanceMicroOMR returns canonical precision.
   - RecordUsage method: ON CONFLICT DO UPDATE … RETURNING (xmax<>0)
     distinguishes fresh insert from duplicate without a follow-up
     SELECT.
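
A rough sketch of the two balance projections, assuming the legacy amount_omr column and the new amount_micro_omr column are additive per row and that a NULL amount_micro_omr counts as zero. The `ledgerRow` type and both function names are hypothetical, and truncation in `wholeOMR` is Go's integer-division behaviour, not a confirmed property of the SQL.

```go
package main

import "fmt"

const microPerOMR int64 = 1_000_000

// ledgerRow models the two amount columns; AmountMicroOMR is nil on
// rows written before the column existed.
type ledgerRow struct {
	AmountOMR      int64
	AmountMicroOMR *int64
}

// balanceMicroOMR sums both columns at micro-OMR precision.
func balanceMicroOMR(rows []ledgerRow) int64 {
	var total int64
	for _, r := range rows {
		total += r.AmountOMR * microPerOMR
		if r.AmountMicroOMR != nil {
			total += *r.AmountMicroOMR
		}
	}
	return total
}

// wholeOMR projects a micro-OMR balance into whole OMR (truncating).
func wholeOMR(micro int64) int64 { return micro / microPerOMR }

func main() {
	debit := int64(-234) // -0.000234 OMR as an exact integer
	rows := []ledgerRow{
		{AmountOMR: 50},          // legacy voucher credit
		{AmountMicroOMR: &debit}, // metered usage
	}
	micro := balanceMicroOMR(rows)
	fmt.Println(micro, wholeOMR(micro)) // 49999766 49
}
```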

E) Wiring
   - core/services/shared/events/nats.go — minimal NATS JetStream
     publisher + subscriber surface; legacy RedPanda producer/consumer
     in events.go untouched per [Q-mine-3].
   - core/services/billing/main.go — NATS_URL env; subscriber wired
     in parallel with the existing RedPanda tenant-events consumer.
   - middleware/jwt.go — exported test helper WithClaims so handler
     tests can construct an authenticated context without minting a
     real signed token.
   - .github/workflows/services-build.yaml — metering-sidecar added
     to the build matrix; deploy job skips it (image consumed by the
     bp-newapi chart, not products/catalyst sme-services).

F) bp-newapi chart (1.0.0 → 1.1.0)
   - meteringSidecar block in values.yaml: image, port, NATS URL,
     priceMicroOMRPerToken (default 156 = 0.000156 OMR/token), spool
     dir, header names, resources, securityContext (read-only-rootfs).
   - deployment.yaml renders the sidecar container + emptyDir spool
     volume when meteringSidecar.enabled (default true).
   - service.yaml routes the cluster-fronting :3000 to the sidecar
     when enabled, exposes a separate :3001 → NewAPI direct port for
     bp-catalyst-platform admin-API traffic (ADR-0003 §3.2).
   - networkpolicy.yaml allows the sidecar's port + nats-system
     egress for JetStream publish.

Tests: 33 new (14 sidecar + 11 subscriber + 8 HTTP handler), all green.
Helm template renders cleanly with sidecar enabled and disabled.

Closes #798

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(billing/store): cast SUM to BIGINT so lib/pq scans into int64 (#798)

Postgres returns `SUM(int) + SUM(bigint)/integer` as `numeric`, which
lib/pq presents as a `[]uint8` decimal string ("50.000000000000000000000000")
that does NOT scan directly into Go int64 — the integration test
TestVoucherLifecycle_IssueRedeemAndCreditApplied caught this in CI on
the post-redeem balance read.

Wrap the SUM expressions in CAST(... AS BIGINT) so the column type is
unambiguously bigint and Scan target stays uniform across pre-#798 rows
(amount_omr only) and post-#798 rows (amount_micro_omr present).

Affects:
  - GetCreditBalance
  - GetCreditBalanceMicroOMR
  - RecordUsage's running-balance read

Test mocks updated to match the new SQL prefix.
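
For context, a small illustration of why the decimal string does not scan. It assumes the Scan path ends in a base-10 integer parse of the driver's bytes, as database/sql does for integer destinations; `scanInt64` is a hypothetical stand-in, not the driver's code.

```go
package main

import (
	"fmt"
	"strconv"
)

// scanInt64 (hypothetical) mimics converting a driver-supplied byte
// slice into an int64 destination.
func scanInt64(raw []byte) (int64, error) {
	return strconv.ParseInt(string(raw), 10, 64)
}

func main() {
	// numeric result: decimal string, rejected by the integer parse
	_, err := scanInt64([]byte("50.000000000000000000000000"))
	fmt.Println(err != nil) // true

	// bigint result after CAST(... AS BIGINT): plain integer string
	v, err := scanInt64([]byte("50"))
	fmt.Println(v, err) // 50 <nil>
}
```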

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 22:32:42 +04:00
e3mrah
20b3c5258a
feat(bp-newapi): chart maturation + first-otech deploy + Qwen vLLM channel (#799) (#812)
* feat(bp-newapi): chart maturation — ExternalSecret + first-otech vLLM channel + skip-render gates (#799)

Maturation work for the SME-3 turnkey-experience epic (#795). Aligns
the bp-newapi scratch chart with ADR-0003 (RBAC ↔ NewAPI user-create
hook contract) and gets it past the blueprint-release CI smoke render
that has blocked publication since PR #396 (run 25213444992 failed at
default-values render of v1.0.0).

Changes
-------
- templates/external-secret.yaml (NEW). Renders the
  `catalyst-newapi-admin-token` ExternalSecret consumed by unified-rbac
  (ADR-0003 §3.2 + §6) for issuing per-user keys against
  `http://newapi.newapi.svc/api/v1/admin/users`. Sourced from OpenBao
  via the `vault-region1` ClusterSecretStore (canonical default shipped
  by bp-external-secrets-stores). Capabilities-gated on
  `external-secrets.io/v1beta1` so cold installs without ESO don't
  fail-render. Operator supplies the per-Sovereign OpenBao path via
  `catalystIntegration.externalSecret.remoteRef.key`; canonical
  convention is `sovereign/<sovereign-fqdn>/newapi/admin-token` with
  property `ADMIN_API_TOKEN`. Per Inviolable Principle #4 every knob
  is operator-overridable in the cluster overlay.

- values.yaml. Adds `catalystIntegration.externalSecret.{enabled,
  refreshInterval, secretStoreRef.{kind,name}, remoteRef.{key,property}}`
  block (default enabled=true, key="" so a misconfigured overlay fails
  loudly at render rather than silently skipping). Adds
  `defaultChannels.vllm` block — first-otech shorthand that composes a
  vLLM-typed channel into the rendered channels list when enabled.
  Default endpoint is empty per Inviolable Principle #4; the
  `clusters/<sovereign>/bootstrap-kit/80-newapi.yaml` overlay supplies
  the per-Sovereign URL (canonical first-otech reference =
  `https://llm-api.omtd.bankdhofar.com` model `qwen3-coder`, the same
  upstream Axon uses on the OpenOva marketing deployment).

- templates/_helpers.tpl. New `bp-newapi.effectiveChannels` helper
  composes `.Values.channels` with `defaultChannels.vllm` (when
  enabled). The `assertChannelAttestation` helper now operates on the
  effective list so attestation gates apply to defaultChannels
  composition too. `defaultChannels.vllm.enabled=true` with empty
  endpoint fails fast at render with a guided error message.

- templates/configmap.yaml. Channels rendering switches to the
  effectiveChannels helper. OIDC block now skip-renders gracefully when
  `auth.adminUI.keycloak.issuer` is unset (smoke-render path) instead
  of `required`-failing; the per-Sovereign overlay sets the issuer.

- templates/deployment.yaml. Skip-render gate on Deployment when
  `database.existingSecret`, `credentials.existingSecret`, or (when
  Keycloak mode is selected) the OIDC client secret is missing. Removes
  the four `required` calls that were failing CI smoke render. Service,
  ServiceAccount, ConfigMap, NetworkPolicy still render so the smoke
  test gets a non-empty output proving structural soundness; the actual
  Deployment defers until the per-Sovereign overlay wires the secrets.

- templates/ingress.yaml. Same skip-render pattern: when either
  `ingress.host` or `ingress.adminHost` is empty, the entire ingress
  block is silently skipped. Matches the bp-keycloak / bp-openbao /
  bp-external-dns HTTPRoute templates.

- Chart.yaml. version 1.0.0 → 1.1.0 (minor bump — additive features;
  no breaking changes to existing operator overrides).
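
The composition performed by the `bp-newapi.effectiveChannels` helper can be sketched in Go. The real helper is a Helm template; the struct fields, the channel name `vllm-default`, the append order, and the error text here are all assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// channel is a simplified view of one rendered channel entry.
type channel struct {
	Name     string
	Type     string
	Endpoint string
	Models   []string
}

// vllmDefault mirrors the defaultChannels.vllm values block.
type vllmDefault struct {
	Enabled  bool
	Endpoint string
	Models   []string
}

// effectiveChannels returns the explicit channels plus the vLLM
// shorthand when it is enabled, failing fast on an empty endpoint.
func effectiveChannels(explicit []channel, v vllmDefault) ([]channel, error) {
	out := append([]channel(nil), explicit...) // copy; leave the input untouched
	if !v.Enabled {
		return out, nil
	}
	if v.Endpoint == "" {
		return nil, errors.New("defaultChannels.vllm.enabled=true requires defaultChannels.vllm.endpoint")
	}
	out = append(out, channel{
		Name:     "vllm-default", // hypothetical name
		Type:     "vllm",
		Endpoint: v.Endpoint,
		Models:   v.Models,
	})
	return out, nil
}

func main() {
	_, err := effectiveChannels(nil, vllmDefault{Enabled: true})
	fmt.Println(err != nil) // true

	chs, _ := effectiveChannels(nil, vllmDefault{
		Enabled:  true,
		Endpoint: "https://llm.example.invalid", // placeholder endpoint
		Models:   []string{"qwen3-coder"},
	})
	fmt.Println(len(chs), chs[0].Type) // 1 vllm
}
```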

Verification
------------
`helm template` smoke render on default values now succeeds with 4
resources (NetworkPolicy / ServiceAccount / ConfigMap / Service); 168
lines, well above the CI 5-line minimum. With a full per-Sovereign
overlay (hosts + secrets + Keycloak issuer + ESO Capabilities + Traefik
Capabilities + defaultChannels.vllm.endpoint), 8 resources render
including Deployment, both Ingresses, the Traefik allowlist Middleware,
and the ExternalSecret. The composed qwen channel writes through to
`channels.yaml` with the expected endpoint + models + attestation.

Refs
----
ADR-0003 §3.2 + §6 — admin-token contract
Issue #795 (epic) — locked decisions
Issue #796 — hook contract spec (sequential blocker, merged)
Inviolable Principles #1, #3, #4

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(bootstrap-kit): slot 80 — bp-newapi default install (#799)

Adds the canonical install slot for bp-newapi to every fresh Sovereign's
bootstrap-kit. Sequenced after the W2.K1 dependency wave so NewAPI's
ExternalSecret + Postgres DSN dependencies resolve on first reconcile.

The HelmRelease declares `dependsOn: [bp-openbao, bp-keycloak, bp-cnpg]`:
- bp-openbao(08): admin-token ExternalSecret backend
- bp-keycloak(09): OIDC issuer for ops-staff admin UI at admin.<fqdn>
- bp-cnpg(16): Postgres backing for users/credits/channels/audit

Per-Sovereign overlays inherit the slot's defaults and override:
- ingress.host                                        api.${SOVEREIGN_FQDN}
- ingress.adminHost                                   admin.${SOVEREIGN_FQDN}
- auth.adminUI.keycloak.issuer
- database.existingSecret                             (Crossplane-claimed)
- credentials.existingSecret
- catalystIntegration.externalSecret.remoteRef.key    sovereign/${FQDN}/newapi/admin-token
- defaultChannels.vllm.enabled                        true (first-otech)
- defaultChannels.vllm.endpoint                       (operator-supplied)

The `_template/` slot keeps `defaultChannels.vllm.enabled: false` so a
fresh Sovereign does not silently wire customers to a third-party
endpoint; the canonical first-otech reference (Qwen3 Coder via
`https://llm-api.omtd.bankdhofar.com`, same relay Axon uses on the
OpenOva marketing deployment) is documented in-line for operators
adopting the same upstream.

Refs: #795 (epic), ADR-0003

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(bootstrap-deps): register bp-newapi slot 80 in expected DAG (#799)

Fixes the dependency-graph-audit drift detection caught at PR #812 CI:
the audit script enumerates HelmReleases in clusters/_template/bootstrap-kit/
and compares to scripts/expected-bootstrap-deps.yaml; an HR present on
disk but absent from the expected DAG is treated as drift.

Adds the canonical entry for bp-newapi at slot 80 with the same
depends_on set declared on the HelmRelease itself
([bp-openbao, bp-keycloak, bp-cnpg]).
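
The drift rule described above, sketched in Go under the assumption that the expected DAG is a map from HelmRelease name to its depends_on list. The real audit is a script; `unexpectedReleases` is a hypothetical name.

```go
package main

import (
	"fmt"
	"sort"
)

// unexpectedReleases returns HelmReleases found on disk that have no
// entry in the expected DAG, sorted for stable output.
func unexpectedReleases(onDisk []string, expected map[string][]string) []string {
	var drift []string
	for _, name := range onDisk {
		if _, ok := expected[name]; !ok {
			drift = append(drift, name)
		}
	}
	sort.Strings(drift)
	return drift
}

func main() {
	expected := map[string][]string{
		"bp-openbao":  nil,
		"bp-keycloak": nil,
		"bp-cnpg":     nil,
	}
	onDisk := []string{"bp-openbao", "bp-keycloak", "bp-cnpg", "bp-newapi"}
	fmt.Println(unexpectedReleases(onDisk, expected)) // [bp-newapi]

	// Registering the slot clears the drift.
	expected["bp-newapi"] = []string{"bp-openbao", "bp-keycloak", "bp-cnpg"}
	fmt.Println(unexpectedReleases(onDisk, expected)) // []
}
```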

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(bp-newapi): align blueprint.yaml spec.version with Chart.yaml (#799)

The TestBootstrapKit_BlueprintCardsHaveRequiredFields static-validation
gate asserts Chart.yaml version == blueprint.yaml spec.version. The
chart was bumped to 1.1.0 in c63ecd8c; this commit bumps the blueprint
metadata to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Hatice Yildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 22:17:25 +04:00
e3mrah
eb92e0496b
feat(platform): add bp-newapi — multi-tenant LLM marketplace gateway (#394) (#396)
Catalyst Blueprint wrapping the upstream NewAPI
(github.com/Calcium-Ion/new-api, MIT) for Sovereign operators whose
business model is reselling LLM access to their own customers.

Backend-only mode: the OpenAI-compatible API at api.<host>/v1/* is
customer-facing. The upstream's portal UI is disabled at ingress and
Catalyst replaces it as the customer surface. NewAPI's admin UI at
admin.<host> is exposed only to ops staff (IdP-gated).

Compliance posture enforced at the blueprint layer:
- Channel attestation gate (refuses to render if any enabled channel
  lacks verifiable provenance — in-cluster, commercial-contract, or
  byok)
- Geographic AUP enforcement (sanctioned-region block on commercial-
  provider channels; US/EU export-control baseline)
- BYOK isolation (request-scoped, never aggregated)
- Reseller disclosure required
- Audit log on bp-cnpg (metadata-only by default)
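
A sketch of the channel attestation gate in Go, for illustration. The chart enforces this in a Helm helper; the `channel` struct, its field names, and the error wording are assumptions. Only the three provenance values come from the list above.

```go
package main

import "fmt"

// channel carries only the fields the gate inspects.
type channel struct {
	Name        string
	Enabled     bool
	Attestation string
}

// validAttestations lists the provenance kinds named in the commit message.
var validAttestations = map[string]bool{
	"in-cluster":          true,
	"commercial-contract": true,
	"byok":                true,
}

// assertChannelAttestation refuses any enabled channel whose
// attestation is missing or not one of the accepted kinds.
func assertChannelAttestation(channels []channel) error {
	for _, ch := range channels {
		if !ch.Enabled {
			continue // disabled channels are not rendered, so not gated
		}
		if !validAttestations[ch.Attestation] {
			return fmt.Errorf("channel %q: missing or unknown attestation %q", ch.Name, ch.Attestation)
		}
	}
	return nil
}

func main() {
	ok := []channel{{Name: "local", Enabled: true, Attestation: "in-cluster"}}
	fmt.Println(assertChannelAttestation(ok)) // <nil>

	bad := []channel{{Name: "relay", Enabled: true}}
	fmt.Println(assertChannelAttestation(bad) != nil) // true
}
```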

ACME placeholder used throughout the README; replace with operator
identity in per-Sovereign overlays at
clusters/<sovereign>/bootstrap-kit/.

Files:
- platform/newapi/README.md (design doc + setup checklist)
- platform/newapi/blueprint.yaml (Catalyst Blueprint CR)
- platform/newapi/chart/{Chart.yaml,values.yaml}
- platform/newapi/chart/templates/{_helpers.tpl,deployment.yaml,
  service.yaml,ingress.yaml,configmap.yaml,serviceaccount.yaml,
  networkpolicy.yaml}

Closes design portion of #394.

Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 15:57:06 +04:00