Commit Graph

8 Commits

Author SHA1 Message Date
e3mrah
f6757c7c93
feat(docs): lean documentation strategy — consolidate 16 docs into 7 canonical + 3 subdirs (#2094)
* docs(arch): consolidate ARCHITECTURE + PLATFORM-TECH-STACK + NAMING + EPICS-1-6 + BOOTSTRAP-KIT-EXPANSION → docs/ARCHITECTURE.md (lean doc strategy)

Single canonical "how OpenOva works" doc per founder's lean-doc strategy.
2926 source lines → 1110 consolidated lines, no semantic loss.

Sections:
 §1  High-level model (Catalyst/Sovereign/Org/Env/Application/Blueprint)
 §2  Repo layout
 §3  Tech stack by layer (CNI/GitOps/IaC/event-spine/data/secrets/identity/...)
 §4  Naming conventions (dimensions, patterns, labels, DOMAINS-CANON)
 §5  Catalyst control plane (rules, CRDs, controllers, cutover, identity, surfaces)
 §6  Per-host-cluster infrastructure
 §7  Application Blueprints
 §8  Multi-region topology (1 cpx52/region, WireGuard-over-public-IPs, ClusterMesh)
 §9  Bootstrap-kit slot ordering (full 48-slot canonical list)
 §10 EPIC-level design overview (EPIC-0 through EPIC-6)
 §11 Per-chart DESIGN.md inventory
 §12 OAM influence
 §13 Read further

Stale literal fixes:
 - omantel.openova.io → omantel.biz / <sovereign>.<tld> / t38.omani.works (7 instances)
 - SPIRE marked DEFERRED / opt-in only (PR #665, TBD-V29 #2055)
 - failover-controller marked REPLACED by bp-continuum

New PR refs wired into §3:
 - PR #665   SPIRE deferral
 - PR #2071  bp-cnpg-pair synchronous remote_apply (zero-tx-loss multi-region)
 - PR #2087  bp-cnpg-pair pre-merge guard 1 (hollow-chart / no-upstream)
 - PR #2093  bp-cnpg-pair pre-merge guard 2 (smoke-render)

New stack components added to §3:
 - bp-cnpg-pair  (synchronous remote_apply ReplicaCluster across ClusterMesh)
 - bp-continuum  (lease-based failover orchestrator)
 - bp-self-sovereign-cutover (8-tether pivot, ADR-0002, Principle #11)

Source docs (to be deleted by orchestrator in final PR):
 - docs/PLATFORM-TECH-STACK.md
 - docs/NAMING-CONVENTION.md
 - docs/EPICS-1-6-unified-design.md
 - docs/BOOTSTRAP-KIT-EXPANSION-PLAN.md

* docs(principles): consolidate INVIOLABLE-PRINCIPLES + ANTI-PATTERN-CATALOG → docs/PRINCIPLES.md (lean doc strategy)

* docs(dod): consolidate 5-PILLAR-DOD + DOMAINS-CANON + SOVEREIGN-MULTI-REGION-DOD + PERSONAS-AND-JOURNEYS → docs/DOD.md (lean doc strategy)

* docs(runbooks+status+glossary): consolidate 5 runbooks → RUNBOOKS.md + refresh STATUS.md + fold banned-terms into GLOSSARY.md (lean doc strategy)

Part 1 — Runbook consolidation:
- NEW docs/RUNBOOKS.md with 7 numbered sections (provisioning, day-2 ops,
  Blueprint authoring, chart conventions, demo walk, failover, troubleshooting)
- Folds BLUEPRINT-AUTHORING / CHART-AUTHORING / DEMO-RUNBOOK /
  RUNBOOK-OPERATIONS / RUNBOOK-PROVISIONING into one canonical surface
- Documents dual-annotation requirement for charts with enabled.default: false
  (GUARD 1 #2087 no-upstream + GUARD 2 #2093 smoke-render) with bp-network-policies:1.0.1
  dead-reserve incident as the live evidence
- All admin.<fqdn> legacy URL refs → console.<fqdn>/bss (BSS lives in operator console)
- All openova.io / omantel.omani.works test commands → canonical t<NN>.omani.works
- Cites PRs #2076 (docs migration), #2082 (no-auto-close-keyword), #2087, #2093

Part 2 — STATUS.md refresh (renamed from IMPLEMENTATION-STATUS.md):
- Header dated 2026-05-20 (was 2026-04-29; 22 days stale per audit)
- Adds 🟦 CODE-COMPLETE state for "controllers + CRDs + tests landed,
  awaiting fresh-prov walk" (per 5-pillar DoD)
- Pillar 3 marked CODE-COMPLETE (PRs #2071/#2072/#2073/#2074/#2075/#2053)
- Adds 3 new CRDs verified in products/catalyst/chart/crds/:
  CNPGPair, PDM, Sandbox
- Sandbox controller chain CODE-COMPLETE
  (PRs #1615/#1618/#1621/#1622/#1626/#1631/#1632)
- SPIRE marked DEFERRED — opt-in only (PRs #665, #2056, #2061)
- New §6 CI / supply-chain guards table: hollow-chart (#2087),
  smoke-render (#2093), no-auto-close-keyword (#2082), observability-toggle,
  subchart 4-step, Flux version-pin replay
- New §9 Pillar-status table — Pillars 1/2/3/4 CODE-COMPLETE, Pillar 5 🚧
- Pillar 1 (PRs #2038 V18, #2043 V18-D), Pillar 2 (PR #2029 V20),
  Pillar 3 (per above), Pillar 4 (Sandbox chain)

Part 3 — GLOSSARY.md folded as single source of truth for banned terms:
- Header dated 2026-05-20, notes "single source of truth for banned terms"
  and "no separate BANNED-TERMS.md"
- Existing 11 banned-terms rows rewritten with italicized qualifiers
- NEW Forbidden test domains subsection:
  openova.io (mothership-only), omantel.openova.io (hallucinated),
  Nova Cloud (predecessor brand), eventforge.io (hallucinated),
  admin.<fqdn> (dead BSS URL)
- SPIFFE/SPIRE identity row + acronym row marked deferred per PR #665
  with TBD-V29 (#2055) re-introduction roadmap
- Cross-links updated: IMPLEMENTATION-STATUS → STATUS,
  SOVEREIGN-PROVISIONING + BLUEPRINT-AUTHORING → RUNBOOKS.md

CLAUDE.md NOT touched. Source files NOT deleted (orchestrator owns deletion).
No push, no PR. Manifest at /tmp/merge-D-runbooks-status-glossary-manifest.txt.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: assemble lean doc strategy — delete legacy sources, move ledger/sessions/archive, ADR-0004, rewrite cross-refs

Per founder direction 2026-05-20 + user-global ~/.claude/CLAUDE.md §11.

This is the orchestrator commit on top of the four cherry-picked consolidation
commits (ARCHITECTURE, PRINCIPLES, DOD, RUNBOOKS+STATUS+GLOSSARY). It:

1. Deletes 15 legacy source docs (now folded into the 7 canonical):
   PLATFORM-TECH-STACK, NAMING-CONVENTION, EPICS-1-6-unified-design,
   BOOTSTRAP-KIT-EXPANSION-PLAN, INVIOLABLE-PRINCIPLES, ANTI-PATTERN-CATALOG,
   5-PILLAR-DOD, DOMAINS-CANON, SOVEREIGN-MULTI-REGION-DOD,
   PERSONAS-AND-JOURNEYS, BLUEPRINT-AUTHORING, CHART-AUTHORING,
   DEMO-RUNBOOK, RUNBOOK-OPERATIONS, RUNBOOK-PROVISIONING.

2. Moves transient + historical docs into proper subdirs:
   - docs/ledger/{TRUST,TRACKER}.md (cron-refreshed live state)
   - docs/sessions/{2026-05-17-convergence,2026-05-19-20-trust-recovery,
     2026-05-20-trust-audit,2026-05-20-walk-runbook}.md
   - docs/archive/{validation-log,orchestrator-state,omantel-handover-wbs}.md

3. Adds docs/adr/0004-cnpg-sync-replication.md (Pillar 3 zero-tx-loss decision)
   + docs/adr/README.md index.

4. Updates CLAUDE.md reading-order + repo-structure block to match the
   lean strategy and current core/ tree (controllers/, marketplace/, etc.).

5. Sweeps all .md files + .github/workflows + scripts to repoint old doc
   paths to the new canonical homes. ADR cross-references kept intact
   (ADRs are immutable historical artifacts).
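A minimal sketch of the cross-reference sweep in step 5, not the script that was actually run. The mapping covers only a subset of the moves named in these commits; the `docs/` prefix and `.md` suffix on some entries are assumptions, as is reading "ADR cross-references kept intact" as "files under docs/adr/ are not rewritten". `repoint` and `DOC_MOVES` are hypothetical names.

```python
import re

# Old doc path -> new canonical home (subset; some file names are assumed).
DOC_MOVES = {
    "docs/PLATFORM-TECH-STACK.md": "docs/ARCHITECTURE.md",
    "docs/NAMING-CONVENTION.md": "docs/ARCHITECTURE.md",
    "docs/INVIOLABLE-PRINCIPLES.md": "docs/PRINCIPLES.md",
    "docs/ANTI-PATTERN-CATALOG.md": "docs/PRINCIPLES.md",
    "docs/5-PILLAR-DOD.md": "docs/DOD.md",
    "docs/RUNBOOK-PROVISIONING.md": "docs/RUNBOOKS.md",
    "docs/IMPLEMENTATION-STATUS.md": "docs/STATUS.md",
}

# One alternation over all old paths, escaped so "." matches literally.
_OLD_PATHS = re.compile("|".join(re.escape(old) for old in DOC_MOVES))


def repoint(text: str, file_path: str) -> str:
    """Rewrite old doc paths in `text`; leave files under docs/adr/ untouched."""
    if file_path.startswith("docs/adr/"):
        return text  # ADRs are immutable historical artifacts
    return _OLD_PATHS.sub(lambda m: DOC_MOVES[m.group(0)], text)
```

A single compiled alternation keeps the sweep to one pass per file, so a path is never rewritten twice.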

Operator-side cron scripts that still write to the old paths
(/home/openova/bin/refresh-dod-dashboard.sh, refresh-wbs.sh and
openova-private/bin/trust-audit.sh) need a one-line path update —
flagged in the PR body.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(bootstrap-kit): update repo-root sentinel to docs/PRINCIPLES.md

The bootstrap-kit Go test used `docs/INVIOLABLE-PRINCIPLES.md` as its
repo-root sentinel; the file no longer exists after the lean-doc
consolidation (it's now `docs/PRINCIPLES.md`). Update the walker to
match the new canonical filename.
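For illustration only, a Python sketch of a repo-root sentinel walk. The real walker is a Go test helper and is not reproduced here; `find_repo_root` is a hypothetical name.

```python
from pathlib import Path
from typing import Optional

# File whose presence marks the repository root after the consolidation.
SENTINEL = Path("docs") / "PRINCIPLES.md"


def find_repo_root(start: Path) -> Optional[Path]:
    """Walk from `start` up through its parents until the sentinel is found."""
    current = start.resolve()
    for candidate in (current, *current.parents):
        if (candidate / SENTINEL).is_file():
            return candidate
    return None  # sentinel missing: renamed, deleted, or not inside the repo
```

A walker like this fails silently (returns nothing) when the sentinel is renamed, which is why the rename had to be mirrored in the test.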

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-20 14:40:01 +04:00
e3mrah
7b31736482
fix(bp-cnpg-pair): switch to synchronous replication (remote_apply) for Pillar 3 zero-tx-loss (Refs #2064) (#2071)
* fix(bp-cnpg-pair): switch to synchronous replication (remote_apply) for Pillar 3 zero-tx-loss (Refs #2064)

The canonical Pillar 3 claim per CLAUDE.md §0 — "2 independent CNPG
clusters with ReplicaCluster sync over Cilium ClusterMesh on DMZ
WireGuard + region-kill failover with **zero transactions lost**" —
is UNACHIEVABLE with asynchronous-streaming replication.  Chart 0.1.1
ran async-streaming as the default (blueprint.yaml:161 verbatim:
"CNPG's replication model is asynchronous-streaming"); the audit at
/tmp/audit-pillar3-cnpg-2026-05-20.md flagged this as the headline
finding (verdict WIRED-INCORRECT for surface #9).

bp-cnpg-pair → chart 0.1.2 + bp-wordpress-tenant → 0.3.2:
  - Default `replication.mode: sync`. Primary CNPG Cluster CR now
    renders `synchronous_commit: "remote_apply"` +
    `synchronous_standby_names: "FIRST 1 (<replica-cluster-name>)"`
    into its postgresql.parameters block. COMMIT on the primary
    blocks until the replica has REPLAYED the WAL (strongest
    durability — replica-side SELECTs see the row before COMMIT
    returns).  This is the bar required for zero-tx-loss on
    region-kill failover.
  - `replication.mode: async` retained for forensic / lab use only;
    production deployments MUST stay on `sync` (documented in
    values.yaml + DESIGN.md §7).
  - configSchema knob `replication.{mode,sync.commit,sync.numSync}`
    surfaced in blueprint.yaml so the marketplace voucher → org
    wizard can present the trade-off; default = sync everywhere.
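A sketch of the parameter block the bullets above describe, written in Python rather than the chart's Helm templates. `replication_parameters` is a hypothetical helper; only the two parameter names and the `FIRST 1 (<replica-cluster-name>)` shape come from this commit.

```python
def replication_parameters(mode: str, replica_cluster: str,
                           commit: str = "remote_apply",
                           num_sync: int = 1) -> dict:
    """Return the postgresql.parameters entries for the primary Cluster CR."""
    if mode == "async":
        return {}  # async: neither synchronous_* parameter is rendered
    if mode != "sync":
        raise ValueError(f"unknown replication.mode: {mode!r}")
    return {
        "synchronous_commit": commit,
        "synchronous_standby_names": f"FIRST {num_sync} ({replica_cluster})",
    }


# Default (sync) render for a replica cluster named pg_replica (assumed name).
print(replication_parameters("sync", "pg_replica"))
```

The async branch returning an empty dict mirrors what the render gate checks: both keys present by default, both absent when mode=async.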

Trade-off (operator-facing, disclosed in values.yaml + DESIGN.md §7):
  - Every COMMIT pays one round-trip to the replica region. On
    Hetzner FSN <-> HEL the RTT is ~10 ms; on geographically
    distant pairs (e.g. EU <-> US ~100 ms) every tx sees that
    latency.
  - If the replica is unreachable, the primary BLOCKS new writes
    until recovery or an explicit `ALTER SYSTEM SET
    synchronous_standby_names = ''` break-glass.  This is by
    design — losing availability is the price of zero-tx-loss
    durability.
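A back-of-envelope sketch of the latency trade-off above. It assumes a single session issuing COMMITs one after another and ignores local fsync and replica replay time, so it is an upper bound; concurrent sessions overlap their waits and scale beyond it. The function name is hypothetical.

```python
def max_serial_commits_per_second(rtt_ms: float) -> float:
    """Upper bound on COMMITs/s for one session when each COMMIT waits one RTT."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    return 1000.0 / rtt_ms


# RTT figures quoted in the trade-off note above.
for label, rtt in (("FSN <-> HEL", 10.0), ("EU <-> US", 100.0)):
    print(f"{label}: at most {max_serial_commits_per_second(rtt):.0f} serial commits/s")
```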

Why remote_apply (not remote_write or on):
  - remote_apply: replica has REPLAYED before COMMIT returns
    (strongest; chosen as canonical for Pillar 3).
  - remote_write: replica received but didn't fsync (allows
    replica-OS crash to lose tx).
  - on: replica has flushed the WAL to disk but not yet replayed it
    (durable, but replica-side SELECTs may not see the row when
    COMMIT returns).

Render-gate tests extended on BOTH charts:
  - cnpg-pair-render.sh Case 2 asserts synchronous_commit +
    synchronous_standby_names present by default; new Case 6
    asserts both ABSENT when mode=async.
  - active-hot-standby-render.sh (wp-tenant) extracts
    SYNC_COMMIT/SYNC_STANDBY from primary's postgresql.parameters
    and asserts the same; new Case 6 covers the async path.

Lockstep version bumps (Principle #14):
  - platform/cnpg-pair/chart/Chart.yaml 0.1.1 → 0.1.2
  - platform/wordpress-tenant/chart/Chart.yaml 0.3.1 → 0.3.2
  - products/catalyst/bootstrap/api/internal/catalog/blueprints.json
    bp-cnpg-pair 0.1.1 → 0.1.2
  - products/catalyst/bootstrap/ui/src/shared/constants/catalog.generated.ts
    bp-cnpg-pair 0.1.1 → 0.1.2
  No bootstrap-kit pin to bump (bp-cnpg-pair is not in
  expected-bootstrap-deps; bp-wordpress-tenant references
  `version: "*"` in sme_tenant_gitops.go).

Validation (Principle #15):
  - `helm template` renders both Cluster CRs with the sync block
    present on the primary (verified locally).
  - `kubectl apply --dry-run=client` succeeds on the rendered
    manifest (NOT server-side — server lies when CRD pre-installed,
    per PR #1933).
  - `helm lint` clean.
  - cnpg-pair render gate: 6/6 PASS (5 pre-existing + new Case 6).
  - wp-tenant active-hot-standby render gate: 6/6 PASS
    (5 pre-existing + new Case 6).

Coordination (NOT bundled in this PR):
  - bp-continuum controller is still not deployed (TBD-V14/#2065)
    so the failover orchestration isn't running yet.  This PR
    fixes the **data-loss CLAIM** (WAL durability bar); the
    failover-controller piece is separate per the audit's
    headline gaps #2/#3/#4.
  - D31 acceptance test (1M-row write → kill primary → count==1M
    on promoted replica) is also deferred (#2067).
  - DO NOT close #2064 on merge — operator walk on a fresh
    multi-region prov with counter-incrementing region-kill test
    is required first per CLAUDE.md §4 anti-theater rule.

Refs #2064
Refs #1831

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cnpg-pair, wordpress-tenant): bump blueprint.yaml spec.version lockstep with Chart.yaml (Refs #2064)

The manifest-validation CI test
TestBootstrapKit_BlueprintVersionLockstepSweep caught a real
drift on the previous commit: blueprint.yaml spec.version MUST
equal chart/Chart.yaml version per TBD-A20 / #1856.  Chart.yaml
was bumped 0.1.1 -> 0.1.2 (cnpg-pair) and 0.3.1 -> 0.3.2
(wordpress-tenant) but blueprint.yaml was left behind.
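A minimal sketch of the lockstep rule, assuming both files are already parsed into dicts. It is not the Go test named above; `version_drift` is a hypothetical helper.

```python
from typing import Optional


def version_drift(chart: dict, blueprint: dict) -> Optional[str]:
    """Return a drift message, or None when the two versions are in lockstep."""
    chart_version = chart.get("version")
    blueprint_version = blueprint.get("spec", {}).get("version")
    if chart_version == blueprint_version:
        return None
    return (f"blueprint.yaml spec.version {blueprint_version!r} "
            f"!= Chart.yaml version {chart_version!r}")


# The drift this commit fixes: Chart.yaml bumped, blueprint.yaml left behind.
print(version_drift({"version": "0.1.2"}, {"spec": {"version": "0.1.1"}}))
```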

Refs #2064

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hati.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 10:10:49 +04:00
e3mrah
6685bd7441
feat(catalog-seed): add bp-cnpg-pair Blueprint + wordpress-tenant active-hot-standby mode (Refs TBD-E8b, TBD-B31) (#1717)
Wave 28-B discovery: the bp-cnpg-pair Catalyst-curated Blueprint chart
(platform/cnpg-pair/ @ 0.1.1) was missing from the catalog-seed
template added by PR #1697. The chart is published at
oci://ghcr.io/openova-io/bp-cnpg-pair, but operators had no way to see
it in /api/v1/catalog on a fresh Sovereign — only the 13 entries from
PR #1697 rendered.

This PR seeds bp-cnpg-pair alongside its bp-cnpg companion in
templates/catalog-seed/blueprints.yaml. Render goes from 13 -> 14
Blueprint CRs on a freshly-handed-over Sovereign.

Also wires the canonical `database.mode` enum knob on bp-wordpress-
tenant (singleton | active-hot-standby), aligning the operator-facing
interface with the new bp-cnpg-pair Blueprint:

  - chart/values.yaml: new `database.mode` (empty default for back-compat).
  - chart/templates/_helpers.tpl: new `bp-wordpress-tenant.dbMode` helper
    with resolution precedence (enum wins; legacy
    `pg.activeHotStandby.enabled` boolean folds as alias for chart
    0.3.x overlays).
  - chart/templates/cnpg-cluster.yaml: reads the resolved enum via the
    helper instead of the raw boolean. Output is bit-for-bit identical
    when overlays don't set the new knob (back-compat smoke verified:
    legacy boolean still renders 2 Cluster CRs).
  - blueprint.yaml: configSchema exposes `database.mode` so the
    marketplace voucher -> org wizard (D29) can present a
    "Postgres topology" picker instead of a boolean.
  - Chart.yaml: version bump 0.3.0 -> 0.3.1.
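The resolution precedence can be sketched as follows. This is a Python restatement of the rule described above (enum wins, legacy boolean folds in as an alias), not the `_helpers.tpl` source; `resolve_db_mode` is a hypothetical name.

```python
VALID_MODES = ("singleton", "active-hot-standby")


def resolve_db_mode(database_mode: str = "",
                    legacy_ahs_enabled: bool = False) -> str:
    """Resolve the effective Postgres topology for the chart."""
    if database_mode:  # the enum wins whenever it is set
        if database_mode not in VALID_MODES:
            raise ValueError(f"unknown database.mode: {database_mode!r}")
        return database_mode
    # Empty enum: fall back to the legacy pg.activeHotStandby.enabled boolean.
    return "active-hot-standby" if legacy_ahs_enabled else "singleton"
```

The four render invariants listed under Status map onto four calls: no values set, enum set to active-hot-standby, legacy boolean only, and enum overriding the boolean.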

Status:
  - chart render: helm lint clean on both charts; 4 invariants pass
    (singleton/mode=ahs/legacy-bool/mode-overrides-bool).
  - runtime D31: chart-rendered as of PR #1562; full prov-time
    runtime verification remains deferred (gated on next Sovereign
    fresh-prov per docs/SESSION-2026-05-17-CONVERGENCE.md).

Refs TBD-E8b, TBD-B31.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-05-18 19:08:05 +04:00
e3mrah
c04b2ec76d
feat(wordpress-tenant): activeHotStandby option wires bp-cnpg-pair (D31) (#1562)
Sovereign DoD D31 — tenants subscribing to an HA-capable marketplace app
may opt into a cross-region active-hot-standby Postgres pair for their
WordPress instance instead of the default single CNPG Cluster.

Mirrors the canonical bp-cnpg-pair pattern (primary + replica Cluster
CRs with WAL streaming over Cilium ClusterMesh via a managed Service
annotated service.cilium.io/global=true). When the new
pg.activeHotStandby.enabled flag is false (default), templates render
the existing single Cluster bit-for-bit — no regression for non-HA
tenants.

Catalog seed flags WordPress with ha + cnpg-pair tags so the marketplace
HA filter can surface it.

Chart bumped 0.2.1 -> 0.3.0. New render-gate test asserts both default
single-cluster shape AND the enabled 2-Cluster shape with the right
nodeSelectors, replica.source, externalCluster.host, Cilium global
annotation, and bootstrap.pg_basebackup; all 5 cases pass.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:39:29 +04:00
e3mrah
74d23ab3dc
fix(charts): explicit harbor.openova.io/proxy-dockerhub prefix on all chart-hook images (#163) (#1367)
Per CLAUDE.md MIRROR-EVERYTHING inviolable rule: every chart-hook
image reference (pre/post-install Jobs, helper Pods) must use the
explicit Harbor proxy-cache form. Fix #158's bitnami → bitnamilegacy
swap was a band-aid; the architecturally correct fix is to defeat
upstream-deletion blast radius entirely by routing through Harbor.

The node-level containerd mirror in infra/hetzner/cloudinit-control-
plane.tftpl (line 706) already redirects docker.io/* →
harbor.openova.io/proxy-dockerhub/* implicitly, but implicit routing:
  - Hides the routing from SBOM scans
  - Bypasses the Kyverno harbor-proxy-pull ClusterPolicy
  - Means a chart audit (`grep docker.io`) misses a real dependency
  - Was the proximate cause of prov #27 wedging when Bitnami deleted
    docker.io/bitnami/kubectl:1.30.4 (Fix #158 had to chase the
    deletion mid-flight instead of being insulated by Harbor cache)

19 chart-hook image: refs + 5 chart values.yaml repository: defaults
now carry the explicit harbor.openova.io/proxy-dockerhub prefix.
Application/subchart images (keycloak, postgresql, mongodb in
keycloak+litmus subcharts) are intentionally out of scope for this
PR — those go through the node-level containerd mirror still.
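A sketch of the explicit-prefix rewrite for Docker Hub references, under stated assumptions: `to_harbor_ref` is a hypothetical helper, references to other registries are left alone, and single-segment official images get the `library/` namespace that Harbor proxy-cache projects expect. Whether the charts in this PR use that `library/` form is not stated in the commit.

```python
import re

HARBOR_PREFIX = "harbor.openova.io/proxy-dockerhub/"


def to_harbor_ref(image: str) -> str:
    """Rewrite a Docker Hub image reference to the explicit Harbor proxy form."""
    if image.startswith(HARBOR_PREFIX):
        return image  # already explicit
    first, _, rest = image.partition("/")
    # A first segment with "." or ":" (or "localhost") is a registry host.
    if rest and ("." in first or ":" in first or first == "localhost"):
        if first != "docker.io":
            return image  # another registry: out of scope for this sketch
        image = rest
    name = re.split(r"[:@]", image, maxsplit=1)[0]
    if "/" not in name:
        image = "library/" + image  # Docker Hub official image (assumption)
    return HARBOR_PREFIX + image


print(to_harbor_ref("docker.io/bitnami/kubectl:1.30.4"))
```

With the prefix written out in the chart, a plain `grep docker.io` audit and the SBOM scan both see the real dependency instead of relying on the node-level mirror.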

Affected blueprints + chart version bumps:
  bp-cert-manager            1.2.1  -> 1.2.2
  bp-external-secrets-stores 1.0.4  -> 1.0.5
  bp-crossplane-claims       1.1.4  -> 1.1.5
  bp-flux                    1.2.1  -> 1.2.2
  bp-guacamole               0.1.16 -> 0.1.17
  bp-self-sovereign-cutover  0.1.28 -> 0.1.29
  bp-k8s-ws-proxy            0.1.9  -> 0.1.10
  bp-harbor                  1.2.15 -> 1.2.16
  bp-gitea                   1.2.5  -> 1.2.6
  bp-newapi                  1.4.5  -> 1.4.6
  bp-wordpress-tenant        0.2.0  -> 0.2.1
  catalyst-platform          1.4.138 -> 1.4.139

Co-authored-by: e3mrah <1234567+e3mrah@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 11:32:21 +04:00
e3mrah
3fe27f625f
feat(bp-wordpress-tenant): wp-cli OIDC bootstrap + oidc.* canonical block (0.2.0, #915) (#927)
Umbrella issue #915 (D1 sub-task). Aligns the chart's post-install OIDC
config Job with the canonical wp-cli flow and the bp-keycloak tenant-
realm contract C1's PR #918 ships.

Chart 0.2.0
-----------
- templates/oidc-config-job.yaml rewritten to use the official
  wordpress:cli-2.12.0-php8.3 image (manifest-list digest pinned per
  Inviolable Principle #4). Replaces direct PHP/SQL UPSERTs against
  wp_options with:
    * wp core install (idempotent: wp core is-installed)
    * wp plugin install openid-connect-generic --activate (idempotent:
      wp plugin is-installed)
    * wp option update openid_connect_generic_settings <json>
    * wp option update default_role
    * wp theme install/activate
    * wp option update siteurl/home
  Going through wp-cli (i.e. WordPress core's own PHP API) is more
  resilient than schema-shape-dependent INSERT statements and survives
  WordPress minor upgrades.

- values.yaml: new canonical oidc.* block —
    oidc.{enabled, issuerURL, clientId, clientSecretName, defaultRole,
          identityKey, roleMapping, cliImage}.
  Default oidc.clientSecretName = "wordpress-oidc-client-secret"
  matches the K8s Secret bp-keycloak's PR #918 emits alongside the
  realm import ConfigMap (so the realm JSON's `secret` field and the
  Secret bytes never drift).

- Legacy keycloak.{realmURL, clientID, clientSecretName} kept as a
  back-compat alias. _helpers.tpl folds it into oidc.* when the
  modern keys are at their values.yaml defaults so chart 0.1.x
  clusters keep reconciling. Scheduled for removal in chart 0.3.0.

- oidc.defaultRole=subscriber — newly auto-created SSO users land
  with subscriber capability (operator overrides via overlay).

- Redirect URIs: the openid-connect-generic plugin's default callback
  is /wp-admin/admin-ajax.php?action=openid-connect-authorize when
  alternate_redirect_uri=0 (we set 0). bp-keycloak (PR #918)
  registers the same URL plus /wp-login.php and a /* wildcard, so the
  client's allowed-redirect-URI list aligns with what the plugin
  actually issues.
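The legacy keycloak.* fold described in the bullets above can be sketched like this. It is a Python illustration, not the `_helpers.tpl` logic: the key mapping (realmURL to issuerURL, clientID to clientId) and the empty-string defaults for issuerURL and clientId are assumptions, and the fold is applied per key, which the commit does not specify. Only the clientSecretName default comes from the commit.

```python
# values.yaml defaults for the modern block (issuerURL/clientId are assumed).
OIDC_DEFAULTS = {
    "issuerURL": "",
    "clientId": "",
    "clientSecretName": "wordpress-oidc-client-secret",
}

# Legacy keycloak.* key -> modern oidc.* key (assumed mapping).
LEGACY_TO_MODERN = {
    "realmURL": "issuerURL",
    "clientID": "clientId",
    "clientSecretName": "clientSecretName",
}


def fold_oidc(oidc: dict, keycloak: dict) -> dict:
    """Resolve oidc.*; a legacy value applies only while the modern key is at its default."""
    resolved = {**OIDC_DEFAULTS, **oidc}
    for legacy_key, modern_key in LEGACY_TO_MODERN.items():
        legacy_value = keycloak.get(legacy_key)
        if legacy_value and resolved[modern_key] == OIDC_DEFAULTS[modern_key]:
            resolved[modern_key] = legacy_value
    return resolved
```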

Orchestrator emit
-----------------
- products/catalyst/bootstrap/api/internal/handler/sme_tenant_gitops.go
  smeTenantBPWordPress now emits the canonical oidc.* block AND the
  legacy keycloak.* alias (for chart 0.1.x clusters mid-upgrade).

Tests
-----
- chart/tests/oidc-config.sh — 7 helm-template assertions:
    1. Canonical oidc.* render produces a Job with the required
       wp-cli command flow + wordpress:cli-2.12.0-php8.3 image.
    2. Legacy keycloak.* fold path (chart 0.1.x compat).
    3. oidc.enabled=false short-circuits the Job.
    4. alternate_redirect_uri=0 (so plugin URL matches the realm-
       registered redirect URI from PR #918).
    5. defaultRole rendered + propagated.
    6. Render YAML is parseable and contains all required kinds.
    7. wp-content PVC mounted in the Job (so pg4wp's db.php drop-in
       loads — failure here would silently fall back to mysqli).

- internal/handler/sme_tenant_test.go:
    * TestRenderSMETenantOverlay_WordPressEmitsOIDC — pins the
      canonical oidc.* block + legacy keycloak.* alias the
      orchestrator emits for the alice@omantel test fixture.
    * TestRenderSMETenantOverlay_WordPressOIDC_BYOMode — BYO domain
      mode renders wordpress.<byo-domain> as the ingress host.

Verification
------------
- helm lint clean
- helm template smoke green for: oidc.* canonical, keycloak.* legacy
  fold, oidc.enabled=false short-circuit
- chart/tests/oidc-config.sh: 7/7 PASS
- chart/tests/observability-toggle.sh: 2/2 PASS (regression)
- go test ./internal/handler/ -run "SMETenant|TestRenderSME": all
  green (TestAuthHandover_HappyPath failure is pre-existing on main,
  unrelated to this change)

Closes the D1 sub-task of #915.

Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:10:41 +04:00
e3mrah
3e7284de45
fix(bp-wordpress-tenant): default-values smoke render must succeed (#800) (#814)
The Blueprint Release workflow runs `helm template <chart>` with NO
overrides as a smoke gate before publishing the OCI artifact. After
#800's initial merge (c141fcd1), that smoke step failed because
`smeDomain`, `keycloak.realmURL`, and `keycloak.clientSecretName`
used `required` calls or empty strings that produced render-time
errors:

  Error: execution error at (oidc-config-job.yaml:82:33):
    .Values.smeDomain or .Values.ingress.host MUST be set
    (no sensible default per INVIOLABLE-PRINCIPLES #4).

Fix: replace empty defaults with placeholder values
(`sme.local`, `https://auth.sme.local/realms/sme`,
`wordpress-oidc`) and remove the `required` template fences. Per-
Sovereign overlays MUST override these placeholders at install time;
the runtime `oidc-config` Job will surface a clear failure if they
remain on the placeholder (Keycloak realm URL won't resolve). This
matches the trade-off INVIOLABLE-PRINCIPLES #4 calls out — operator-
configurable values, no production-safe defaults, but smoke-render
still passes.

Verified:
  - `helm template smoke .` (no overrides) → 812 lines, 11 K8s
    resources rendered cleanly.
  - `helm template smoke . --set smeDomain=... --api-versions
    postgresql.cnpg.io/v1 ...` → 12 resources including the CNPG
    Cluster, with all wordpress images SHA-pinned to
    sha256:054e611...196.
  - chart/tests/observability-toggle.sh both cases PASS.
  - `helm lint` reports only the cosmetic icon-recommended INFO note.

Refs: #800

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-05-04 22:19:40 +04:00
e3mrah
c141fcd1d3
feat(bp-wordpress-tenant): turnkey SSO-wired WordPress per SME (#800) (#811)
New scratch Blueprint chart `bp-wordpress-tenant` v0.1.0 that
provisions a turnkey, SSO-pre-wired WordPress instance per SME tenant
inside the SME's vcluster, satisfying ticket #800 (SME-5) of the #795
SME-tenant turnkey experience epic.

What it provisions:

  - Deployment of `wordpress:6-php8.3-apache` (manifest-list digest
    sha256:054e611...196), pulled through the Sovereign Harbor
    proxy-cache when `global.imageRegistry` is set (per
    INVIOLABLE-PRINCIPLES #4).
  - Two initContainers seed wp-content/ from the image onto the PVC
    and install the openid-connect-generic plugin + pg4wp Postgres
    drop-in from wordpress.org / GitHub. Both are idempotent and run
    only once per PVC.
  - Postgres provisioned in-tenant via a `Cluster.postgresql.cnpg.io`
    (default `wordpress-db`, 1 instance, 10Gi, pg16). The CNPG-emitted
    `<cluster>-app` Secret is mirrored into `wordpress-database-secret`
    by Reflector + a post-install sync Job (otech30 race fix carried
    forward from bp-gitea).
  - PVC for `/var/www/html/wp-content/` (default 10Gi, RWO,
    helm.sh/resource-policy: keep so customer content survives
    `helm uninstall`).
  - Ingress at `wordpress.<smeDomain>` with cert-manager TLS via
    operator-supplied ClusterIssuer (default `letsencrypt-prod`).
  - NetworkPolicy restricting egress to bp-cnpg :5432, Keycloak
    :8443/:8080, kube-dns, and HTTPS to public IPs (for plugin/theme
    fetches).
  - Three post-install Jobs:
      hook weight 5  — db-secret-sync (PATCHes wordpress-database-
                       secret.password from CNPG <cluster>-app)
      hook weight 10 — oidc-config (UPSERTs openid_connect_generic_
                       settings, active_plugins, template/stylesheet,
                       siteurl/home rows in wp_options via PHP+PDO)
      hook weight 15 — admin-user (INSERT/UPDATE wp_users +
                       wp_usermeta for SME admin's email with
                       administrator role)
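Helm runs hooks of the same type in ascending hook-weight order, so the three Jobs above execute in a fixed sequence. A small sketch of that ordering, with the job names and weights taken from the list and `execution_order` as a hypothetical helper:

```python
# (job name, helm.sh/hook-weight) in arbitrary declaration order.
POST_INSTALL_HOOKS = [
    ("admin-user", 15),
    ("db-secret-sync", 5),
    ("oidc-config", 10),
]


def execution_order(hooks):
    """Return hook names in the order Helm would run them (ascending weight)."""
    return [name for name, _weight in sorted(hooks, key=lambda h: h[1])]


print(execution_order(POST_INSTALL_HOOKS))
```

The ordering matters here: the database password has to be synced before oidc-config and admin-user can write to the database.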

After all hooks complete, the SME admin's first browser hit lands on
/wp-admin authenticated via Keycloak SSO — no install wizard, no
manual config.

Hollow-chart guard (issue #181) satisfied via the `common` library
subchart from sigstore, matching bp-newapi's pattern for scratch
charts (no first-party WordPress Helm chart exists upstream).

Tests:
  - chart/tests/observability-toggle.sh verifies BLUEPRINT-AUTHORING
    §11.2 (default render produces no PodMonitor/ServiceMonitor).
  - `helm template` smoke render with required values produces 11 K8s
    resources cleanly; `helm lint` zero-failure.

Refs: #800, #795

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-05-04 22:13:32 +04:00