openova/CLAUDE.md
e3mrah f6757c7c93
feat(docs): lean documentation strategy — consolidate 16 docs into 7 canonical + 3 subdirs (#2094)
* docs(arch): consolidate ARCHITECTURE + PLATFORM-TECH-STACK + NAMING + EPICS-1-6 + BOOTSTRAP-KIT-EXPANSION → docs/ARCHITECTURE.md (lean doc strategy)

Single canonical "how OpenOva works" doc per founder's lean-doc strategy.
2926 source lines → 1110 consolidated lines, no semantic loss.

Sections:
 §1  High-level model (Catalyst/Sovereign/Org/Env/Application/Blueprint)
 §2  Repo layout
 §3  Tech stack by layer (CNI/GitOps/IaC/event-spine/data/secrets/identity/...)
 §4  Naming conventions (dimensions, patterns, labels, DOMAINS-CANON)
 §5  Catalyst control plane (rules, CRDs, controllers, cutover, identity, surfaces)
 §6  Per-host-cluster infrastructure
 §7  Application Blueprints
 §8  Multi-region topology (1 cpx52/region, WireGuard-over-public-IPs, ClusterMesh)
 §9  Bootstrap-kit slot ordering (full 48-slot canonical list)
 §10 EPIC-level design overview (EPIC-0 through EPIC-6)
 §11 Per-chart DESIGN.md inventory
 §12 OAM influence
 §13 Read further

Stale literal fixes:
 - omantel.openova.io → omantel.biz / <sovereign>.<tld> / t38.omani.works (7 instances)
 - SPIRE marked DEFERRED / opt-in only (PR #665, TBD-V29 #2055)
 - failover-controller marked REPLACED by bp-continuum

New PR refs wired into §3:
 - PR #665   SPIRE deferral
 - PR #2071  bp-cnpg-pair synchronous remote_apply (zero-tx-loss multi-region)
 - PR #2087  bp-cnpg-pair pre-merge guard
 - PR #2093  bp-cnpg-pair pre-merge guard

New stack components added to §3:
 - bp-cnpg-pair  (synchronous remote_apply ReplicaCluster across ClusterMesh)
 - bp-continuum  (lease-based failover orchestrator)
 - bp-self-sovereign-cutover (8-tether pivot, ADR-0002, Principle #11)

Source docs (to be deleted by orchestrator in final PR):
 - docs/PLATFORM-TECH-STACK.md
 - docs/NAMING-CONVENTION.md
 - docs/EPICS-1-6-unified-design.md
 - docs/BOOTSTRAP-KIT-EXPANSION-PLAN.md

* docs(principles): consolidate INVIOLABLE-PRINCIPLES + ANTI-PATTERN-CATALOG → docs/PRINCIPLES.md (lean doc strategy)

* docs(dod): consolidate 5-PILLAR-DOD + DOMAINS-CANON + SOVEREIGN-MULTI-REGION-DOD + PERSONAS-AND-JOURNEYS → docs/DOD.md (lean doc strategy)

* docs(runbooks+status+glossary): consolidate 5 runbooks → RUNBOOKS.md + refresh STATUS.md + fold banned-terms into GLOSSARY.md (lean doc strategy)

Part 1 — Runbook consolidation:
- NEW docs/RUNBOOKS.md with 7 numbered sections (provisioning, day-2 ops,
  Blueprint authoring, chart conventions, demo walk, failover, troubleshooting)
- Folds BLUEPRINT-AUTHORING / CHART-AUTHORING / DEMO-RUNBOOK /
  RUNBOOK-OPERATIONS / RUNBOOK-PROVISIONING into one canonical surface
- Documents dual-annotation requirement for charts with enabled.default: false
  (GUARD 1 #2087 no-upstream + GUARD 2 #2093 smoke-render) with bp-network-policies:1.0.1
  dead-reserve incident as the live evidence
- All admin.<fqdn> legacy URL refs → console.<fqdn>/bss (BSS lives in operator console)
- All openova.io / omantel.omani.works test commands → canonical t<NN>.omani.works
- Cites PRs #2076 (docs migration), #2082 (no-auto-close-keyword), #2087, #2093

Part 2 — STATUS.md refresh (renamed from IMPLEMENTATION-STATUS.md):
- Header dated 2026-05-20 (was 2026-04-29; 22 days stale per audit)
- Adds 🟦 CODE-COMPLETE state for "controllers + CRDs + tests landed,
  awaiting fresh-prov walk" (per 5-pillar DoD)
- Pillar 3 marked CODE-COMPLETE (PRs #2071/#2072/#2073/#2074/#2075/#2053)
- Adds 3 new CRDs verified in products/catalyst/chart/crds/:
  CNPGPair, PDM, Sandbox
- Sandbox controller chain CODE-COMPLETE
  (PRs #1615/#1618/#1621/#1622/#1626/#1631/#1632)
- SPIRE marked DEFERRED — opt-in only (PRs #665, #2056, #2061)
- New §6 CI / supply-chain guards table: hollow-chart (#2087),
  smoke-render (#2093), no-auto-close-keyword (#2082), observability-toggle,
  subchart 4-step, Flux version-pin replay
- New §9 Pillar-status table — Pillars 1/2/3/4 CODE-COMPLETE, Pillar 5 🚧
- Pillar 1 (PRs #2038 V18, #2043 V18-D), Pillar 2 (PR #2029 V20),
  Pillar 3 (per above), Pillar 4 (Sandbox chain)

Part 3 — GLOSSARY.md folded as single source of truth for banned terms:
- Header dated 2026-05-20, notes "single source of truth for banned terms"
  and "no separate BANNED-TERMS.md"
- Existing 11 banned-terms rows rewritten with italicized qualifiers
- NEW Forbidden test domains subsection:
  openova.io (mothership-only), omantel.openova.io (hallucinated),
  Nova Cloud (predecessor brand), eventforge.io (hallucinated),
  admin.<fqdn> (dead BSS URL)
- SPIFFE/SPIRE identity row + acronym row marked deferred per PR #665
  with TBD-V29 (#2055) re-introduction roadmap
- Cross-links updated: IMPLEMENTATION-STATUS → STATUS,
  SOVEREIGN-PROVISIONING + BLUEPRINT-AUTHORING → RUNBOOKS.md

CLAUDE.md NOT touched. Source files NOT deleted (orchestrator owns deletion).
No push, no PR. Manifest at /tmp/merge-D-runbooks-status-glossary-manifest.txt.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: assemble lean doc strategy — delete legacy sources, move ledger/sessions/archive, ADR-0004, rewrite cross-refs

Per founder direction 2026-05-20 + user-global ~/.claude/CLAUDE.md §11.

This is the orchestrator commit on top of the four cherry-picked consolidation
commits (ARCHITECTURE, PRINCIPLES, DOD, RUNBOOKS+STATUS+GLOSSARY). It:

1. Deletes 15 legacy source docs (now folded into the 7 canonical):
   PLATFORM-TECH-STACK, NAMING-CONVENTION, EPICS-1-6-unified-design,
   BOOTSTRAP-KIT-EXPANSION-PLAN, INVIOLABLE-PRINCIPLES, ANTI-PATTERN-CATALOG,
   5-PILLAR-DOD, DOMAINS-CANON, SOVEREIGN-MULTI-REGION-DOD,
   PERSONAS-AND-JOURNEYS, BLUEPRINT-AUTHORING, CHART-AUTHORING,
   DEMO-RUNBOOK, RUNBOOK-OPERATIONS, RUNBOOK-PROVISIONING.

2. Moves transient + historical docs into proper subdirs:
   - docs/ledger/{TRUST,TRACKER}.md (cron-refreshed live state)
   - docs/sessions/{2026-05-17-convergence,2026-05-19-20-trust-recovery,
     2026-05-20-trust-audit,2026-05-20-walk-runbook}.md
   - docs/archive/{validation-log,orchestrator-state,omantel-handover-wbs}.md

3. Adds docs/adr/0004-cnpg-sync-replication.md (Pillar 3 zero-tx-loss decision)
   + docs/adr/README.md index.

4. Updates CLAUDE.md reading-order + repo-structure block to match the
   lean strategy and current core/ tree (controllers/, marketplace/, etc.).

5. Sweeps all .md files + .github/workflows + scripts to repoint old doc
   paths to the new canonical homes. ADR cross-references kept intact
   (ADRs are immutable historical artifacts).

Operator-side cron scripts that still write to the old paths
(/home/openova/bin/refresh-dod-dashboard.sh, refresh-wbs.sh and
openova-private/bin/trust-audit.sh) need a one-line path update —
flagged in the PR body.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(bootstrap-kit): update repo-root sentinel to docs/PRINCIPLES.md

The bootstrap-kit Go test used `docs/INVIOLABLE-PRINCIPLES.md` as its
repo-root sentinel; the file no longer exists after the lean-doc
consolidation (it's now `docs/PRINCIPLES.md`). Update the walker to
match the new canonical filename.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-20 14:40:01 +04:00


Scope of this file: repository structure, Catalyst terminology, OpenOva-platform-specific rules, and per-component dev workflow specific to this monorepo.

Generic engineering principles for active developer sessions — anti-theater discipline, sub-agent dispatch rules, GitHub disciplines, TBD-V## ticketing, microservice patterns — live in user-global ~/.claude/CLAUDE.md (auto-loaded by Claude Code in every session).

OpenOva-platform specifics (the 5-pillar Definition of Done, the Phase 0 / 1 / 2 deterministic test, domain canon, the anti-pattern catalog, bp-self-sovereign-cutover, and openova-sandbox-mcp auto-mount) live in docs/ of this repo, consolidated under the lean doc strategy into 7 canonical documents + 3 subdirs (per user-global ~/.claude/CLAUDE.md §11). External readers without the user-global file can rely on the canonical documents listed under "Read these before doing anything" below.


OpenOva (Public Repo) — Codebase Guide for Claude

This is the public, open-source OpenOva repository. It hosts the Catalyst platform code and Blueprint catalog.

Proprietary content (website source, deployment configs, infra secrets, the running clusters' manifests) lives in openova-private.


Lean documentation strategy

Per founder direction 2026-05-20 + user-global ~/.claude/CLAUDE.md §11, this repo's docs are consolidated into 7 canonical files + 3 subdirs:

  • 7 canonical docs (the only source of truth): GLOSSARY.md, STATUS.md, ARCHITECTURE.md, DOD.md, PRINCIPLES.md, RUNBOOKS.md, SECURITY.md.
  • docs/adr/ — immutable Architecture Decision Records (numbered, additive-only).
  • docs/ledger/ — cron-refreshed live state (TRUST.md, TRACKER.md).
  • docs/sessions/ — date-stamped transient session reports + walk runbooks.
  • docs/archive/ — historical / superseded / one-off documents.

Per-chart DESIGN.md files inside platform/<x>/ and products/<x>/charts/<chart>/ stay co-located with their Blueprint code — they are not platform-level docs.

Read these before doing anything

In order:

  1. docs/GLOSSARY.md — terminology + banned terms. Wins over any other doc.
  2. docs/STATUS.md — what's built today vs what's design. Read before claiming any feature exists.
  3. docs/ARCHITECTURE.md — Catalyst target architecture (incl. naming, stack, EPICs, bootstrap-kit slots).
  4. docs/DOD.md — the 5-pillar + Multi-Region Definition of Done, domains canon, personas/journeys. Every dispatch must move at least one pillar.
  5. docs/PRINCIPLES.md — the 15 inviolable engineering principles + anti-pattern catalog.
  6. docs/RUNBOOKS.md — Blueprint authoring, chart authoring, demo / operations / provisioning runbooks.
  7. docs/SECURITY.md — security posture + threat model.

Plus subdirs:

  • docs/adr/ — Architecture Decision Records (start at README.md index).
  • docs/ledger/TRUST.md (per-surface verification ledger) + TRACKER.md (open work).
  • docs/sessions/ — date-stamped walk runbooks and session reports.
  • docs/archive/ — historical / superseded.

These define the model, the implementation reality, and the rules of engagement. Where an older doc contradicts them, treat the older doc as outdated and update it to match.


Platform-specific rules (OpenOva-only)

These rules are specific to the OpenOva platform and supplement the generic engineering rules in user-global ~/.claude/CLAUDE.md.

Definition of Done — 5-pillar end-user contract

Every dispatch must advance at least one of the 5 inseparable pillars or one deterministic step in Phase 0 / 1 / 2 of docs/DOD.md:

  1. Marketplace + voucher onboarding (Phase 0 + Phase 1 a–c)
  2. Multi-region BCP topology choice at signup (Phase 1 b)
  3. Two independent CNPG clusters + region-kill failover (Phase 1 b + orthogonal D31)
  4. Sandbox + auto-mounted openova-sandbox-mcp with full org knowledge (Phase 2 a–e)
  5. Sovereign independence post-bp-self-sovereign-cutover (Principle #11 + ADR-0002)

Operator-console polish, cosmetic-guard re-enables, treemap drill-down quality, jobs region filter, admin sidebar nav — none of these are pillar work. They are tertiary operator-debugger surfaces. Never let them displace pillar work.

A pillar is shipped when an operator walks a fresh prov through the pillar-relevant steps and produces a screenshot + non-empty wire-capture + working downstream artifact. PR merge ≠ pillar shipped.
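
The "shipped" rule above can be read as a three-part conjunction. The following Go sketch is only an illustration of that reading: the `WalkEvidence` type, its fields, and `PillarShipped` are hypothetical names that do not come from the repo.

```go
package main

import "fmt"

// WalkEvidence is a hypothetical record of one operator walk on a fresh prov.
type WalkEvidence struct {
	ScreenshotPath     string // where the screenshot was saved
	WireCaptureBytes   int    // size of the wire capture; zero means empty
	DownstreamArtifact bool   // true once the downstream artifact was confirmed working
}

// PillarShipped requires all three pieces of evidence from the walk.
// prMerged is accepted only to make the point that it never changes the result.
func PillarShipped(prMerged bool, ev *WalkEvidence) bool {
	_ = prMerged // PR merge is not evidence
	if ev == nil {
		return false
	}
	return ev.ScreenshotPath != "" && ev.WireCaptureBytes > 0 && ev.DownstreamArtifact
}

func main() {
	// A merged PR with no walk evidence is not a shipped pillar.
	fmt.Println(PillarShipped(true, nil))

	ev := &WalkEvidence{ScreenshotPath: "walk.png", WireCaptureBytes: 2048, DownstreamArtifact: true}
	fmt.Println(PillarShipped(false, ev))
}
```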

Domains canon — never openova.io in tests

Test provs and tenant Organizations use the domains listed in docs/DOD.md §Domains-canon:

  • Test Sovereign: t<NN>.omani.works (or t<NN>.omantel.biz if LE-rate-limited)
  • Tenant Organization: <orgslug>.omani.homes (default), omani.rest, or omani.trade
  • Voucher redeem URL: https://marketplace.t<NN>.omani.works/redeem/?code=<CODE>

Forbidden in tests: openova.io, omantel.openova.io, Nova Cloud, eventforge.io. The legacy admin.<sovereign-fqdn> subdomain for voucher operations is dead — voucher and billing operations live in the operator console's BSS menu.
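
A test fixture could screen hostnames against these rules before use. The sketch below is a minimal illustration, not an existing guard in this repo: the function names are hypothetical, and it assumes `<NN>` means one or more decimal digits. Nova Cloud is a brand name rather than a domain, so it is not covered here.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// forbiddenDomains covers the forbidden hostnames from the text.
// omantel.openova.io is caught as a subdomain of openova.io.
var forbiddenDomains = []string{"openova.io", "eventforge.io"}

// testSovereignRE matches t<NN>.omani.works and t<NN>.omantel.biz.
// Assumption: <NN> is one or more digits.
var testSovereignRE = regexp.MustCompile(`^t[0-9]+\.(omani\.works|omantel\.biz)$`)

// IsForbiddenTestHost reports whether host is a forbidden domain or a subdomain of one.
func IsForbiddenTestHost(host string) bool {
	host = strings.ToLower(strings.TrimSuffix(host, "."))
	for _, d := range forbiddenDomains {
		if host == d || strings.HasSuffix(host, "."+d) {
			return true
		}
	}
	return false
}

// IsCanonicalTestSovereign reports whether host is a canonical test Sovereign FQDN.
func IsCanonicalTestSovereign(host string) bool {
	return testSovereignRE.MatchString(strings.ToLower(host))
}

func main() {
	for _, h := range []string{"t38.omani.works", "omantel.openova.io", "eventforge.io"} {
		fmt.Printf("%s canonical=%v forbidden=%v\n", h, IsCanonicalTestSovereign(h), IsForbiddenTestHost(h))
	}
}
```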

Anti-theater discipline during PR review

Per docs/PRINCIPLES.md §Anti-pattern-catalog, defensive-coding patterns are not grounds for approval; they are clues to investigate. Red flags to hunt:

  • Null-guards on empty data (PR #1185 shape)
  • enabled: false defaults on features the deterministic test asserts present (PR #1138 shape)
  • Click handlers missing on leaf cells (PR #1085 shape)
  • Closes #N on a scaffold-only PR with no operator-visible behavior change (PR #1918 shape)
  • kubectl --dry-run=server against a running cluster as the only validator (PR #1933 shape)
  • Multi-region claim on a single-region prov (PR #1599 shape)
  • must_contain token-passing tests (PR #1362/#1366/#1371/#1378 shape)
  • Python jsonencode() simulation passed off as tofu validate (PR #1892 shape)

Refs #N is the default in PR bodies, not Closes #N. Auto-close on PR merge is the enemy. The issue closes only after the operator-walk-with-screenshot lands as a comment on the issue itself.
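
The Refs-versus-Closes rule lends itself to a mechanical check on PR bodies. The sketch below is not the repo's no-auto-close-keyword guard; it is a hypothetical Go helper that looks for GitHub's closing keywords (close, fix, resolve and their inflections) followed by an issue number.

```go
package main

import (
	"fmt"
	"regexp"
)

// autoCloseRE matches GitHub closing keywords followed by "#<number>",
// case-insensitively, with an optional colon after the keyword.
var autoCloseRE = regexp.MustCompile(`(?i)\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\b:?\s+#[0-9]+`)

// AutoCloseRefs returns every auto-closing reference found in a PR body.
// An empty result means the body only uses non-closing references such as "Refs #N".
func AutoCloseRefs(body string) []string {
	return autoCloseRE.FindAllString(body, -1)
}

func main() {
	fmt.Println(AutoCloseRefs("Refs #2094: consolidate docs"))
	fmt.Println(AutoCloseRefs("Closes #12 and fixes #13"))
}
```

A CI step could fail the PR whenever the returned slice is non-empty, which leaves issue closure to the operator walk.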

Sovereignty cutover — bp-self-sovereign-cutover

A franchised Sovereign is tethered to the OpenOva mothership in 8 places (full list in docs/DOD.md §Pillar 5 and docs/adr/0002-post-handover-sovereignty-cutover.md). bp-self-sovereign-cutover installs dormant at bootstrap-kit slot 06a during Phase 1 and runs eight sequential Jobs post-handover that pivot all 8 tethers. The final step is a 10-minute deny-egress NetworkPolicy hold against github.com, ghcr.io, and harbor.openova.io. cutoverComplete=true is set only if the cluster reconciles green during this hold. No cutover claim without the egress-block proof.

Customer-sync — Gitea mirroring

Each Sovereign's Gitea mirrors the public catalog from this repo on the operator's chosen schedule (default daily; air-gapped Sovereigns mirror via offline media). See §Customer Sync below for the mapping. After cutover, every Flux reconcile pulls exclusively from the local Gitea + Harbor.

Verification ledger — docs/ledger/TRUST.md

Every claimed-done surface lives in docs/ledger/TRUST.md in one of four states: UNVERIFIED (default), VERIFIED-PASS, VERIFIED-FAIL, VERIFIED-PARTIAL. Every PR against a surface flips it back to UNVERIFIED until re-walked. Verification agents are READ-ONLY — they may not ship PRs to make their own walks pass.

The companion live ledger of open work is docs/ledger/TRACKER.md. Both files are cron-refreshed.
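
The ledger's state rules can be pictured as a small state machine. The Go sketch below is an in-memory stand-in for illustration only: the `Ledger` type and its methods are hypothetical, and the real ledger is the Markdown file docs/ledger/TRUST.md.

```go
package main

import "fmt"

// State is one of the four ledger states named in the text.
type State string

const (
	Unverified      State = "UNVERIFIED"
	VerifiedPass    State = "VERIFIED-PASS"
	VerifiedFail    State = "VERIFIED-FAIL"
	VerifiedPartial State = "VERIFIED-PARTIAL"
)

// Ledger maps a surface name to its current state (hypothetical stand-in for TRUST.md).
type Ledger map[string]State

// Get returns the state of a surface; an unknown surface is UNVERIFIED by default.
func (l Ledger) Get(surface string) State {
	if s, ok := l[surface]; ok {
		return s
	}
	return Unverified
}

// RecordWalk stores the outcome of a verification walk.
func (l Ledger) RecordWalk(surface string, outcome State) {
	l[surface] = outcome
}

// OnPR flips the surface back to UNVERIFIED until it is re-walked.
func (l Ledger) OnPR(surface string) {
	l[surface] = Unverified
}

func main() {
	l := Ledger{}
	l.RecordWalk("marketplace-redeem", VerifiedPass)
	fmt.Println(l.Get("marketplace-redeem"))
	l.OnPR("marketplace-redeem")
	fmt.Println(l.Get("marketplace-redeem"))
}
```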


What Catalyst is

OpenOva (the company) builds Catalyst (the platform). A deployed Catalyst is called a Sovereign. A Sovereign hosts Organizations, which contain Environments, which run Applications, which are installed from Blueprints.

openova is a Sovereign run by us (formerly Nova). omantel is a Sovereign run by Omantel for SMEs. bankdhofar is a Sovereign run by the bank for itself. Same code in every Sovereign.


Repo structure

openova/
├── core/                   # Catalyst control-plane application (Go)
│   ├── cmd/                # entry points (main.go per binary)
│   ├── admin/              # admin tooling
│   ├── console/            # operator console (Astro + Svelte) — UI
│   ├── controllers/        # CRD reconcilers: application, blueprint, continuum,
│   │                       # environment, organization, sandbox, useraccess
│   ├── marketplace/        # marketplace projector
│   ├── marketplace-api/    # marketplace REST API
│   ├── pool-domain-manager/# subdomain-pool reconciler (.omani.* etc.)
│   ├── pkg/                # shared Go packages (e.g. dynadot-client)
│   └── services/           # per-microservice scaffolding
├── platform/               # Component Blueprint folders — one folder per upstream OSS project
│   ├── cilium/  cnpg/  flux/  gitea/  keycloak/  openbao/  ...
│   └── ...                 # ~56 folders; some chart-bearing, others README-only
├── products/               # Composite Blueprint folders OpenOva ships
│   ├── catalyst/           # bp-catalyst-platform umbrella + bp-* sub-charts
│   ├── cortex/             # AI Hub                          (scaffold)
│   ├── axon/               # SaaS LLM Gateway                (real code: chart/ src/ scripts/)
│   ├── fingate/            # Open Banking                    (scaffold)
│   ├── fabric/             # Data & Integration              (scaffold)
│   └── relay/              # Communication                   (scaffold)
└── docs/                   # Canonical platform documentation (lean strategy — see above)
    ├── adr/                # Architecture Decision Records (immutable, numbered)
    ├── ledger/             # TRUST.md + TRACKER.md (cron-refreshed)
    ├── sessions/           # date-stamped walk runbooks + session reports
    ├── archive/            # historical / superseded
    └── proposals/  runbooks/  lessons-learned/   # legacy subdirs; migrating into the 7 canonical docs

For the up-to-date "what's actually built today" inventory (controllers green/yellow/red, microservices status, CRD set) see docs/STATUS.md.

Each subfolder of platform/ and products/ is the source of one Blueprint in this monorepo (canonical layout). CI fans out to per-Blueprint OCI artifacts at ghcr.io/openova-io/bp-<name>:<semver> — that's where per-Blueprint isolation lives. There are no separate per-Blueprint Git repositories.
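
As an illustration of the artifact naming above, the following Go sketch assembles an OCI reference from a Blueprint name and version. `BlueprintArtifactRef` is a hypothetical helper, and the version check assumes plain MAJOR.MINOR.PATCH without pre-release or build suffixes, which may be stricter than what CI accepts.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// semverRE accepts MAJOR.MINOR.PATCH only (assumption: no pre-release or build metadata).
var semverRE = regexp.MustCompile(`^[0-9]+\.[0-9]+\.[0-9]+$`)

// BlueprintArtifactRef builds ghcr.io/openova-io/bp-<name>:<semver>.
// The name may be given with or without the "bp-" prefix.
func BlueprintArtifactRef(name, version string) (string, error) {
	name = strings.TrimPrefix(name, "bp-")
	if name == "" {
		return "", fmt.Errorf("empty blueprint name")
	}
	if !semverRE.MatchString(version) {
		return "", fmt.Errorf("version %q is not MAJOR.MINOR.PATCH", version)
	}
	return "ghcr.io/openova-io/bp-" + name + ":" + version, nil
}

func main() {
	ref, err := BlueprintArtifactRef("network-policies", "1.0.1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ref)
}
```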


Naming conventions in this repo

  • Cluster: {prov}-{reg}-{bb}-{env_type} — e.g. hz-fsn-rtz-prod
  • vcluster: {org} (within a cluster) — e.g. acme
  • Catalyst Environment: {org}-{env_type} — e.g. acme-prod
  • Blueprint: bp-<name> — e.g. bp-wordpress
  • Application: <purpose> (within an Environment) — e.g. marketing-site

Full table in docs/ARCHITECTURE.md §4 (Naming).
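
The patterns above compose mechanically. The Go sketch below shows two of them as hypothetical helpers (`ClusterName` and `EnvironmentName` are illustrative names, not functions from this repo), using the examples given in the list.

```go
package main

import (
	"fmt"
	"strings"
)

// ClusterName joins the four dimensions of {prov}-{reg}-{bb}-{env_type}.
func ClusterName(prov, reg, bb, envType string) string {
	return strings.Join([]string{prov, reg, bb, envType}, "-")
}

// EnvironmentName builds a Catalyst Environment name from {org}-{env_type}.
func EnvironmentName(org, envType string) string {
	return org + "-" + envType
}

func main() {
	fmt.Println(ClusterName("hz", "fsn", "rtz", "prod")) // hz-fsn-rtz-prod
	fmt.Println(EnvironmentName("acme", "prod"))         // acme-prod
}
```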


Banned terms

The single canonical list of banned terms (with corrections + rationale) lives in docs/GLOSSARY.md §Banned-terms. Do not duplicate it here.

Highlights: "tenant" → Organization; "operator" (as a person) → sovereign-admin; "client" (product UX) → User; "module"/"template" (in Catalyst sense) → Blueprint; "Backstage" → Catalyst console; "Synapse" (the OpenOva product) → Axon; "Workspace" → Environment; "Instance" (user-facing) → Application.

When in doubt: defer to docs/GLOSSARY.md.


Commit conventions

  • Conventional commits: feat:, fix:, docs:, chore:, refactor:.
  • Sign every commit. Default identity for this repo: hatiyildiz (269457768+hatiyildiz@users.noreply.github.com). Switch to alierenbaysal (269455083+alierenbaysal@users.noreply.github.com) only when the user explicitly directs.
  • No git config --global; pass -c user.name=… -c user.email=… per commit.
  • Reference issues/PRs by number where applicable.
  • Per ~/.claude/CLAUDE.md: every issue moves through status/in-progress → status/uat → status/completed. Open an issue before code changes; never close it (only the user does).

What's user-facing (don't expand without permission)

The user-facing surfaces are UI / Git / API only. There is no Terraform provider, no Pulumi SDK, no catalystctl install for production changes. Crossplane is platform plumbing, never a user surface.

If a future feature seems to need another surface, it almost certainly belongs as either (a) UI work, (b) Blueprint work, or (c) a Crossplane Composition the user never sees. Reject the impulse to add a fourth surface.


Component README rule of thumb

Every platform/<x>/README.md and products/<x>/README.md:

  1. States what the component is (one line).
  2. States its role in Catalyst (control plane vs Application Blueprint vs both).
  3. Links to the canonical Catalyst doc that defines its place in the model.
  4. Lists configuration knobs and Blueprint configSchema highlights.
  5. Gives operational notes: backups, scaling, multi-region behavior.

If a README contradicts docs/ARCHITECTURE.md or docs/GLOSSARY.md, the canonical doc wins; update the README.


Customer Sync

Each Sovereign's Gitea mirrors the public catalog from this repo:

GitHub (this repo)                  Per-Sovereign Gitea (mirrored)
──────────────────                  ──────────────────────────────
openova/platform/cilium/   ──sync──> gitea.<location-code>.<sovereign-domain>/catalog/bp-cilium/
openova/products/cortex/   ──sync──> gitea.<location-code>.<sovereign-domain>/catalog/bp-cortex/
...

(Per the naming rules now consolidated in docs/ARCHITECTURE.md §4, formerly NAMING §5.1, the Catalyst control-plane DNS pattern is {component}.{location-code}.{sovereign-domain}, e.g. gitea.hfmp.openova.io.)

Sovereigns pull on their own schedule (default daily). Air-gapped Sovereigns mirror via offline media.
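
The mapping in the diagram can be expressed as a pure function of the repo path and the Sovereign's DNS parts. The sketch below is illustrative only: `MirrorTarget` is a hypothetical helper, it covers only the folder-to-bp-<folder> shape shown in the two diagram rows, and it makes no claim about how the actual sync job is implemented.

```go
package main

import (
	"fmt"
	"path"
)

// MirrorTarget maps a repo folder such as "openova/platform/cilium/" to the
// per-Sovereign Gitea catalog location
// gitea.<location-code>.<sovereign-domain>/catalog/bp-<folder>/.
func MirrorTarget(repoPath, locationCode, sovereignDomain string) string {
	// path.Base strips trailing slashes before taking the last element.
	folder := path.Base(repoPath)
	return fmt.Sprintf("gitea.%s.%s/catalog/bp-%s/", locationCode, sovereignDomain, folder)
}

func main() {
	fmt.Println(MirrorTarget("openova/platform/cilium/", "hfmp", "openova.io"))
	fmt.Println(MirrorTarget("openova/products/cortex/", "hfmp", "openova.io"))
}
```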


Per-component dev workflow

Most components are simple: a README.md, a Helm chart or Kustomize base, a blueprint.yaml, and a CI pipeline. Iteration is:

cd platform/<component>/
# edit chart/, manifests/, blueprint.yaml
# CI validates and dry-runs on push
# tagged release → OCI publish + signature → blueprint-controller picks up

For Catalyst control-plane code (core/):

cd core/
go test ./...
go build ./apps/...
# UI in core/ui/: npm install, npm run dev

CRD types live in core/pkg/apis/. Add new types here, regenerate clients, then update the controller in core/internal/.