History

e3mrah f6757c7c93 feat(docs): lean documentation strategy — consolidate 16 docs into 7 canonical + 3 subdirs (#2094 ) * docs(arch): consolidate ARCHITECTURE + PLATFORM-TECH-STACK + NAMING + EPICS-1-6 + BOOTSTRAP-KIT-EXPANSION → docs/ARCHITECTURE.md (lean doc strategy) Single canonical "how OpenOva works" doc per founder's lean-doc strategy. 2926 source lines → 1110 consolidated lines, no semantic loss. Sections: §1 High-level model (Catalyst/Sovereign/Org/Env/Application/Blueprint) §2 Repo layout §3 Tech stack by layer (CNI/GitOps/IaC/event-spine/data/secrets/identity/...) §4 Naming conventions (dimensions, patterns, labels, DOMAINS-CANON) §5 Catalyst control plane (rules, CRDs, controllers, cutover, identity, surfaces) §6 Per-host-cluster infrastructure §7 Application Blueprints §8 Multi-region topology (1 cpx52/region, WireGuard-over-public-IPs, ClusterMesh) §9 Bootstrap-kit slot ordering (full 48-slot canonical list) §10 EPIC-level design overview (EPIC-0 through EPIC-6) §11 Per-chart DESIGN.md inventory §12 OAM influence §13 Read further Stale literal fixes: - omantel.openova.io → omantel.biz / <sovereign>.<tld> / t38.omani.works (7 instances) - SPIRE marked DEFERRED / opt-in only (PR #665, TBD-V29 #2055) - failover-controller marked REPLACED by bp-continuum New PR refs wired into §3: - PR #665 SPIRE deferral - PR #2071 bp-cnpg-pair synchronous remote_apply (zero-tx-loss multi-region) - PR #2087 bp-cnpg-pair pre-merge guard - PR #2093 bp-cnpg-pair pre-merge guard New stack components added to §3: - bp-cnpg-pair (synchronous remote_apply ReplicaCluster across ClusterMesh) - bp-continuum (lease-based failover orchestrator) - bp-self-sovereign-cutover (8-tether pivot, ADR-0002, Principle #11) Source docs (to be deleted by orchestrator in final PR): - docs/PLATFORM-TECH-STACK.md - docs/NAMING-CONVENTION.md - docs/EPICS-1-6-unified-design.md - docs/BOOTSTRAP-KIT-EXPANSION-PLAN.md * docs(principles): consolidate INVIOLABLE-PRINCIPLES + ANTI-PATTERN-CATALOG → docs/PRINCIPLES.md (lean doc strategy) * docs(dod): consolidate 5-PILLAR-DOD + DOMAINS-CANON + SOVEREIGN-MULTI-REGION-DOD + PERSONAS-AND-JOURNEYS → docs/DOD.md (lean doc strategy) * docs(runbooks+status+glossary): consolidate 5 runbooks → RUNBOOKS.md + refresh STATUS.md + fold banned-terms into GLOSSARY.md (lean doc strategy) Part 1 — Runbook consolidation: - NEW docs/RUNBOOKS.md with 7 numbered sections (provisioning, day-2 ops, Blueprint authoring, chart conventions, demo walk, failover, troubleshooting) - Folds BLUEPRINT-AUTHORING / CHART-AUTHORING / DEMO-RUNBOOK / RUNBOOK-OPERATIONS / RUNBOOK-PROVISIONING into one canonical surface - Documents dual-annotation requirement for charts with enabled.default: false (GUARD 1 #2087 no-upstream + GUARD 2 #2093 smoke-render) with bp-network-policies:1.0.1 dead-reserve incident as the live evidence - All admin.<fqdn> legacy URL refs → console.<fqdn>/bss (BSS lives in operator console) - All openova.io / omantel.omani.works test commands → canonical t<NN>.omani.works - Cites PRs #2076 (docs migration), #2082 (no-auto-close-keyword), #2087, #2093 Part 2 — STATUS.md refresh (renamed from IMPLEMENTATION-STATUS.md): - Header dated 2026-05-20 (was 2026-04-29; 22 days stale per audit) - Adds 🟦 CODE-COMPLETE state for "controllers + CRDs + tests landed, awaiting fresh-prov walk" (per 5-pillar DoD) - Pillar 3 marked CODE-COMPLETE (PRs #2071/#2072/#2073/#2074/#2075/#2053) - Adds 3 new CRDs verified in products/catalyst/chart/crds/: CNPGPair, PDM, Sandbox - Sandbox controller chain CODE-COMPLETE (PRs #1615/#1618/#1621/#1622/#1626/#1631/#1632) - SPIRE marked DEFERRED — opt-in only (PRs #665, #2056, #2061) - New §6 CI / supply-chain guards table: hollow-chart (#2087), smoke-render (#2093), no-auto-close-keyword (#2082), observability-toggle, subchart 4-step, Flux version-pin replay - New §9 Pillar-status table — Pillars 1/2/3/4 CODE-COMPLETE, Pillar 5 🚧 - Pillar 1 (PRs #2038 V18, #2043 V18-D), Pillar 2 (PR #2029 V20), Pillar 3 (per above), Pillar 4 (Sandbox chain) Part 3 — GLOSSARY.md folded as single source of truth for banned terms: - Header dated 2026-05-20, notes "single source of truth for banned terms" and "no separate BANNED-TERMS.md" - Existing 11 banned-terms rows rewritten with italicized qualifiers - NEW Forbidden test domains subsection: openova.io (mothership-only), omantel.openova.io (hallucinated), Nova Cloud (predecessor brand), eventforge.io (hallucinated), admin.<fqdn> (dead BSS URL) - SPIFFE/SPIRE identity row + acronym row marked deferred per PR #665 with TBD-V29 (#2055) re-introduction roadmap - Cross-links updated: IMPLEMENTATION-STATUS → STATUS, SOVEREIGN-PROVISIONING + BLUEPRINT-AUTHORING → RUNBOOKS.md CLAUDE.md NOT touched. Source files NOT deleted (orchestrator owns deletion). No push, no PR. Manifest at /tmp/merge-D-runbooks-status-glossary-manifest.txt. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs: assemble lean doc strategy — delete legacy sources, move ledger/sessions/archive, ADR-0004, rewrite cross-refs Per founder direction 2026-05-20 + user-global ~/.claude/CLAUDE.md §11. This is the orchestrator commit on top of the four cherry-picked consolidation commits (ARCHITECTURE, PRINCIPLES, DOD, RUNBOOKS+STATUS+GLOSSARY). It: 1. Deletes 15 legacy source docs (now folded into the 7 canonical): PLATFORM-TECH-STACK, NAMING-CONVENTION, EPICS-1-6-unified-design, BOOTSTRAP-KIT-EXPANSION-PLAN, INVIOLABLE-PRINCIPLES, ANTI-PATTERN-CATALOG, 5-PILLAR-DOD, DOMAINS-CANON, SOVEREIGN-MULTI-REGION-DOD, PERSONAS-AND-JOURNEYS, BLUEPRINT-AUTHORING, CHART-AUTHORING, DEMO-RUNBOOK, RUNBOOK-OPERATIONS, RUNBOOK-PROVISIONING. 2. Moves transient + historical docs into proper subdirs: - docs/ledger/{TRUST,TRACKER}.md (cron-refreshed live state) - docs/sessions/{2026-05-17-convergence,2026-05-19-20-trust-recovery, 2026-05-20-trust-audit,2026-05-20-walk-runbook}.md - docs/archive/{validation-log,orchestrator-state,omantel-handover-wbs}.md 3. Adds docs/adr/0004-cnpg-sync-replication.md (Pillar 3 zero-tx-loss decision) + docs/adr/README.md index. 4. Updates CLAUDE.md reading-order + repo-structure block to match the lean strategy and current core/ tree (controllers/, marketplace/, etc.). 5. Sweeps all .md files + .github/workflows + scripts to repoint old doc paths to the new canonical homes. ADR cross-references kept intact (ADRs are immutable historical artifacts). Operator-side cron scripts that still write to the old paths (/home/openova/bin/refresh-dod-dashboard.sh, refresh-wbs.sh and openova-private/bin/trust-audit.sh) need a one-line path update — flagged in the PR body. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(bootstrap-kit): update repo-root sentinel to docs/PRINCIPLES.md The bootstrap-kit Go test used `docs/INVIOLABLE-PRINCIPLES.md` as its repo-root sentinel; the file no longer exists after the lean-doc consolidation (it's now `docs/PRINCIPLES.md`). Update the walker to match the new canonical filename. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>		2026-05-20 14:40:01 +04:00
..
chart	feat(charts): bp-temporal + bp-llm-gateway + bp-anthropic-adapter wrapper charts (closes #267 #268 #271 ) (#288 )	2026-04-30 19:37:19 +04:00
blueprint.yaml	feat(charts): bp-temporal + bp-llm-gateway + bp-anthropic-adapter wrapper charts (closes #267 #268 #271 ) (#288 )	2026-04-30 19:37:19 +04:00
README.md	feat(docs): lean documentation strategy — consolidate 16 docs into 7 canonical + 3 subdirs (#2094 )	2026-05-20 14:40:01 +04:00

README.md

LLM Gateway

Subscription-based proxy for LLM access via Claude Code. Application Blueprint (see docs/ARCHITECTURE.md §4.6). Catalyst's outbound LLM access point — routes between Claude API, GPT-4 API, self-hosted vLLM, and Axon (the SaaS gateway). Used by bp-cortex.

Status: Accepted | Updated: 2026-04-27

Overview

LLM Gateway enables users with Claude/OpenAI subscriptions to use Claude Code with internal models without requiring API pay-as-you-go billing.

flowchart LR
    subgraph Gateway["LLM Gateway"]
        Auth[Subscription Auth]
        Quota[Usage Quota]
        Router[Model Router]
    end

    subgraph Backends["LLM Backends"]
        Internal[Internal LLM<br/>vLLM/KServe]
        Claude[Claude API]
        OpenAI[OpenAI API]
    end

    User[Claude Code User] -->|Subscription Token| Gateway
    Gateway --> Auth
    Auth --> Quota
    Quota --> Router
    Router --> Backends

Why LLM Gateway?

Feature	Benefit
Subscription-based	No API billing required
Quota management	Fair usage limits
Model routing	Internal + external models
Claude Code support	Native integration

How It Works

User authenticates with their subscription credentials
Gateway validates subscription status
Requests are routed to internal or external LLMs
Usage is tracked against subscription quota

Configuration

Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-gateway
  namespace: ai-hub
spec:
  replicas: 2
  template:
    spec:
      containers:
        - name: gateway
          image: harbor.<location-code>.<sovereign-domain>/ai-hub/llm-gateway:latest
          ports:
            - containerPort: 8000
          env:
            - name: INTERNAL_LLM_URL
              value: "http://vllm.ai-hub.svc:8000/v1"
            - name: CLAUDE_API_KEY
              valueFrom:
                secretKeyRef:
                  name: gateway-secrets
                  key: claude-api-key
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: gateway-secrets
                  key: openai-api-key
            - name: QUOTA_REDIS_URL
              value: "redis://valkey.ai-hub.svc:6379"
            - name: AUTH_PROVIDER
              value: "keycloak"
            - name: KEYCLOAK_URL
              value: "https://keycloak.<location-code>.<sovereign-domain>/realms/<org>"

Subscription Tiers

Tier	Daily Quota	Models
Free	10 requests	Internal only
Pro	1,000 requests	Internal + Claude Haiku
Enterprise	Unlimited	All models

Authentication Flow

sequenceDiagram
    participant User
    participant ClaudeCode
    participant Gateway
    participant Keycloak
    participant LLM

    User->>ClaudeCode: Configure gateway URL
    ClaudeCode->>Gateway: Request + Token
    Gateway->>Keycloak: Validate subscription
    Keycloak-->>Gateway: Subscription tier
    Gateway->>Gateway: Check quota
    alt Quota available
        Gateway->>LLM: Forward request
        LLM-->>Gateway: Response
        Gateway->>Gateway: Decrement quota
        Gateway-->>ClaudeCode: Response
    else Quota exceeded
        Gateway-->>ClaudeCode: 429 Rate Limited
    end

Model Routing

# Model routing logic
def route_model(request_model: str, tier: str) -> str:
    routing = {
        "free": {
            "claude-3-opus": "qwen3-32b",  # Route to internal
            "claude-3-sonnet": "qwen3-32b",
            "gpt-4": "qwen3-32b"
        },
        "pro": {
            "claude-3-opus": "claude-3-haiku",  # Downgrade
            "claude-3-sonnet": "claude-3-haiku",
            "claude-3-haiku": "claude-3-haiku",  # Pass through
            "gpt-4": "qwen3-32b"
        },
        "enterprise": {
            # Pass through all models
        }
    }
    return routing.get(tier, {}).get(request_model, request_model)

Quota Management

# Redis-based quota tracking
async def check_quota(user_id: str, tier: str) -> bool:
    key = f"quota:{user_id}:{today()}"
    current = await redis.get(key) or 0

    limits = {"free": 10, "pro": 1000, "enterprise": float("inf")}

    if int(current) >= limits[tier]:
        return False

    await redis.incr(key)
    await redis.expire(key, 86400)  # Reset daily
    return True

Claude Code Setup

# Configure Claude Code to use gateway
export ANTHROPIC_API_KEY="your-subscription-token"
export ANTHROPIC_BASE_URL="https://llm-gateway.<env>.<sovereign-domain>/v1"

# Or in claude code config
claude config set api_base "https://llm-gateway.<env>.<sovereign-domain>/v1"
claude config set api_key "your-subscription-token"

API Endpoints

Endpoint	Purpose
`/v1/messages`	Anthropic-compatible chat
`/v1/chat/completions`	OpenAI-compatible chat
`/v1/models`	List available models
`/quota`	Check remaining quota
`/health`	Health check

Monitoring

Metric	Query
Requests by tier	`gateway_requests_total{tier}`
Quota usage	`gateway_quota_used{user}`
Model routing	`gateway_model_routes_total{from, to}`
Latency	`gateway_request_duration_seconds`

Consequences

Positive:

No API billing for users
Subscription-based access
Quota management
Model routing flexibility
Claude Code compatible

Negative:

Additional infrastructure
Quota management complexity
Subscription validation overhead

Part of OpenOva