hatiyildiz 7cafa3c894 docs(seaweedfs+guacamole): replace MinIO with SeaweedFS as unified S3 encapsulation; add Guacamole to bp-relay

Component-level architectural correction (two changes):

1. MinIO → SeaweedFS as unified S3 encapsulation layer

The old design used MinIO for in-cluster S3 plus separate cold-tier configuration scattered across consumers. The new design positions SeaweedFS as the single S3 encapsulation layer: every Catalyst component talks to one endpoint (seaweedfs.storage.svc:8333). SeaweedFS internally handles hot tier (in-cluster NVMe), warm tier (in-cluster bulk), and cold tier (transparent passthrough to cloud archival storage — Cloudflare R2 / AWS S3 / Hetzner Object Storage / etc., chosen at Sovereign provisioning). One audit/lifecycle/encryption boundary instead of N. No Catalyst component talks to cloud S3 directly anymore — Velero, CNPG WAL archive, OpenSearch snapshots, Loki/Mimir/Tempo, Iceberg, Harbor blob store, Application buckets all share one S3 surface.

2. Apache Guacamole added as Application Blueprint §4.5 Communication

Clientless browser-based RDP/VNC/SSH/kubectl-exec gateway. Keycloak SSO, full session recording to SeaweedFS for compliance evidence (PSD2/DORA/SOX). Composed into bp-relay. Replaces VPN+native-client distribution for auditable remote access.

Component changes:
- DELETED: platform/minio/
- CREATED: platform/seaweedfs/README.md (unified S3 + cold-tier encapsulation; bucket layout; multi-region replication via shared cold backend; migration-from-MinIO section)
- CREATED: platform/guacamole/README.md (clientless remote-desktop gateway; GuacamoleConnection CRD; compliance integration via session recordings)

Doc updates: PLATFORM-TECH-STACK §1+§3.5+§4.5+§5+§7.4; TECHNOLOGY-FORECAST L11+mandatory+a-la-carte counts (52 → 53); ARCHITECTURE §3 topology; SECURITY §4 DB engines; SOVEREIGN-PROVISIONING §1 inputs; SRE §2.5+§7; IMPLEMENTATION-STATUS §3; BLUEPRINT-AUTHORING stateful examples; BUSINESS-STRATEGY 13 component-count anchors + Relay product line; README.md backup row; CLAUDE.md folder count.

Component README updates (S3 endpoint + dependency renames): cnpg, clickhouse, flink, gitea, iceberg, harbor, grafana, livekit, kserve, milvus, opensearch, flux, stalwart, velero (substantive rewrite of velero — now writes exclusively to SeaweedFS with cold-tier auto-routing). Products: relay, fabric.

UI scaffold: products/catalyst/bootstrap/ui/src/shared/constants/components.ts — minio entry replaced with seaweedfs; velero+harbor deps updated; new guacamole entry added.

VALIDATION-LOG entry "Pass 104 — MinIO → SeaweedFS swap + Guacamole add" captures the encapsulation principle and adds Lesson #22: storage tier policy belongs at the encapsulation boundary, not inside every consumer.

Verification: zero remaining MinIO references in canonical docs (one intentional retention in TECHNOLOGY-FORECAST L37 explaining the swap); 53 platform/ folders matching all "53 components" anchors; bp-relay composition includes guacamole.

2026-04-28 10:23:46 +02:00

7.0 KiB

Raw Blame History

Harbor

Container registry with vulnerability scanning. Per-host-cluster infrastructure (see docs/PLATFORM-TECH-STACK.md §3.5) — every host cluster runs a Harbor instance for Catalyst component images, mirrored Blueprint OCI artifacts, and customer images.

Status: Accepted | Updated: 2026-04-27

Overview

Harbor is mandatory on every host cluster. Each host cluster runs its own Harbor instance that mirrors from upstream sources (ghcr.io/openova-io/... for Catalyst components and Blueprint OCI artifacts; the customer's own CI for application images). Local Harbor = fast Pod pulls, no cross-region traffic on every image pull, air-gap ready.

flowchart TB
    subgraph Upstream["Upstream OCI sources"]
        GHCR[ghcr.io/openova-io/* — Catalyst + Blueprints]
        CustCI[Customer CI — Application images]
    end

    subgraph Cluster1["Host cluster A (e.g. hz-fsn-rtz-prod)"]
        H1[Harbor — local mirror]
        T1[Trivy Scanner]
        Pods1[Pods pull locally]
    end

    subgraph Cluster2["Host cluster B (e.g. hz-hel-rtz-prod)"]
        H2[Harbor — local mirror]
        T2[Trivy Scanner]
        Pods2[Pods pull locally]
    end

    GHCR -.->|"pull mirror"| H1
    CustCI -.->|"push"| H1
    GHCR -.->|"pull mirror"| H2
    CustCI -.->|"push"| H2
    H1 --> T1
    H2 --> T2
    H1 --> Pods1
    H2 --> Pods2

Why Mandatory?

Requirement	Harbor (per host cluster)	External Registry
Local pulls (no cross-region traffic)	✅ Each cluster's Pods pull from local Harbor	❌ Pods pull cross-region
Vulnerability scanning	✅ Trivy integrated	⚠️ Depends on provider
Air-gap support	✅ Self-hosted	❌
RBAC	✅ Full control	⚠️ Provider-specific
Audit logging	✅ Complete	⚠️ Limited
No external dependency at runtime	✅ Once mirrored	❌

Features

Feature	Support
Image storage	OCI-compliant
Vulnerability scanning	Trivy integration
Image signing	Cosign/Notary
Replication	Push/pull between regions
RBAC	Project-based access
Quotas	Per-project storage limits
Garbage collection	Automatic cleanup

Per-host-cluster mirroring (NOT primary-replica)

Catalyst's agreed model is one Harbor per host cluster, each independently pulling from upstream OCI sources. There is no Harbor-to-Harbor replication primary/replica.

sequenceDiagram
    participant CI as CI / Upstream OCI
    participant H1 as Harbor (cluster A)
    participant T1 as Trivy (cluster A)
    participant H2 as Harbor (cluster B)
    participant T2 as Trivy (cluster B)
    participant Pods as Pods

    CI->>H1: pull-mirror sync (configured per project)
    H1->>T1: scan on ingest
    CI->>H2: pull-mirror sync (independent of H1)
    H2->>T2: scan on ingest
    Pods->>H1: pull (cluster A Pods)
    Pods->>H2: pull (cluster B Pods)

Why pull-mirror, not Harbor-to-Harbor replication:

Single source of truth = upstream (ghcr.io/openova-io/... or customer CI), not a "primary Harbor".
Each cluster is its own failure domain — primary-replica drift between Harbors would be one more thing to fail.
Air-gap path is the same shape: a one-time mirror import vs ongoing primary-pushed replication.

Benefits:

Images available locally in each cluster.
Survives any cluster (including the management cluster) going down — workload clusters keep pulling locally.
Faster pulls (no cross-region traffic per Pod start).

Storage Backend Options

Backend	Use Case	Notes
PVC	Small deployments	Local storage
S3 (SeaweedFS)	Production	Recommended - tiered archiving
Cloud S3	Managed	AWS S3 / GCS / Azure Blob

Recommended: S3 via SeaweedFS

flowchart LR
    Harbor[Harbor] -->|"S3 API"| SeaweedFS[SeaweedFS]
    SeaweedFS -->|"Tier cold data"| Archive[Archival S3]

Configuration

Helm Values

expose:
  type: ingress
  ingress:
    className: cilium
    hosts:
      core: harbor.<location-code>.<sovereign-domain>
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-prod

# S3 Storage (SeaweedFS)
persistence:
  imageChartStorage:
    type: s3
    s3:
      region: us-east-1
      bucket: harbor-registry
      accesskey: ""  # From ESO secret
      secretkey: ""  # From ESO secret
      regionendpoint: http://seaweedfs.storage.svc:8333
      v4auth: true

trivy:
  enabled: true

database:
  type: internal  # or external for CNPG

redis:
  type: internal  # or external for Valkey

core:
  secretName: harbor-core-secret

Pull-mirror policy

{
  "name": "ghcr-openova-mirror",
  "src_registry": {
    "type": "harbor",
    "url": "https://ghcr.io",
    "credential": {
      "access_key": "",
      "access_secret": ""
    }
  },
  "trigger": {
    "type": "scheduled",
    "trigger_settings": {
      "cron": "0 */6 * * *"
    }
  },
  "filters": [
    {
      "type": "name",
      "value": "openova-io/**"
    }
  ],
  "enabled": true
}

Security Scanning

Trivy Integration

Scan Type	Trigger
On push	Automatic when image pushed
Scheduled	Daily full scan
Manual	On-demand via UI/API

Scan Policy

Severity	Action
Critical	Block pull
High	Allow (configurable)
Medium	Allow
Low	Allow

Kyverno Policies

Require Harbor Images

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-harbor-images
spec:
  validationFailureAction: Enforce
  rules:
    - name: require-harbor-registry
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Images must be pulled from Harbor registry"
        pattern:
          spec:
            containers:
              - image: "harbor.<location-code>.<sovereign-domain>/*"

Resource Requirements

Component	CPU	Memory
Harbor Core	0.5	512Mi
Registry	0.5	512Mi
Database	0.5	512Mi
Redis	0.25	256Mi
Trivy	0.5	1Gi
Total	2.25	2.75Gi

Backup Strategy

Harbor data backed up via Velero to Archival S3:

flowchart LR
    Harbor[Harbor] --> Velero[Velero]
    Velero --> S3[Archival S3]

Backed up:

Database (PostgreSQL)
Registry storage (blobs)
Configuration

Consequences

Positive:

Complete control over image lifecycle.
Built-in vulnerability scanning (Trivy on ingest).
Per-cluster mirror = no cross-region pull traffic; each cluster is an independent failure domain.
Air-gap ready (one-time import works the same way as ongoing pull-mirror).
Audit trail for compliance.

Negative:

Resource overhead (~3GB RAM)
Operational responsibility
Backup requirements (handled by Velero)

Part of OpenOva

7.0 KiB Raw Blame History