openova/platform/iceberg
e3mrah f6757c7c93
feat(docs): lean documentation strategy — consolidate 16 docs into 7 canonical + 3 subdirs (#2094)
2026-05-20 14:40:01 +04:00

Apache Iceberg

Open table format for huge analytic datasets. Application Blueprint (see docs/ARCHITECTURE.md §4.4 — Data lakehouse). Used by bp-fabric to organize lakehouse tables on top of SeaweedFS / cloud archival S3 with ACID transactions, time travel, and schema evolution.

Status: Accepted | Updated: 2026-04-27


Overview

Apache Iceberg is an open table format designed for petabyte-scale analytic datasets. It brings ACID transactions, schema evolution, and time travel to data lakes, closing the gap between traditional data warehouses and raw object storage. Iceberg has become a de facto standard for modern data lakehouse architecture and is supported by most major compute engines, including Spark, Flink, Trino, and ClickHouse.

Within OpenOva, Iceberg provides the storage layer for the Fabric data and integration product. All analytic tables are stored as Iceberg tables on SeaweedFS (S3-compatible object storage), giving customers warehouse-grade reliability without vendor lock-in. Flink writes streaming and batch data into Iceberg tables, and ClickHouse queries them with full SQL for analytics and dashboarding via Grafana.

Iceberg's metadata-driven design means that schema changes and partition layout changes are metadata-only operations that do not rewrite existing data files, while every reader works against a consistent snapshot of the table. This makes it safe to evolve table structures in production without downtime or data migration scripts.
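
To make the metadata-only idea concrete, here is a minimal sketch in plain Python. It is not Iceberg's implementation: `TableMetadata`, `ToyCatalog`, and `add_column` are hypothetical names, and the catalog is reduced to a single in-memory pointer. It shows a schema change committed by swapping the current metadata while the list of data files is reused untouched.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TableMetadata:
    """Immutable description of a table at one point in time (hypothetical model)."""
    version: int
    columns: Tuple[str, ...]
    data_files: Tuple[str, ...]


class ToyCatalog:
    """Hypothetical catalog that stores one pointer to the current metadata."""

    def __init__(self, initial: TableMetadata) -> None:
        self._current = initial

    def load(self) -> TableMetadata:
        return self._current

    def commit(self, base: TableMetadata, new: TableMetadata) -> bool:
        # Compare-and-swap: succeed only if nobody committed since `base` was read.
        if self._current is not base:
            return False
        self._current = new
        return True


def add_column(catalog: ToyCatalog, name: str) -> bool:
    """Schema change as a metadata-only commit; the data file list is reused as-is."""
    base = catalog.load()
    new = TableMetadata(base.version + 1, base.columns + (name,), base.data_files)
    return catalog.commit(base, new)
```

A writer whose `base` is stale gets `False` back and is expected to reload and retry, which is the general shape of optimistic concurrency.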


Architecture

flowchart TB
    subgraph Writers["Write Path"]
        Flink[Apache Flink]
        Batch[Batch Jobs]
    end

    subgraph Iceberg["Iceberg Table Format"]
        Catalog[Iceberg Catalog]
        Metadata[Metadata Layer]
        Manifests[Manifest Files]
    end

    subgraph Storage["SeaweedFS (S3-Compatible)"]
        Parquet[Parquet Data Files]
        Meta[Metadata Files]
    end

    subgraph Readers["Read Path"]
        CH[ClickHouse]
        Grafana[Grafana]
    end

    Flink --> Catalog
    Batch --> Catalog
    Catalog --> Metadata
    Metadata --> Manifests
    Manifests --> Parquet
    Manifests --> Meta
    CH --> Catalog
    Grafana --> CH

Key Features

| Feature | Description |
|---|---|
| ACID Transactions | Serializable isolation for concurrent readers and writers |
| Schema Evolution | Add, drop, rename, reorder columns without rewriting data |
| Partition Evolution | Change partition layout without rewriting existing data |
| Time Travel | Query any historical snapshot by timestamp or snapshot ID |
| Hidden Partitioning | Users write queries against logical columns; Iceberg handles physical layout |
| Row-level Deletes | Merge-on-read and copy-on-write delete strategies |
| Compaction | Background rewriting of small files into optimally sized ones |
| Metadata Filtering | Skip files and row groups using column-level statistics |
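
As an illustration of the metadata filtering feature, the sketch below prunes data files using per-file minimum and maximum values before any file is opened. It is a simplified model under stated assumptions: `DataFile` and `prune_by_stats` are hypothetical names, and only one column (`event_date`) carries statistics.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class DataFile:
    """Hypothetical manifest entry: a file path plus min/max stats for event_date."""
    path: str
    min_event_date: date
    max_event_date: date


def prune_by_stats(files: List[DataFile], lo: date, hi: date) -> List[DataFile]:
    """Keep only files whose [min, max] range overlaps the query range [lo, hi]."""
    return [
        f for f in files
        if f.max_event_date >= lo and f.min_event_date <= hi
    ]
```

Files whose value range cannot match the predicate are skipped entirely, so a narrow date filter may touch only a small fraction of the table.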

Catalog Configuration

Iceberg requires a catalog to track table metadata. OpenOva uses a JDBC-backed catalog stored in CNPG (PostgreSQL).

Catalog Setup

apiVersion: v1
kind: ConfigMap
metadata:
  name: iceberg-catalog-config
  namespace: data-lakehouse
data:
  catalog.properties: |
    # The ${...} placeholders are not expanded by Kubernetes itself; they
    # must be substituted before the catalog reads this file.
    catalog-impl=org.apache.iceberg.jdbc.JdbcCatalog
    uri=jdbc:postgresql://fabric-postgres.databases.svc:5432/iceberg_catalog
    warehouse=s3://iceberg-warehouse/
    io-impl=org.apache.iceberg.aws.s3.S3FileIO
    s3.endpoint=http://seaweedfs.storage.svc:8333
    s3.access-key-id=${SEAWEEDFS_ACCESS_KEY}
    s3.secret-access-key=${SEAWEEDFS_SECRET_KEY}
    s3.path-style-access=true
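
The properties above contain `${...}` placeholders for the SeaweedFS credentials. How they are filled in is not specified here, so the following is only one possible approach, sketched with the Python standard library: `render_properties` is a hypothetical helper that substitutes the placeholders from a mapping and parses the result into a dictionary.

```python
from string import Template
from typing import Dict, Mapping

# Copied from the ConfigMap above; the placeholder names come from the document.
RAW_PROPERTIES = """\
catalog-impl=org.apache.iceberg.jdbc.JdbcCatalog
uri=jdbc:postgresql://fabric-postgres.databases.svc:5432/iceberg_catalog
warehouse=s3://iceberg-warehouse/
io-impl=org.apache.iceberg.aws.s3.S3FileIO
s3.endpoint=http://seaweedfs.storage.svc:8333
s3.access-key-id=${SEAWEEDFS_ACCESS_KEY}
s3.secret-access-key=${SEAWEEDFS_SECRET_KEY}
s3.path-style-access=true
"""


def render_properties(raw: str, env: Mapping[str, str]) -> Dict[str, str]:
    """Substitute ${NAME} placeholders and parse key=value lines.

    Raises KeyError if a placeholder has no value, so a missing secret
    fails loudly instead of producing a literal "${...}" credential.
    """
    rendered = Template(raw).substitute(env)
    props: Dict[str, str] = {}
    for line in rendered.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props
```

In practice the mapping would typically be `os.environ`, populated from a Kubernetes Secret rather than from values stored in the ConfigMap.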

ClickHouse Iceberg Integration

ClickHouse queries Iceberg tables directly via its built-in Iceberg table engine:

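-- Map an existing Iceberg table into ClickHouse.
-- The two credential arguments are placeholders for the SeaweedFS access key and secret key.
CREATE TABLE iceberg_events
ENGINE = Iceberg('http://seaweedfs.storage.svc:8333/iceberg-warehouse/analytics/events/',
    'SEAWEEDFS_ACCESS_KEY', 'SEAWEEDFS_SECRET_KEY');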

Table Management

CREATE TABLE iceberg.analytics.events (
    event_id    STRING,
    event_type  STRING,
    user_id     STRING,
    payload     STRING,
    created_at  TIMESTAMP(6),
    event_date  DATE
) PARTITIONED BY (event_date)
WITH (
    'write.format.default' = 'parquet',
    'write.parquet.compression-codec' = 'zstd'
);

Time Travel

Iceberg supports querying historical snapshots by snapshot ID or timestamp. Access time travel via Flink SQL or the Iceberg Java API.
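
The timestamp form of time travel amounts to picking the last snapshot committed at or before the requested time. The sketch below models that lookup in plain Python; it does not call Flink or the Iceberg Java API, and `Snapshot` and `snapshot_as_of` are hypothetical names used only for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical snapshot log entry."""
    snapshot_id: int
    timestamp_ms: int  # commit time in epoch milliseconds


def snapshot_as_of(history: List[Snapshot], ts_ms: int) -> Optional[Snapshot]:
    """Return the latest snapshot committed at or before ts_ms.

    Assumes `history` is sorted by timestamp_ms ascending. Returns None
    when ts_ms is earlier than the first snapshot.
    """
    times = [s.timestamp_ms for s in history]
    index = bisect_right(times, ts_ms)
    return history[index - 1] if index else None
```

Querying by snapshot ID skips this step and reads that snapshot's file list directly.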

Schema Evolution

-- Safe column operations via Flink SQL (no data rewrite)
ALTER TABLE iceberg.analytics.events ADD COLUMN region STRING;
ALTER TABLE iceberg.analytics.events DROP COLUMN region;
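
Column changes can avoid rewriting data because Iceberg tracks columns by a unique field ID rather than by name or position. The following is a toy model of that idea, not Iceberg code: `ToySchema` and `read_row` are hypothetical, and a stored row is reduced to a mapping from field ID to value.

```python
from typing import Any, Dict, Optional


class ToySchema:
    """Hypothetical schema mapping column names to never-reused field IDs."""

    def __init__(self) -> None:
        self._next_id = 1
        self.columns: Dict[str, int] = {}  # column name -> field id

    def add(self, name: str) -> int:
        field_id = self._next_id
        self._next_id += 1
        self.columns[name] = field_id
        return field_id

    def drop(self, name: str) -> None:
        del self.columns[name]  # the field id is retired, never reassigned

    def rename(self, old: str, new: str) -> None:
        self.columns[new] = self.columns.pop(old)  # same id, new name


def read_row(schema: ToySchema, stored: Dict[int, Any]) -> Dict[str, Optional[Any]]:
    """Project a stored row (field id -> value) through the current schema."""
    return {name: stored.get(field_id) for name, field_id in schema.columns.items()}
```

Because a re-added `region` column receives a fresh ID, values written under the dropped column do not resurface, and a rename keeps reading the old values without touching the files.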

Storage Layout

| Bucket | Path | Contents |
|---|---|---|
| iceberg-warehouse | /analytics/events/ | Table root |
| iceberg-warehouse | /analytics/events/metadata/ | Iceberg metadata JSON, manifest lists, and manifests |
| iceberg-warehouse | /analytics/events/data/ | Partition directories containing Parquet data files |

Compaction

Iceberg tables accumulate small files from streaming writes. Periodic compaction merges them into optimally sized files. Compaction can be triggered via Flink's Iceberg maintenance actions or the Iceberg Java API.
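
The planning step of compaction can be pictured as bin-packing: small files are grouped so that each group is rewritten as one file close to a target size. The sketch below shows that general idea only. `plan_compaction` is a hypothetical function, files are represented by their sizes in bytes, and it is not the algorithm used by Flink's or Iceberg's maintenance actions.

```python
from typing import List


def plan_compaction(file_sizes: List[int], target_bytes: int) -> List[List[int]]:
    """Group files smaller than target_bytes into bins of at most target_bytes.

    Files already at or above the target are left alone. Groups with a
    single file are dropped because rewriting them would gain nothing.
    """
    small = sorted(size for size in file_sizes if size < target_bytes)
    groups: List[List[int]] = []
    current: List[int] = []
    total = 0
    for size in small:
        # Close the current bin when adding this file would exceed the target.
        if current and total + size > target_bytes:
            groups.append(current)
            current, total = [], 0
        current.append(size)
        total += size
    if current:
        groups.append(current)
    return [group for group in groups if len(group) > 1]
```

Each returned group corresponds to one rewrite task; the commit that replaces the old files with the new one is again a metadata-only snapshot change.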


Monitoring

| Metric | Description |
|---|---|
| iceberg_table_snapshot_count | Number of snapshots per table |
| iceberg_table_data_files | Count of data files |
| iceberg_table_total_records | Total row count |
| iceberg_table_total_size_bytes | Total data size |
| iceberg_compaction_duration_seconds | Time spent in compaction |

Consequences

Positive:

  • ACID transactions on object storage prevent readers from seeing partially written data
  • Schema and partition evolution without downtime or data rewrites
  • Time travel enables reproducible analytics and audit compliance
  • Engine-agnostic format avoids lock-in to any single compute engine
  • Hidden partitioning simplifies queries for end users
  • Parquet + ZSTD compression delivers excellent storage efficiency

Negative:

  • Requires a metadata catalog (JDBC/PostgreSQL) as an additional dependency
  • Small-file problem from streaming writes requires periodic compaction
  • Snapshot accumulation needs expiration policies to manage metadata growth
  • Learning curve for teams accustomed to traditional RDBMS or Hive tables
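
The snapshot-expiration point above can be sketched as a retention rule that combines a maximum age with a minimum number of snapshots to retain. This is an illustrative policy only: `snapshots_to_expire` and its parameters are hypothetical, and snapshots are identified by their commit timestamps for brevity.

```python
from typing import List


def snapshots_to_expire(
    timestamps_ms: List[int],
    now_ms: int,
    max_age_ms: int,
    min_to_keep: int,
) -> List[int]:
    """Return commit timestamps of snapshots that may be expired.

    A snapshot is expired only if it is older than max_age_ms AND it is
    not among the min_to_keep most recent snapshots.
    """
    newest_first = sorted(timestamps_ms, reverse=True)
    protected = set(newest_first[:min_to_keep])
    return sorted(
        t for t in newest_first
        if t not in protected and now_ms - t > max_age_ms
    )
```

Expiring a snapshot also limits how far back time travel can reach, so the retention window is a trade-off between metadata growth and audit requirements.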

Part of OpenOva Fabric - Data & Integration