
Falco

Runtime security and eBPF threat detection. Per-host-cluster infrastructure (see docs/ARCHITECTURE.md §3.3) — runs on every host cluster a Sovereign owns. Feeds the SIEM/SOAR pipeline described in docs/SRE.md §10.

Status: Accepted | Updated: 2026-04-27


Overview

Falco is a CNCF Graduated cloud-native runtime security project that detects threats and anomalies in real time. Licensed under the Apache License 2.0, Falco monitors system calls at the kernel level using eBPF probes, providing deep visibility into container and host behavior without requiring application instrumentation or sidecar injection.

In the OpenOva platform, Falco serves as the runtime security layer that complements Trivy's static scanning capabilities. While Trivy scans container images, Kubernetes manifests, and IaC for known vulnerabilities before deployment, Falco monitors what containers actually do at runtime: detecting container escapes, privilege escalation attempts, unexpected network connections, cryptomining, and file access violations as they occur.

Falco events are routed through Falcosidekick to OpenSearch, which stores, correlates, and visualizes them and so acts as the platform's SIEM (Security Information and Event Management). The pipeline gives security teams a complete runtime threat detection and investigation workflow built entirely from open-source components.


Architecture

SIEM Pipeline

flowchart LR
    subgraph Kernel["Kernel Space"]
        Syscalls[System Calls]
        eBPF[eBPF Probes]
    end

    subgraph Detection["Falco"]
        Engine[Rules Engine]
        Rules[Detection Rules]
    end

    subgraph Routing["Falcosidekick"]
        Router[Event Router]
    end

    subgraph SIEM["OpenSearch"]
        Index[Security Index]
        Dashboards[OpenSearch Dashboards]
        Alerting[Alerting Plugin]
    end

    Syscalls --> eBPF
    eBPF --> Engine
    Rules --> Engine
    Engine -->|"Security Events"| Router
    Router -->|"falco-* indices"| Index
    Index --> Dashboards
    Index --> Alerting

Full Security Stack

flowchart TB
    subgraph PreDeploy["Pre-Deployment (Trivy)"]
        CI[CI/CD Scan]
        Registry[Harbor Scan]
        Operator[Trivy Operator]
    end

    subgraph Runtime["Runtime (Falco)"]
        FalcoAgent[Falco DaemonSet]
        FalcoSidekick[Falcosidekick]
    end

    subgraph Policy["Policy (Kyverno)"]
        Admission[Admission Control]
        Audit[Policy Audit]
    end

    subgraph Analysis["Security Analysis"]
        OpenSearch[OpenSearch SIEM]
        OSD[OpenSearch Dashboards]
        Grafana[Grafana Alerts]
    end

    CI -->|"Block vulnerable builds"| Registry
    Registry -->|"Block vulnerable pulls"| Operator
    Operator -->|"Continuous scanning"| Analysis
    FalcoAgent -->|"Runtime events"| FalcoSidekick
    FalcoSidekick --> OpenSearch
    FalcoSidekick --> Grafana
    OpenSearch --> OSD
    Admission -->|"Prevent misconfig"| Audit

Why Falco?

| Factor | Falco | Trivy | Kyverno |
|--------|-------|-------|---------|
| Detection type | Runtime behavior | Static vulnerability scan | Policy enforcement |
| When | During execution | Before/after deployment | At admission |
| How | eBPF syscall monitoring | Image/manifest scanning | Webhook validation |
| Detects | Container escape, privilege escalation | CVEs, misconfigurations | Policy violations |
| CNCF status | Graduated | Sandbox (Aqua Security) | Graduated |
| License | Apache 2.0 | Apache 2.0 | Apache 2.0 |

Decision: Falco, Trivy, and Kyverno are complementary. Trivy provides shift-left static scanning, Kyverno enforces admission policies, and Falco detects runtime threats. All three are required for defense-in-depth security.


Key Features

| Feature | Description |
|---------|-------------|
| eBPF Kernel Monitoring | Syscall-level visibility without kernel modules or application changes |
| Container Escape Detection | Detects attempts to break out of container namespaces |
| Privilege Escalation Alerts | Monitors for unexpected setuid, capability changes, and root access |
| Network Anomaly Detection | Identifies unexpected outbound connections and port scanning |
| File Integrity Monitoring | Alerts on unauthorized reads/writes to sensitive paths |
| Cryptomining Detection | Detects known mining binaries and connection patterns |
| K8s Audit Log Analysis | Processes Kubernetes API server audit events |
| Custom Rules | Flexible rule language for organization-specific detections |
| Falcosidekick | Event router supporting 60+ output destinations |
| Talon Response | Automated response actions (kill pod, isolate network) |

Threat Detection Categories

| Category | Example Rules |
|----------|---------------|
| Container Escape | Mount namespace escape, container breakout via nsenter |
| Privilege Escalation | Unexpected setuid binary, capability escalation |
| Lateral Movement | Unexpected SSH connections, internal port scanning |
| Persistence | Crontab modification, systemd unit creation in container |
| Data Exfiltration | Bulk data reads from sensitive paths, DNS tunneling |
| Cryptomining | Known miner binary execution, stratum protocol connections |
| Supply Chain | Unexpected binary download during runtime, `curl` … |
| Filesystem | Write to /etc/passwd, modification of package manager databases |

Configuration

Falco DaemonSet (Helm)

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: falco
  namespace: falco-system
spec:
  interval: 10m
  chart:
    spec:
      chart: falco
      version: "4.x"
      sourceRef:
        kind: HelmRepository
        name: falcosecurity
        namespace: flux-system
  values:
    driver:
      kind: modern_ebpf

    collectors:
      kubernetes:
        enabled: true

    falcoctl:
      artifact:
        install:
          enabled: true
        follow:
          enabled: true
      config:
        artifact:
          allowedTypes:
            - rulesfile
            - plugin
          install:
            refs:
              - falco-rules:3
          follow:
            refs:
              - falco-rules:3

    falco:
      rules_files:
        - /etc/falco/falco_rules.yaml
        - /etc/falco/falco_rules.local.yaml
        - /etc/falco/rules.d
      # load_plugins takes plugin names only; the plugin definitions
      # (library_path, open_params) live under the separate plugins key.
      load_plugins:
        - k8saudit
        - json
      json_output: true
      json_include_output_property: true
      log_stderr: true
      log_syslog: false
      priority: notice
      buffered_outputs: false
      http_output:
        enabled: true
        url: http://falcosidekick.falco-system.svc:2801

    resources:
      requests:
        cpu: 100m
        memory: 512Mi
      limits:
        cpu: 1
        memory: 1Gi

    tolerations:
      - effect: NoSchedule
        operator: Exists
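
The HelmRelease above points at a HelmRepository named falcosecurity in the flux-system namespace, which this README does not show. A minimal sketch of that source object follows, assuming the upstream Falco chart repository URL and a Flux release that still serves the v1beta2 source API (newer Flux releases use source.toolkit.fluxcd.io/v1). The interval is an illustrative choice, not a value from this repository.

```yaml
# Assumed source object for the sourceRef used by the Falco HelmRelease.
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: falcosecurity        # must match sourceRef.name
  namespace: flux-system     # must match sourceRef.namespace
spec:
  interval: 1h               # illustrative refresh interval
  url: https://falcosecurity.github.io/charts
```

Keeping the repository in flux-system while the releases live in falco-system is why each sourceRef carries an explicit namespace.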

Falcosidekick

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: falcosidekick
  namespace: falco-system
spec:
  interval: 10m
  chart:
    spec:
      chart: falcosidekick
      version: "0.8.x"
      sourceRef:
        kind: HelmRepository
        name: falcosecurity
        namespace: flux-system
  values:
    config:
      opensearch:
        hostPort: https://opensearch.search.svc:9200
        index: falco
        type: _doc
        minimumPriority: notice
        username: falco-writer
        password: ""
        existingSecret: opensearch-falco-credentials
        createIndexTemplate: true

      prometheus:
        extralabels: "source:falco"

      webhook:
        address: ""
        minimumPriority: critical

    webui:
      enabled: true
      replicaCount: 1

    resources:
      requests:
        cpu: 50m
        memory: 128Mi
      limits:
        cpu: 500m
        memory: 256Mi

Custom Rules

# /etc/falco/falco_rules.local.yaml
- rule: Unexpected Outbound Connection from Database
  desc: Detect outbound network connections from database containers
  # The image checks are grouped in parentheses because "and" binds tighter
  # than "or"; ungrouped, any postgres or mysql event would match.
  condition: >
    outbound and
    container and
    (container.image.repository contains "mongo" or
     container.image.repository contains "postgres" or
     container.image.repository contains "mysql")
  output: >
    Unexpected outbound connection from database container
    (command=%proc.cmdline connection=%fd.name
    container_id=%container.id image=%container.image.repository
    namespace=%k8s.ns.name pod=%k8s.pod.name)
  priority: WARNING
  tags: [network, database, mitre_exfiltration]

- rule: Write Below Binary Directories in Container
  desc: Detect writes to binary directories within containers
  # open_write is the stock Falco macro for files opened for writing.
  condition: >
    open_write and container and
    (fd.directory = /usr/bin or fd.directory = /usr/sbin or
     fd.directory = /bin or fd.directory = /sbin)
  output: >
    Write to binary directory in container
    (user=%user.name command=%proc.cmdline file=%fd.name
    container_id=%container.id image=%container.image.repository)
  priority: CRITICAL
  tags: [filesystem, mitre_persistence]

- rule: Crypto Mining Binary Detected
  desc: Detect known cryptocurrency mining binaries
  condition: >
    spawned_process and container and
    (proc.name in (xmrig, ccminer, minerd, cpuminer, bfgminer))
  output: >
    Crypto mining binary detected
    (user=%user.name command=%proc.cmdline
    container_id=%container.id image=%container.image.repository
    namespace=%k8s.ns.name pod=%k8s.pod.name)
  priority: CRITICAL
  tags: [cryptomining, mitre_execution]
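
Rules like these usually need tuning once they meet real workloads. One way to narrow a rule without copying it is to append to its condition from a file that loads later. The sketch below assumes Falco 0.37 or later, where the `override` key is available (older releases use `append: true` instead). The `db-backup` namespace is hypothetical and stands in for whatever legitimate workload triggers the rule.

```yaml
# A file under /etc/falco/rules.d, loaded after falco_rules.local.yaml.
- rule: Unexpected Outbound Connection from Database
  # Appended to the existing condition instead of replacing it.
  # "db-backup" is a hypothetical namespace used for illustration.
  condition: and not k8s.ns.name = "db-backup"
  override:
    condition: append
```

The appended clause only excludes the whole rule cleanly when the base condition groups its `or` alternatives in parentheses; otherwise `and not` attaches to the last alternative alone.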

Trivy vs Falco (Complementary Roles)

flowchart LR
    subgraph ShiftLeft["Shift-Left (Pre-Deploy)"]
        Trivy[Trivy]
        TrivyWhat["Scans: CVEs, IaC misconfig, secrets"]
    end

    subgraph Runtime["Runtime (Post-Deploy)"]
        Falco[Falco]
        FalcoWhat["Detects: escapes, escalation, anomalies"]
    end

    subgraph Response["Response"]
        Block[Block Deployment]
        Alert[Alert + Investigate]
        Kill[Kill Pod / Isolate]
    end

    Trivy --> TrivyWhat --> Block
    Falco --> FalcoWhat --> Alert
    FalcoWhat --> Kill

| Aspect | Trivy | Falco |
|--------|-------|-------|
| When | Pre-deployment / continuous | Runtime |
| What | Known CVEs, misconfigurations | Behavioral anomalies |
| How | Image/manifest scanning | Kernel syscall monitoring |
| Action | Block build/deploy | Alert, investigate, respond |
| Scope | Static analysis | Dynamic analysis |

Monitoring

| Metric | Description |
|--------|-------------|
| falco_events_total | Total security events generated |
| falco_events_by_priority | Events grouped by severity |
| falcosidekick_outputs_total | Events forwarded per output |
| falcosidekick_outputs_errors_total | Output delivery failures |
| falco_kernel_drops_total | Dropped syscall events (capacity issue) |
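
The two metrics that signal a broken pipeline, kernel drops and output errors, are natural alert candidates. The following is a sketch only: it assumes the Prometheus Operator CRDs are installed and that the metric names are exported exactly as listed in the table, which can vary between Falco and Falcosidekick versions. The rule name, windows, and severities are illustrative.

```yaml
# Assumes the Prometheus Operator (monitoring.coreos.com/v1) is present.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: falco-pipeline-health     # hypothetical name
  namespace: falco-system
spec:
  groups:
    - name: falco.pipeline
      rules:
        # Sustained drops mean Falco is missing syscalls it should inspect.
        - alert: FalcoKernelEventDrops
          expr: rate(falco_kernel_drops_total[5m]) > 0
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: Falco is dropping syscall events
        # Delivery failures mean events are not reaching an output.
        - alert: FalcosidekickOutputErrors
          expr: rate(falcosidekick_outputs_errors_total[5m]) > 0
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: Falcosidekick is failing to deliver events to an output
```

Both alerts use `rate()` over a counter rather than the raw value, so they fire on ongoing loss and clear once the pipeline recovers.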

Consequences

Positive:

  • CNCF Graduated with deep kernel-level visibility via eBPF
  • Detects threats that static scanning cannot (runtime behavior anomalies)
  • Completes the security triad with Trivy (static) and Kyverno (policy)
  • Falcosidekick provides flexible routing to OpenSearch SIEM and 60+ destinations
  • Extensive default ruleset covers MITRE ATT&CK framework tactics

Negative:

  • The modern_ebpf driver configured above requires a kernel with BPF ring buffer and BTF support (5.8 or newer)
  • High event volume in noisy environments requires rule tuning to reduce false positives
  • DaemonSet deployment adds resource overhead on every node
  • Custom rule development requires understanding of Linux syscall semantics
  • Kernel-level monitoring adds a sensitive privileged workload to the cluster

Part of OpenOva