Falco
Runtime security and eBPF threat detection. Per-host-cluster infrastructure (see docs/ARCHITECTURE.md §3.3) — runs on every host cluster a Sovereign owns. Feeds the SIEM/SOAR pipeline described in docs/SRE.md §10.
Status: Accepted | Updated: 2026-04-27
Overview
Falco is a CNCF Graduated cloud-native runtime security project that detects threats and anomalies in real time. Licensed under the Apache License 2.0, Falco monitors system calls at the kernel level using eBPF probes, providing deep visibility into container and host behavior without requiring application instrumentation or sidecar injection.
In the OpenOva platform, Falco serves as the runtime security layer that complements Trivy's static scanning capabilities. While Trivy scans container images, Kubernetes manifests, and IaC for known vulnerabilities before deployment, Falco monitors what containers actually do at runtime: detecting container escapes, privilege escalation attempts, unexpected network connections, cryptomining, and file access violations as they occur.
Falco events are routed through Falcosidekick to OpenSearch, where they are stored, correlated, and visualized as a SIEM (Security Information and Event Management) solution. This pipeline provides security teams with a complete runtime threat detection and investigation workflow using entirely open-source components.
Architecture
SIEM Pipeline
flowchart LR
subgraph Kernel["Kernel Space"]
Syscalls[System Calls]
eBPF[eBPF Probes]
end
subgraph Detection["Falco"]
Engine[Rules Engine]
Rules[Detection Rules]
end
subgraph Routing["Falcosidekick"]
Router[Event Router]
end
subgraph SIEM["OpenSearch"]
Index[Security Index]
Dashboards[OpenSearch Dashboards]
Alerting[Alerting Plugin]
end
Syscalls --> eBPF
eBPF --> Engine
Rules --> Engine
Engine -->|"Security Events"| Router
Router -->|"falco-* indices"| Index
Index --> Dashboards
Index --> Alerting
Full Security Stack
flowchart TB
subgraph PreDeploy["Pre-Deployment (Trivy)"]
CI[CI/CD Scan]
Registry[Harbor Scan]
Operator[Trivy Operator]
end
subgraph Runtime["Runtime (Falco)"]
FalcoAgent[Falco DaemonSet]
FalcoSidekick[Falcosidekick]
end
subgraph Policy["Policy (Kyverno)"]
Admission[Admission Control]
Audit[Policy Audit]
end
subgraph Analysis["Security Analysis"]
OpenSearch[OpenSearch SIEM]
OSD[OpenSearch Dashboards]
Grafana[Grafana Alerts]
end
CI -->|"Block vulnerable builds"| Registry
Registry -->|"Block vulnerable pulls"| Operator
Operator -->|"Continuous scanning"| Analysis
FalcoAgent -->|"Runtime events"| FalcoSidekick
FalcoSidekick --> OpenSearch
FalcoSidekick --> Grafana
OpenSearch --> OSD
Admission -->|"Prevent misconfig"| Audit
Why Falco?
| Factor | Falco | Trivy | Kyverno |
|---|---|---|---|
| Detection type | Runtime behavior | Static vulnerability scan | Policy enforcement |
| When | During execution | Before/after deployment | At admission |
| How | eBPF syscall monitoring | Image/manifest scanning | Webhook validation |
| Detects | Container escape, privilege escalation | CVEs, misconfigurations | Policy violations |
| CNCF status | Graduated | Not a CNCF project (Aqua Security open source) | Graduated |
| License | Apache 2.0 | Apache 2.0 | Apache 2.0 |
Decision: Falco, Trivy, and Kyverno are complementary. Trivy provides shift-left static scanning, Kyverno enforces admission policies, and Falco detects runtime threats. All three are required for defense-in-depth security.
Key Features
| Feature | Description |
|---|---|
| eBPF Kernel Monitoring | Syscall-level visibility without kernel modules or application changes |
| Container Escape Detection | Detects attempts to break out of container namespaces |
| Privilege Escalation Alerts | Monitors for unexpected setuid, capability changes, and root access |
| Network Anomaly Detection | Identifies unexpected outbound connections and port scanning |
| File Integrity Monitoring | Alerts on unauthorized reads/writes to sensitive paths |
| Cryptomining Detection | Detects known mining binaries and connection patterns |
| K8s Audit Log Analysis | Processes Kubernetes API server audit events |
| Custom Rules | Flexible rule language for organization-specific detections |
| Falcosidekick | Event router supporting 60+ output destinations |
| Talon Response | Automated response actions (kill pod, isolate network) |
Threat Detection Categories
| Category | Example Rules |
|---|---|
| Container Escape | Mount namespace escape, container breakout via nsenter |
| Privilege Escalation | Unexpected setuid binary, capability escalation |
| Lateral Movement | Unexpected SSH connections, internal port scanning |
| Persistence | Crontab modification, systemd unit creation in container |
| Data Exfiltration | Bulk data reads from sensitive paths, DNS tunneling |
| Cryptomining | Known miner binary execution, stratum protocol connections |
| Supply Chain | Unexpected binary download during runtime, `curl |
| Filesystem | Write to /etc/passwd, modification of package manager databases |
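As a rough illustration of how one row of the table maps to rule syntax, the sketch below covers the Persistence category (crontab modification). It assumes the `open_write` and `container` macros from the default Falco ruleset are loaded. The rule name, paths, priority, and tags are illustrative choices, not rules known to ship with this chart.

```yaml
# Hypothetical rule for the "Persistence" category; not part of the shipped ruleset.
- rule: Crontab Modified in Container
  desc: Detect files under cron directories being opened for writing inside a container
  condition: >
    open_write and container and
    (fd.name startswith /etc/cron or
     fd.name startswith /var/spool/cron)
  output: >
    Cron file opened for writing in container
    (user=%user.name command=%proc.cmdline file=%fd.name
    container_id=%container.id image=%container.image.repository)
  priority: WARNING
  tags: [filesystem, mitre_persistence]
```

Images that legitimately manage cron entries would need an exclusion before a rule like this is enabled.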
Configuration
Falco DaemonSet (Helm)
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: falco
  namespace: falco-system
spec:
  interval: 10m
  chart:
    spec:
      chart: falco
      version: "4.x"
      sourceRef:
        kind: HelmRepository
        name: falcosecurity
        namespace: flux-system
  values:
    driver:
      kind: modern_ebpf
    collectors:
      kubernetes:
        enabled: true
    falcoctl:
      artifact:
        install:
          enabled: true
        follow:
          enabled: true
      config:
        artifact:
          allowedTypes:
            - rulesfile
            - plugin
          install:
            refs:
              - falco-rules:3
          follow:
            refs:
              - falco-rules:3
    falco:
      rules_files:
        - /etc/falco/falco_rules.yaml
        - /etc/falco/falco_rules.local.yaml
        - /etc/falco/rules.d
      load_plugins:
        - name: k8saudit
          library_path: libk8saudit.so
        - name: json
          library_path: libjson.so
      json_output: true
      json_include_output_property: true
      log_stderr: true
      log_syslog: false
      priority: notice
      buffered_outputs: false
      http_output:
        enabled: true
        url: http://falcosidekick.falco-system.svc:2801
    resources:
      requests:
        cpu: 100m
        memory: 512Mi
      limits:
        cpu: 1
        memory: 1Gi
    tolerations:
      - effect: NoSchedule
        operator: Exists
Falcosidekick
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: falcosidekick
  namespace: falco-system
spec:
  interval: 10m
  chart:
    spec:
      chart: falcosidekick
      version: "0.8.x"
      sourceRef:
        kind: HelmRepository
        name: falcosecurity
        namespace: flux-system
  values:
    config:
      opensearch:
        hostPort: https://opensearch.search.svc:9200
        index: falco
        type: _doc
        minimumPriority: notice
        username: falco-writer
        password: ""
        existingSecret: opensearch-falco-credentials
        createIndexTemplate: true
      prometheus:
        extralabels: "source:falco"
      webhook:
        address: ""
        minimumPriority: critical
    webui:
      enabled: true
    replicaCount: 1
    resources:
      requests:
        cpu: 50m
        memory: 128Mi
      limits:
        cpu: 500m
        memory: 256Mi
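Both HelmRelease objects above point at a `HelmRepository` named `falcosecurity` in `flux-system`, which this README does not define. A minimal sketch of that source follows, assuming the public falcosecurity chart repository is used directly. A Sovereign that pulls charts from an internal mirror would substitute its own URL, and the `interval` is an arbitrary choice. The `v1beta2` API version is picked to match the Flux generation implied by `helm.toolkit.fluxcd.io/v2beta1`; newer Flux releases serve this kind as `source.toolkit.fluxcd.io/v1`.

```yaml
# Assumed source definition; name and namespace match the sourceRef used above.
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: falcosecurity
  namespace: flux-system
spec:
  interval: 1h                                   # arbitrary refresh interval
  url: https://falcosecurity.github.io/charts    # public upstream chart repository
```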
Custom Rules
# /etc/falco/falco_rules.local.yaml
- rule: Unexpected Outbound Connection from Database
  desc: Detect outbound network connections from database containers
  # The image checks are parenthesized because `and` binds tighter than `or`;
  # without the parentheses, any event from a postgres or mysql image would match.
  condition: >
    outbound and
    container and
    (container.image.repository contains "mongo" or
     container.image.repository contains "postgres" or
     container.image.repository contains "mysql")
  output: >
    Unexpected outbound connection from database container
    (command=%proc.cmdline connection=%fd.name
    container_id=%container.id image=%container.image.repository
    namespace=%k8s.ns.name pod=%k8s.pod.name)
  priority: WARNING
  tags: [network, database, mitre_exfiltration]

- rule: Write Below Binary Directories in Container
  desc: Detect writes to binary directories within containers
  condition: >
    write and container and
    (fd.directory = /usr/bin or fd.directory = /usr/sbin or
     fd.directory = /bin or fd.directory = /sbin)
  output: >
    Write to binary directory in container
    (user=%user.name command=%proc.cmdline file=%fd.name
    container_id=%container.id image=%container.image.repository)
  priority: CRITICAL
  tags: [filesystem, mitre_persistence]

- rule: Crypto Mining Binary Detected
  desc: Detect known cryptocurrency mining binaries
  condition: >
    spawned_process and container and
    (proc.name in (xmrig, ccminer, minerd, cpuminer, bfgminer))
  output: >
    Crypto mining binary detected
    (user=%user.name command=%proc.cmdline
    container_id=%container.id image=%container.image.repository
    namespace=%k8s.ns.name pod=%k8s.pod.name)
  priority: CRITICAL
  tags: [cryptomining, mitre_execution]
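Rules like these usually need tuning once they meet real traffic. One possible approach, sketched below, is to factor the image match and an egress allowlist into macros so that later tuning touches a single definition. The macro names, the rule name, and the allowed ports are hypothetical; `outbound` and `container` are assumed to come from the default Falco ruleset, as in the rules above.

```yaml
# Hypothetical refactor of the database egress rule; names and ports are illustrative.
- macro: database_container
  condition: >
    (container.image.repository contains "mongo" or
     container.image.repository contains "postgres" or
     container.image.repository contains "mysql")

# Assumed allowlist: DNS lookups and PostgreSQL replication traffic.
- macro: allowed_database_egress
  condition: (fd.sport in (53, 5432))

- rule: Unexpected Database Egress
  desc: Outbound connection from a database container to a port outside the allowlist
  condition: >
    outbound and container and
    database_container and
    not allowed_database_egress
  output: >
    Unexpected database egress
    (command=%proc.cmdline connection=%fd.name
    image=%container.image.repository
    namespace=%k8s.ns.name pod=%k8s.pod.name)
  priority: WARNING
  tags: [network, database, mitre_exfiltration]
```

Keeping the allowlist in its own macro means a false positive can be silenced by editing one condition rather than every rule that shares it.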
Trivy vs Falco (Complementary Roles)
flowchart LR
subgraph ShiftLeft["Shift-Left (Pre-Deploy)"]
Trivy[Trivy]
TrivyWhat["Scans: CVEs, IaC misconfig, secrets"]
end
subgraph Runtime["Runtime (Post-Deploy)"]
Falco[Falco]
FalcoWhat["Detects: escapes, escalation, anomalies"]
end
subgraph Response["Response"]
Block[Block Deployment]
Alert[Alert + Investigate]
Kill[Kill Pod / Isolate]
end
Trivy --> TrivyWhat --> Block
Falco --> FalcoWhat --> Alert
FalcoWhat --> Kill
| Aspect | Trivy | Falco |
|---|---|---|
| When | Pre-deployment / continuous | Runtime |
| What | Known CVEs, misconfigurations | Behavioral anomalies |
| How | Image/manifest scanning | Kernel syscall monitoring |
| Action | Block build/deploy | Alert, investigate, respond |
| Scope | Static analysis | Dynamic analysis |
Monitoring
| Metric | Description |
|---|---|
| `falco_events_total` | Total security events generated |
| `falco_events_by_priority` | Events grouped by severity |
| `falcosidekick_outputs_total` | Events forwarded per output |
| `falcosidekick_outputs_errors_total` | Output delivery failures |
| `falco_kernel_drops_total` | Dropped syscall events (capacity issue) |
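The two failure-oriented metrics in the table lend themselves to alerting. The sketch below assumes the Prometheus Operator `PrometheusRule` CRD is installed and that the metrics are exposed under exactly the names listed above. The object name, alert names, windows, and severities are illustrative and would need adjusting to the cluster's alerting conventions.

```yaml
# Sketch only: object name, alert names, and thresholds are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: falco-pipeline-health
  namespace: falco-system
spec:
  groups:
    - name: falco.pipeline
      rules:
        # Sustained kernel drops mean Falco is missing syscalls and may miss detections.
        - alert: FalcoKernelEventDrops
          expr: rate(falco_kernel_drops_total[5m]) > 0
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: Falco is dropping syscall events
        # Delivery errors mean events are not reaching OpenSearch or another output.
        - alert: FalcosidekickOutputErrors
          expr: rate(falcosidekick_outputs_errors_total[5m]) > 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: Falcosidekick is failing to deliver events
```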
Consequences
Positive:
- CNCF Graduated with deep kernel-level visibility via eBPF
- Detects threats that static scanning cannot (runtime behavior anomalies)
- Completes the security triad with Trivy (static) and Kyverno (policy)
- Falcosidekick provides flexible routing to OpenSearch SIEM and 60+ destinations
- Extensive default ruleset covers MITRE ATT&CK framework tactics
Negative:
- eBPF driver requires compatible kernel versions (5.8+ recommended)
- High event volume in noisy environments requires rule tuning to reduce false positives
- DaemonSet deployment adds resource overhead on every node
- Custom rule development requires understanding of Linux syscall semantics
- Kernel-level monitoring adds a sensitive privileged workload to the cluster
Part of OpenOva