cilium-envoy refuses to bind privileged ports (80/443) on Sovereigns
even with all of:
- gatewayAPI.hostNetwork.enabled=true on the Cilium chart
- securityContext.privileged=true on the cilium-envoy DaemonSet
- securityContext.capabilities.add=[NET_BIND_SERVICE]
- envoy-keep-cap-netbindservice=true in cilium-config ConfigMap
- Gateway API CRDs at v1.3.0 (matching cilium 1.19.3 schema)
Repeatable error from cilium-envoy logs across otech45, otech46, otech47:

  listener 'kube-system/cilium-gateway-cilium-gateway/listener' failed
  to bind or apply socket options: cannot bind '0.0.0.0:80':
  Permission denied

The bind() syscall is intercepted by cilium-agent's BPF socket-LB
program in a way that does not honour container capabilities: even
PID 1 with uid=0 and CapEff=0x000001ffffffffff (all caps) gets
"Permission denied". Bumping Cilium 1.16.5 → 1.19.3 made no difference
(F1, PR #684 still ships; the version bump is sound for other
reasons, and the listener bind is a separate fix).
This commit moves the listeners to high ports (30080/30443) and lets
the Hetzner LB do the public-facing port translation:
  HCLB :80  → CP node :30080 (cilium-gateway HTTP listener)
  HCLB :443 → CP node :30443 (cilium-gateway HTTPS listener)
External users still hit `https://console.<sov>.omani.works/auth/handover`
on port 443; the high port is invisible. High-port bind succeeds
without NET_BIND_SERVICE because the kernel only gates ports below
`net.ipv4.ip_unprivileged_port_start` (default 1024).
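
For orientation, the high-port listeners might look roughly like the
sketch below. This is not the manifest from this commit. The Gateway
name and namespace are taken from the log line above; the
gatewayClassName, the listener names and the TLS Secret name are
assumptions for illustration only.

```yaml
# Sketch only: field values marked "assumed" are not from this commit.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: cilium-gateway
  namespace: kube-system
spec:
  gatewayClassName: cilium          # assumed: Cilium's usual GatewayClass name
  listeners:
    - name: http                    # assumed listener name
      protocol: HTTP
      port: 30080                   # HCLB :80 forwards here
    - name: https                   # assumed listener name
      protocol: HTTPS
      port: 30443                   # HCLB :443 forwards here
      tls:
        mode: Terminate
        certificateRefs:
          - name: sovereign-tls-cert  # hypothetical Secret name
```

Both ports are at or above the default `ip_unprivileged_port_start`
of 1024, so the bind needs no extra capability on the envoy side.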
Will be verified on otech48: the next fresh provision should serve
console.otech48/auth/handover end-to-end without the 502/timeout
chain seen on otech45–47.
Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Phase-8a-preflight live deployment a56961fbd5ae6003 confirmed that the bootstrap-kit
Kustomization still fails dry-run after #484. Same pattern, different CRD:

  Gateway/kube-system/cilium-gateway dry-run failed: no matches for kind
  'Gateway' in version 'gateway.networking.k8s.io/v1'

The Gateway API CRDs ARE installed by the Cilium HelmRelease (gatewayAPI.enabled=true),
but Flux dry-runs ALL resources in the Kustomization BEFORE applying any of them, the
HelmRelease (HR) included. So at validation time Cilium hasn't been installed yet → no
CRDs → the Gateway dry-run fails.
Same fix shape as #484 (Cert split): move the Gateway into the sovereign-tls Kustomization,
which dependsOn bootstrap-kit being Ready (i.e. Cilium HR is up + CRDs registered).
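
A minimal sketch of the ordering this relies on, assuming a standard
Flux Kustomization object. The names sovereign-tls and bootstrap-kit
and the path come from this message; the namespace, interval and
sourceRef values are placeholders, not copied from the repo.

```yaml
# Sketch only: values marked "assumed" are placeholders.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: sovereign-tls
  namespace: flux-system            # assumed
spec:
  interval: 10m                     # assumed
  prune: true                       # assumed
  sourceRef:
    kind: GitRepository
    name: flux-system               # assumed source name
  path: ./clusters/_template/sovereign-tls
  dependsOn:
    - name: bootstrap-kit           # reconcile only after bootstrap-kit is Ready
```

Note that dependsOn only waits for the bootstrap-kit Kustomization to
report Ready. For that to imply the Cilium HR has finished installing,
bootstrap-kit presumably needs `wait: true` or explicit health checks;
whether it does is not stated here.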
Updated:
- clusters/_template/sovereign-tls/cilium-gateway.yaml (NEW)
- clusters/_template/sovereign-tls/kustomization.yaml (resources list)
- clusters/_template/bootstrap-kit/01-cilium.yaml (Gateway block removed)
Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>