openova/platform
e3mrah 9645a9044a
feat(metering): NewAPI NATS publisher + sme-billing subscriber + POST /metering/record (#798) (#818)
* feat(metering): NewAPI NATS publisher + sme-billing subscriber + POST /metering/record (#798)

Per #795 [Q-mine-3] (NATS not RedPanda) + [Q-mine-4] (one ledger), add
the SME-2 metering integration end-to-end. NewAPI is consumed as the
upstream image `ghcr.io/openova-io/openova/newapi-mirror` (a pinned
mirror, not a fork) — the metering envelope is produced by a Go sidecar
that observes the OpenAI-style `usage.total_tokens` field on every
2xx /v1/* response. This avoids forking the upstream binary while still
producing the canonical envelope shape on `catalyst.usage.recorded`.

A) NewAPI metering sidecar — core/services/metering-sidecar/
   - Transparent reverse proxy in front of NewAPI on its own port; the
     bp-newapi Service routes the cluster-fronting port to the sidecar,
     which forwards to NewAPI on the pod's loopback.
   - Observes successful /v1/* JSON responses, parses
     `usage.{prompt_tokens,completion_tokens,total_tokens}`, computes
     amount_micro_omr = -tokens * priceMicroOMRPerToken, and publishes
     one envelope on `catalyst.usage.recorded` per completed request.
   - Failed (non-2xx), non-JSON, and admin-path requests are NOT billed.
   - Customer-facing latency is NEVER blocked on metering: the response
     body is restored before publish; on NATS unreachable the envelope
     is persisted to disk and retried by a background drain loop.
   - 14 unit tests (proxy + publisher + safeFilename guards).

B) sme-billing NATS subscriber — core/services/billing/handlers/
   metering_consumer.go
   - JetStream durable consumer `sme-billing-metering` on stream
     `CATALYST_USAGE` (provisioned by sme-billing on startup).
   - Idempotent on metadata.request_id via a UNIQUE partial index on
     credit_ledger.external_ref; redelivery from the broker collapses
     to a single ledger row.
   - Customer auto-create on cold start (the rbac sme.user.created
     envelope may land AFTER the first metered request; we don't strand
     usage waiting for it).
   - 11 unit tests covering happy-path, idempotency, malformed-payload
     poison-pill, missing-request-id, non-negative amount guard,
     resolver error → Nak, derive-micro-OMR-from-OMR, DB-error → Nak.

C) HTTP handler POST /billing/metering/record — handlers/metering.go
   - Synchronous validate → INSERT credit_ledger → return
     {ledger_entry_id, balance_after_omr, balance_after_micro_omr,
     duplicate}. Same payload + idempotency guard as the NATS path.
   - Auth: superadmin OR sovereign-admin (operator-admin model;
     end-user LLM traffic flows through the sidecar, never this URL).
   - 8 unit tests covering happy-path, idempotency, role gating,
     malformed-JSON, positive-amount rejection, customer-not-found.

D) Schema — core/services/billing/store/store.go
   - ALTER TABLE credit_ledger ADD COLUMN amount_micro_omr BIGINT
     (1 OMR = 1,000,000 micro-OMR; -0.000234 OMR = -234 micro-OMR
     exact integer — preserves precision at metering rates).
   - ADD COLUMN external_ref TEXT + UNIQUE partial index for
     idempotency dedup.
   - ADD COLUMN metadata JSONB for the raw envelope.
   - GetCreditBalance projects both amount_omr (legacy) and
     amount_micro_omr (new) into the integer-OMR view.
   - GetCreditBalanceMicroOMR returns canonical precision.
   - RecordUsage method: ON CONFLICT DO UPDATE … RETURNING (xmax<>0)
     distinguishes fresh insert from duplicate without a follow-up
     SELECT.

E) Wiring
   - core/services/shared/events/nats.go — minimal NATS JetStream
     publisher + subscriber surface; legacy RedPanda producer/consumer
     in events.go untouched per [Q-mine-3].
   - core/services/billing/main.go — NATS_URL env; subscriber wired
     in parallel with the existing RedPanda tenant-events consumer.
   - middleware/jwt.go — exported test helper WithClaims so handler
     tests can construct an authenticated context without minting a
     real signed token.
   - .github/workflows/services-build.yaml — metering-sidecar added
     to the build matrix; deploy job skips it (image consumed by the
     bp-newapi chart, not products/catalyst sme-services).

F) bp-newapi chart (1.0.0 → 1.1.0)
   - meteringSidecar block in values.yaml: image, port, NATS URL,
     priceMicroOMRPerToken (default 156 = 0.000156 OMR/token), spool
     dir, header names, resources, securityContext (read-only-rootfs).
   - deployment.yaml renders the sidecar container + emptyDir spool
     volume when meteringSidecar.enabled (default true).
   - service.yaml routes the cluster-fronting :3000 to the sidecar
     when enabled, exposes a separate :3001 → NewAPI direct port for
     bp-catalyst-platform admin-API traffic (ADR-0003 §3.2).
   - networkpolicy.yaml allows the sidecar's port + nats-system
     egress for JetStream publish.

Tests: 33 new (14 sidecar + 11 subscriber + 8 HTTP handler), all green.
Helm template renders cleanly with sidecar enabled and disabled.

Closes #798

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(billing/store): cast SUM to BIGINT so lib/pq scans into int64 (#798)

Postgres returns `SUM(int) + SUM(bigint)/integer` as `numeric`, which
lib/pq presents as a `[]uint8` decimal string ("50.000000000000000000000000")
that does NOT scan directly into Go int64 — the integration test
TestVoucherLifecycle_IssueRedeemAndCreditApplied caught this in CI on
the post-redeem balance read.

Wrap the SUM expressions in CAST(... AS BIGINT) so the column type is
unambiguously bigint and Scan target stays uniform across pre-#798 rows
(amount_omr only) and post-#798 rows (amount_micro_omr present).

Affects:
  - GetCreditBalance
  - GetCreditBalanceMicroOMR
  - RecordUsage's running-balance read

Test mocks updated to match the new SQL prefix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 22:32:42 +04:00
..
alloy fix(bp-trivy): node-collector tolerates control-plane taint (closes #769) (#772) 2026-05-04 17:38:29 +02:00
anthropic-adapter feat(charts): bp-temporal + bp-llm-gateway + bp-anthropic-adapter wrapper charts (closes #267 #268 #271) (#288) 2026-04-30 19:37:19 +04:00
bge feat(charts): bp-vllm + bp-bge + bp-nemo-guardrails wrapper charts (#283) 2026-04-30 18:37:07 +04:00
cert-manager fix(bp-cert-manager-powerdns-webhook): re-target to contabo PowerDNS, drop dynadot-webhook (#681) 2026-05-03 17:12:48 +04:00
cert-manager-dynadot-webhook fix(bp-cert-manager-dynadot-webhook): pin SHA + add ghcr-pull imagePullSecret (#643) 2026-05-02 23:52:42 +04:00
cert-manager-powerdns-webhook fix(bp-cert-manager-powerdns-webhook): re-target to contabo PowerDNS, drop dynadot-webhook (#681) 2026-05-03 17:12:48 +04:00
cilium fix(cilium): k8sServiceHost 127.0.0.1 → 10.0.1.2 so workers' Cilium can reach apiserver (#738) 2026-05-04 14:23:51 +04:00
clickhouse docs(seaweedfs+guacamole): replace MinIO with SeaweedFS as unified S3 encapsulation; add Guacamole to bp-relay 2026-04-28 10:23:46 +02:00
cluster-autoscaler-hcloud feat(provisioner): cluster-autoscaler-hcloud + wizard footprint estimate (closes #767) (#776) 2026-05-04 19:49:44 +04:00
cnpg feat(platform): add global.imageRegistry to bp-openbao/external-secrets/cnpg/valkey/nats-jetstream/powerdns/gitea (PR 2/3, #560) (#565) 2026-05-02 12:52:43 +04:00
coraza fix(bp-coraza,bp-syft-grype): add common library subchart to satisfy hollow-chart gate (#220) 2026-04-30 06:15:28 +02:00
crossplane feat(platform): add global.imageRegistry to remaining bp-* charts + bp-catalyst-platform (PR 3/3, #560) (#580) 2026-05-02 13:21:53 +04:00
crossplane-claims feat(platform): add global.imageRegistry to remaining bp-* charts + bp-catalyst-platform (PR 3/3, #560) (#580) 2026-05-02 13:21:53 +04:00
debezium docs(pass-32): registry-DNS sweep — harbor.<domain> across 9 component READMEs 2026-04-27 22:36:39 +02:00
external-dns fix(bp-external-dns): apiserver Endpoints sync timeout — Cilium kube-apiserver entity required (closes #770) (#771) 2026-05-04 19:27:17 +04:00
external-secrets feat(platform): add global.imageRegistry to bp-openbao/external-secrets/cnpg/valkey/nats-jetstream/powerdns/gitea (PR 2/3, #560) (#565) 2026-05-02 12:52:43 +04:00
external-secrets-stores fix(bp-external-secrets-stores): split ClusterSecretStore into separate chart per #247 pattern (closes #331) (#426) 2026-05-01 17:33:47 +04:00
failover-controller refactor(platform): remove k8gb — replaced by PowerDNS lua-records (#171) 2026-04-29 08:51:09 +02:00
falco fix(bp-falco): rename rules_file → rules_files (Falco 0.36+ canonical key, Closes #570) (#574) 2026-05-02 12:59:29 +04:00
ferretdb docs(pass-11b): retry banners on failover-controller/trivy/clickhouse/ferretdb (Edit needed Read first) 2026-04-27 21:45:56 +02:00
flink docs(seaweedfs+guacamole): replace MinIO with SeaweedFS as unified S3 encapsulation; add Guacamole to bp-relay 2026-04-28 10:23:46 +02:00
flux feat(platform): add global.imageRegistry to remaining bp-* charts + bp-catalyst-platform (PR 3/3, #560) (#580) 2026-05-02 13:21:53 +04:00
gateway-api fix: bp-gateway-api 5→10 CRDs + bp-gitea CNPG + bp-harbor CNPG race fix + DAG audit (#592) 2026-05-02 15:20:05 +04:00
gitea fix(bp-gitea): use explicit labels in sync-job template (chart 1.2.3 retry) (#670) 2026-05-03 14:37:24 +04:00
grafana feat(platform): add global.imageRegistry to remaining bp-* charts + bp-catalyst-platform (PR 3/3, #560) (#580) 2026-05-02 13:21:53 +04:00
guacamole docs(seaweedfs+guacamole): replace MinIO with SeaweedFS as unified S3 encapsulation; add Guacamole to bp-relay 2026-04-28 10:23:46 +02:00
harbor fix(bp-harbor): grep-oE for password (multi-line tolerant) (chart 1.2.13) (#651) 2026-05-03 09:26:23 +04:00
iceberg docs(seaweedfs+guacamole): replace MinIO with SeaweedFS as unified S3 encapsulation; add Guacamole to bp-relay 2026-04-28 10:23:46 +02:00
keda docs(pass-10): banners on 7 more components + opentofu active-active drift fix 2026-04-27 21:43:45 +02:00
keycloak fix(bp-keycloak): catalyst-kc-sa-credentials addr → in-cluster Service URL (closes #781) (#788) 2026-05-04 20:34:22 +04:00
knative feat(charts): bp-stunner + bp-knative + bp-kserve wrapper charts (closes #263 #264 #265) (#290) 2026-04-30 19:37:38 +04:00
kserve feat(charts): bp-stunner + bp-knative + bp-kserve wrapper charts (closes #263 #264 #265) (#290) 2026-04-30 19:37:38 +04:00
kyverno feat(platform): add global.imageRegistry to remaining bp-* charts + bp-catalyst-platform (PR 3/3, #560) (#580) 2026-05-02 13:21:53 +04:00
langfuse fix(bp-langfuse): drop apostrophe from description to clear GHCR 500 (resolves #215) (#278) 2026-04-30 17:31:51 +04:00
librechat feat(charts): bp-librechat wrapper chart (closes #275) (#287) 2026-04-30 18:56:59 +04:00
litmus feat(platform): security umbrellas (falco/kyverno/trivy/sigstore/syft-grype/reloader/coraza/litmus) (#216) 2026-04-30 06:07:38 +02:00
livekit feat(charts): bp-openmeter (CH-less) + bp-livekit + bp-matrix wrapper charts (closes #272 #273 #274) (#289) 2026-04-30 19:37:28 +04:00
llm-gateway feat(charts): bp-temporal + bp-llm-gateway + bp-anthropic-adapter wrapper charts (closes #267 #268 #271) (#288) 2026-04-30 19:37:19 +04:00
loki feat(platform): observability stack umbrellas (grafana/loki/mimir/tempo/alloy/otel/langfuse/velero) (#214) 2026-04-29 22:11:04 +02:00
matrix feat(charts): bp-openmeter (CH-less) + bp-livekit + bp-matrix wrapper charts (closes #272 #273 #274) (#289) 2026-04-30 19:37:28 +04:00
milvus docs(seaweedfs+guacamole): replace MinIO with SeaweedFS as unified S3 encapsulation; add Guacamole to bp-relay 2026-04-28 10:23:46 +02:00
mimir fix: drop bp-langfuse from minimal + bp-mimir 1.0.2 push_grpc fix (#664) 2026-05-03 13:50:38 +04:00
nats-jetstream feat(platform): add global.imageRegistry to bp-openbao/external-secrets/cnpg/valkey/nats-jetstream/powerdns/gitea (PR 2/3, #560) (#565) 2026-05-02 12:52:43 +04:00
nemo-guardrails feat(charts): bp-vllm + bp-bge + bp-nemo-guardrails wrapper charts (#283) 2026-04-30 18:37:07 +04:00
neo4j docs(pass-12): role-in-Catalyst banners on 11 AI/ML Application Blueprints 2026-04-27 21:47:45 +02:00
newapi feat(metering): NewAPI NATS publisher + sme-billing subscriber + POST /metering/record (#798) (#818) 2026-05-04 22:32:42 +04:00
openbao fix(bp-openbao): add BAO_TOKEN+NAMESPACE env to auth-bootstrap (chart 1.2.14) (#666) 2026-05-03 14:02:34 +04:00
openclaw fix(bp-openclaw): use placeholder defaults so blueprint-release smoke render passes (#803) (#813) 2026-05-04 22:17:43 +04:00
openmeter feat(charts): bp-openmeter (CH-less) + bp-livekit + bp-matrix wrapper charts (closes #272 #273 #274) (#289) 2026-04-30 19:37:28 +04:00
opensearch docs(seaweedfs+guacamole): replace MinIO with SeaweedFS as unified S3 encapsulation; add Guacamole to bp-relay 2026-04-28 10:23:46 +02:00
opentelemetry feat(platform): observability stack umbrellas (grafana/loki/mimir/tempo/alloy/otel/langfuse/velero) (#214) 2026-04-29 22:11:04 +02:00
opentofu refactor(platform): remove k8gb — replaced by PowerDNS lua-records (#171) 2026-04-29 08:51:09 +02:00
powerdns fix(powerdns+catalyst-api): zero-touch contabo PowerDNS API key for Sovereign cert-manager (PR #681 followup) (#686) 2026-05-03 18:23:27 +04:00
reflector/chart fix: bp-reflector + rename ghcr-pull-secret->ghcr-pull (Closes #543) (#554) 2026-05-02 12:17:51 +04:00
reloader fix(catalyst-api,bp-reloader): tofu state on PVC + Reloader annotations strategy (closes #715) (#716) 2026-05-04 02:04:26 +04:00
sealed-secrets feat(platform): add global.imageRegistry to bp-cilium/cert-manager/cert-manager-pdns-webhook/sealed-secrets (PR 1/3 #560) (#562) 2026-05-02 12:48:37 +04:00
seaweedfs fix(bp-seaweedfs): remove trailing slash in registry — fixes double-slash image ref (Closes #568) (#576) 2026-05-02 13:02:48 +04:00
self-sovereign-cutover feat(bp-self-sovereign-cutover): chart + bootstrap-kit slot 06a (#791) (#808) 2026-05-04 21:55:19 +04:00
sigstore feat(platform): security umbrellas (falco/kyverno/trivy/sigstore/syft-grype/reloader/coraza/litmus) (#216) 2026-04-30 06:07:38 +02:00
spire fix(bp-spire): disable oidc ClusterSPIFFEID + chart bump (1.1.7) (#645) 2026-05-03 08:27:33 +04:00
stalwart docs(seaweedfs+guacamole): replace MinIO with SeaweedFS as unified S3 encapsulation; add Guacamole to bp-relay 2026-04-28 10:23:46 +02:00
stalwart-tenant feat(bp-stalwart-tenant): per-SME dedicated mail server v0.1.0 (#801) (#815) 2026-05-04 22:22:46 +04:00
strimzi docs(pass-35): completion sweep for surviving DNS placeholders (8 components) 2026-04-27 22:46:16 +02:00
stunner feat(charts): bp-stunner + bp-knative + bp-kserve wrapper charts (closes #263 #264 #265) (#290) 2026-04-30 19:37:38 +04:00
syft-grype fix(bp-coraza,bp-syft-grype): add common library subchart to satisfy hollow-chart gate (#220) 2026-04-30 06:15:28 +02:00
tempo feat(platform): observability stack umbrellas (grafana/loki/mimir/tempo/alloy/otel/langfuse/velero) (#214) 2026-04-29 22:11:04 +02:00
temporal feat(charts): bp-temporal + bp-llm-gateway + bp-anthropic-adapter wrapper charts (closes #267 #268 #271) (#288) 2026-04-30 19:37:19 +04:00
trivy fix(bp-trivy): node-collector tolerates control-plane taint (closes #769) (#772) 2026-05-04 17:38:29 +02:00
valkey feat(platform): add global.imageRegistry to bp-openbao/external-secrets/cnpg/valkey/nats-jetstream/powerdns/gitea (PR 2/3, #560) (#565) 2026-05-02 12:52:43 +04:00
velero feat(platform): add global.imageRegistry to remaining bp-* charts + bp-catalyst-platform (PR 3/3, #560) (#580) 2026-05-02 13:21:53 +04:00
vllm feat(charts): bp-vllm + bp-bge + bp-nemo-guardrails wrapper charts (#283) 2026-04-30 18:37:07 +04:00
vpa fix(bp-vpa): drop registry.k8s.io/ prefix in repository (upstream prepends it) (#641) 2026-05-02 23:32:35 +04:00
wordpress-tenant fix(bp-wordpress-tenant): default-values smoke render must succeed (#800) (#814) 2026-05-04 22:19:40 +04:00