DORA · Kubernetes · fintech · EU regulation · operational resilience

Kubernetes, DORA, and the Compliance Gap in European Fintech

DORA applies, with teeth, from 17 January 2025. Your Kubernetes cluster probably cannot prove the controls it requires. Here is what the regulation actually asks, and what a deterministic pipeline produces.

CPLT Engineering · 9 min read

The Digital Operational Resilience Act (DORA) does not read like other EU financial regulation. It reads like an SRE handbook written by a lawyer. Article 6 talks about "ICT risk management frameworks." Article 10 mandates "mechanisms to promptly detect anomalous activities." Article 28 opens the chapter on ICT third-party risk, which reaches the entire supply chain, including (and this is the part most fintech CTOs missed) the cloud runtimes your payment rails happen to be deployed on.

If you are a European fintech running on Kubernetes, DORA is not a policy project. It is an engineering project. And the engineering project has a structural problem: the controls DORA demands are runtime controls, and the industry's compliance apparatus still treats Kubernetes as if it were a spreadsheet.

What DORA actually asks of the runtime

Strip out the recitals and the supervisory choreography, and the operational core of DORA is four obligations:

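  1. Continuous detection. You must be able to detect ICT-related incidents in near real time and classify them by severity. Not "we have a SIEM." The regulation and its technical standards specify detection, classification, and notification timelines: for a major incident, an initial notification within 4 hours of classification (and no later than 24 hours after you become aware of it), an intermediate report within 72 hours of that notification, and a final report with root cause within one month of the latest intermediate report.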

  2. Demonstrable resilience testing. Regular, independent testing with evidence, up to and including threat-led penetration testing (TLPT) under Article 26 for entities their competent authority designates. TLPT is a defined methodology, not a pen test rebrand. It is an intelligence-led red-team exercise against live production systems, and it evaluates whether your operational response, not just your technical controls, holds under pressure.

  3. Third-party oversight. Any "ICT third-party service provider" supporting your critical or important functions is in scope. If your control plane, your ingress, or your secrets manager is SaaS, that vendor is in scope for your DORA posture. You owe the regulator a register of information on those arrangements, an assessment of concentration risk, and exit strategies.

  4. Traceable governance. The management body is accountable. That sentence is doing heavy lifting: it means the board must be able to verify, not assume, that the controls exist and operate. Board packets full of green dashboards are not enough; DORA Article 5 makes the management body responsible for defining, approving, and overseeing the ICT risk management framework, and that involvement has to be evidenced.

Read those four in order, and the question that matters becomes obvious: can a Kubernetes cluster produce that evidence, right now, without a human rewriting it first?

For almost every fintech we have talked to, the answer is no. Not because the controls are absent, but because the evidence is fragmented across Helm values, CRD specs, Prometheus retention windows, audit webhooks, and — most commonly — engineers' Slack history.
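The notification timeline in the first obligation is the easiest part to make mechanical. The sketch below computes the latest permissible submission times from two timestamps. Only the 4-hour and one-month windows are named in the text above; the 24-hour cap from awareness and the 72-hour intermediate window are assumptions here and should be checked against the incident-reporting technical standards that apply to you. The function and constant names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Assumed windows. Only the 4-hour and one-month figures come from the text
# above; the 24-hour cap and the 72-hour intermediate window are assumptions
# to verify against the technical standards that apply to you.
INITIAL_AFTER_CLASSIFICATION = timedelta(hours=4)
INITIAL_AFTER_AWARENESS = timedelta(hours=24)
INTERMEDIATE_AFTER_INITIAL = timedelta(hours=72)
FINAL_AFTER_INTERMEDIATE = timedelta(days=30)  # "one month" approximated as 30 days


@dataclass(frozen=True)
class ReportingDeadlines:
    initial: datetime
    intermediate: datetime
    final: datetime


def reporting_deadlines(aware_at: datetime, classified_at: datetime) -> ReportingDeadlines:
    """Latest submission times, assuming each report is filed exactly at its deadline."""
    if classified_at < aware_at:
        raise ValueError("an incident cannot be classified before it is detected")
    # The initial notification is bounded by whichever limit expires first.
    initial = min(classified_at + INITIAL_AFTER_CLASSIFICATION,
                  aware_at + INITIAL_AFTER_AWARENESS)
    intermediate = initial + INTERMEDIATE_AFTER_INITIAL
    final = intermediate + FINAL_AFTER_INTERMEDIATE
    return ReportingDeadlines(initial, intermediate, final)


if __name__ == "__main__":
    aware = datetime(2025, 3, 3, 9, 0, tzinfo=timezone.utc)
    classified = datetime(2025, 3, 3, 11, 0, tzinfo=timezone.utc)
    print(reporting_deadlines(aware, classified))
```

The deadlines are worst-case bounds: filing the initial notification early pulls the later deadlines forward, which a real incident tool would track from actual submission times.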

The concrete gaps

Here is what the DORA–Kubernetes intersection actually looks like, as a map of obligations to the runtime artifacts that satisfy them.

| DORA obligation | Kubernetes evidence | Typical failure mode |
|---|---|---|
| Art. 9: access management | RBAC Role/ClusterRole graph, IdP federation config | Evidence is "we use Okta," not the enforced role matrix |
| Art. 10: detection | Audit policy + SIEM forwarder config + retention | Audit logging is off or not exported by default on many managed control planes |
| Art. 11: response & recovery | Namespace-scoped PDBs, backup CronJobs, restore runbooks | Backup exists; restore has never been tested with evidence |
| Art. 24-26: resilience testing, TLPT readiness | Network policies, service mesh mTLS scope, egress controls | Network policy defined on a subset of namespaces |
| Art. 28: third-party register | Controller/CRD inventory, operator images, source registries | No single source of truth for what is installed cluster-wide |

Each row has a remedy your platform team probably already knows. The issue is not technical. The issue is that assembling the resulting artifacts into regulator-consumable evidence is a manual activity that takes weeks per audit cycle and is out of date the moment it ships.
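As an illustration of turning one row into a check rather than a statement, the Art. 9 row can be evaluated directly from the role graph. This is a minimal sketch, assuming the input is the parsed output of `kubectl get clusterroles -o json`; the function name is hypothetical, and a fuller check would also walk namespaced Roles and their bindings.

```python
from typing import Any, Dict, List

CORE_API_GROUP = ""  # Secrets live in the core ("") API group


def roles_with_wildcard_on_secrets(cluster_roles: Dict[str, Any]) -> List[str]:
    """Return names of ClusterRoles granting every verb ("*") on Secrets."""
    flagged = []
    for role in cluster_roles.get("items", []):
        for rule in role.get("rules") or []:  # aggregated roles may carry rules: null
            groups = rule.get("apiGroups") or []
            resources = rule.get("resources") or []
            verbs = rule.get("verbs") or []
            in_core = CORE_API_GROUP in groups or "*" in groups
            hits_secrets = "secrets" in resources or "*" in resources
            if in_core and hits_secrets and "*" in verbs:
                flagged.append(role["metadata"]["name"])
                break  # one matching rule is enough to flag the role
    return sorted(flagged)
```

The result is a sorted list, so the same role graph always yields the same finding, which is the property the rest of this post depends on.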

What changes when evidence capture is deterministic

Drop the runtime-to-evidence transformation onto a deterministic pipeline and the economics invert.

A single kubectl-equivalent snapshot — cluster state, audit policy, RBAC graph, admission controller config, network policy set, operator/CRD inventory — becomes the input. The pipeline maps each artifact to the DORA obligations it satisfies via a versioned, inspectable mapping. The output is a signed report that a supervisor could, in principle, re-run bit-for-bit from the same input.
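What "re-run bit-for-bit" requires in practice is a canonical encoding of the snapshot and a report that contains nothing non-deterministic (no wall-clock timestamps, no random IDs). The sketch below shows one way to get there. It is an illustration under stated assumptions, not a description of any particular product: the function names are hypothetical, and HMAC-SHA256 stands in for the signature only to keep the example within the standard library; a real pipeline would more plausibly use an asymmetric signature so that a supervisor can verify without holding the signing key.

```python
import hashlib
import hmac
import json
from typing import Any, Dict, List


def canonical_bytes(obj: Any) -> bytes:
    """Serialize with sorted keys and fixed separators so equal data gives equal bytes."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def snapshot_digest(snapshot: Dict[str, Any]) -> str:
    return hashlib.sha256(canonical_bytes(snapshot)).hexdigest()


def build_report(snapshot: Dict[str, Any], mapping_version: str,
                 findings: List[Dict[str, Any]], signing_key: bytes) -> Dict[str, Any]:
    """Bind findings to the exact input and mapping version that produced them."""
    body = {
        "snapshot_sha256": snapshot_digest(snapshot),
        "mapping_version": mapping_version,
        "findings": findings,
    }
    # HMAC is a stand-in for a real signature scheme (assumption for this sketch).
    signature = hmac.new(signing_key, canonical_bytes(body), hashlib.sha256).hexdigest()
    return {**body, "signature": signature}


def verify_report(report: Dict[str, Any], signing_key: bytes) -> bool:
    body = {k: v for k, v in report.items() if k != "signature"}
    expected = hmac.new(signing_key, canonical_bytes(body), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, report.get("signature", ""))
```

Because key order and whitespace are normalized before hashing, two exports of the same cluster state that differ only in serialization produce the same digest and therefore the same report.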

Concretely, for a fintech running EKS with Istio and ArgoCD:

Snapshot t₀:
  ├─ control-plane/
  │   ├─ audit-policy.yaml                 → Art. 10 evidence: ✓ (level=RequestResponse, coverage=auth*, network*)
  │   ├─ managed-addons.json               → Art. 28 evidence: ✓ (4 managed components, each with SBOM hash)
  ├─ rbac/
  │   ├─ cluster-roles.json                → Art. 9 evidence: ⚠ (2 roles with wildcard verbs on secrets/*)
  │   └─ identity-bindings.json            → Art. 9 evidence: ✓ (all bindings federated through Okta)
  ├─ workloads/
  │   ├─ istio-mtls-policies.json          → Art. 24 evidence: ⚠ (3 namespaces with PERMISSIVE mtls mode)
  │   ├─ network-policies.json             → Art. 24 evidence: ⚠ (coverage 67% of pods; 12 without policy)
  │   └─ pod-disruption-budgets.json       → Art. 11 evidence: ✓ (all payment-path services have PDB)
  └─ supply-chain/
      ├─ operators.json                    → Art. 28 evidence: ✓ (15 operators, all with pinned digests)
      └─ image-digests.txt                 → Art. 28 evidence: ✓ (no :latest tags in production)

The findings above are not opinions. Each one is a deterministic evaluation of the snapshot against a published threshold (coverage, for example). If the threshold changes, the mapping version changes, and the old report still verifies against the version it was produced with. A different consultancy running the same engine against the same snapshot produces the same report. The delta between two audits is a function of your cluster state, not a function of which firm you hired.
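To make "deterministic evaluation against a published threshold" concrete, here is a minimal sketch of the network-policy coverage finding. The class names, the threshold values, and the 36-pod cluster in the usage note are hypothetical, chosen only so the arithmetic lines up with a 67% figure like the one in the snapshot.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class CoverageRule:
    mapping_version: str         # changes whenever the threshold changes
    article: str
    min_covered_fraction: float  # published threshold, e.g. 1.0 for 100%


@dataclass(frozen=True)
class Finding:
    article: str
    mapping_version: str
    covered: int
    total: int
    percent: int
    status: str  # "pass" or "warn"


def evaluate_coverage(pod_has_policy: Dict[str, bool], rule: CoverageRule) -> Finding:
    """Pure function of (snapshot data, rule): same inputs, same finding."""
    total = len(pod_has_policy)
    covered = sum(1 for has_policy in pod_has_policy.values() if has_policy)
    fraction = covered / total if total else 0.0
    # An empty snapshot never passes: no evidence is not the same as full coverage.
    status = "pass" if total and fraction >= rule.min_covered_fraction else "warn"
    return Finding(rule.article, rule.mapping_version, covered, total,
                   round(fraction * 100), status)
```

With 24 of 36 pods covered and a 100% threshold, this yields a "warn" finding at 67%. The mapping version travels inside the finding, so a report produced under an older threshold can still be checked against the rule that was in force when it was written.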

Why this is specifically a fintech problem

Two DORA properties make this harder for fintech than for, say, a retailer with an identical Kubernetes footprint.

Concentration risk aggregation (Art. 28-29). DORA requires you to identify ICT concentration risk: over-reliance on a single provider, a single region, a single category of provider. In Kubernetes terms, the cluster itself can concentrate risk. If the CNI plugin and the CSI driver come from the same vendor's operator catalog, the supervisor wants that flagged. You cannot produce that flag from Prometheus. You produce it from a deterministic inventory of what is installed, by origin, by criticality.
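A sketch of what that flag could look like once the inventory exists. The record fields (`vendor`, `category`, `critical`) and the rule "one vendor behind two or more critical categories" are assumptions made for illustration; DORA does not prescribe this particular test.

```python
from collections import defaultdict
from typing import Dict, List, Set


def concentration_flags(inventory: List[dict], min_categories: int = 2) -> Dict[str, List[str]]:
    """Map each vendor to the critical categories it supplies, if it supplies several."""
    by_vendor: Dict[str, Set[str]] = defaultdict(set)
    for component in inventory:
        if component["critical"]:
            by_vendor[component["vendor"]].add(component["category"])
    # Sorted output keeps the result stable from run to run.
    return {vendor: sorted(categories)
            for vendor, categories in sorted(by_vendor.items())
            if len(categories) >= min_categories}


if __name__ == "__main__":
    inventory = [  # hypothetical inventory records
        {"name": "cni-plugin", "vendor": "vendor-a", "category": "networking", "critical": True},
        {"name": "csi-driver", "vendor": "vendor-a", "category": "storage", "critical": True},
        {"name": "dashboard", "vendor": "vendor-b", "category": "observability", "critical": False},
    ]
    print(concentration_flags(inventory))
```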

Threat-led testing with integrity (Art. 26). TLPT assumes the red team can exfiltrate control-plane credentials. The blue-team evidence that DORA wants is not "the attack failed"; it is "the attack failed and here is the audit trail that proves the detection path fired within the classification window." That audit trail must survive the incident timeline, including the possibility that the attacker's first move was to silence the audit sink. Log integrity (append-only, signed, externalized) stops being a nice-to-have and becomes a primary compliance artifact, one the pipeline must inspect and report on.
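One common way to make a log tamper-evident is a hash chain, where each entry commits to the one before it. The sketch below is a simplified illustration with hypothetical function names, not a description of any specific audit sink. It detects edits and removals inside the chain; detecting truncation of the tail additionally requires publishing the latest hash somewhere the attacker cannot reach, which is what "externalized" buys you.

```python
import hashlib
import json
from typing import List, Optional

GENESIS = "0" * 64  # fixed starting value for the first entry's "prev" field


def _entry_hash(prev_hash: str, event: dict) -> str:
    payload = json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


def append_event(chain: List[dict], event: dict) -> List[dict]:
    """Return a new chain with the event appended and linked to its predecessor."""
    prev = chain[-1]["hash"] if chain else GENESIS
    return chain + [{"event": event, "prev": prev, "hash": _entry_hash(prev, event)}]


def first_broken_index(chain: List[dict]) -> Optional[int]:
    """Index of the first entry whose link or hash does not verify, else None."""
    prev = GENESIS
    for index, entry in enumerate(chain):
        if entry["prev"] != prev or entry["hash"] != _entry_hash(prev, entry["event"]):
            return index
        prev = entry["hash"]
    return None
```

A pipeline that reports the verification result, plus where the head hash is anchored, turns "we keep logs" into evidence about whether those logs can be trusted after an incident.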

The alternative you do not want

The alternative, and this is the path most fintechs are defaulting into, is annual engagements with a Big Four consultancy that will, for six figures, produce a document that approximates DORA compliance based on interviews and screenshots. This document has three problems.

First, it is stale the day it ships. A fintech deploys Kubernetes manifests hundreds of times per week. A document that describes the state of the cluster on March 3rd is a fiction by March 10th.

Second, it cannot be verified by the supervisor. The supervisor under DORA has inspection powers, and the inspection will not be "please resend the PDF." It will be "run the same check, now, against the cluster you have today, and show us the delta." A consultancy deliverable does not help you answer that question.

Third, it is not reproducible in a breach. When — not if — a DORA-classified incident happens, you will be asked to produce the control evidence as it existed at the time of the incident. A human-produced report cannot be regenerated; the humans have moved on, the notes are gone. A deterministic pipeline replays the input snapshot against the mapping version in force at the incident timestamp and produces the evidence trivially.
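The replay step depends on being able to answer "which snapshot and which mapping version applied at time T?" from stored history. A minimal sketch of that lookup, assuming both histories are kept as lists sorted by timestamp (the function name and storage layout are hypothetical):

```python
from bisect import bisect_right
from datetime import datetime, timezone
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def in_force_at(history: List[Tuple[datetime, T]], at: datetime) -> T:
    """Return the latest entry whose timestamp is at or before `at`.

    `history` must be sorted ascending by timestamp.
    """
    starts = [start for start, _ in history]
    index = bisect_right(starts, at) - 1
    if index < 0:
        raise LookupError("no entry was in force at the requested time")
    return history[index][1]


if __name__ == "__main__":
    mappings = [  # hypothetical mapping release history
        (datetime(2025, 1, 17, tzinfo=timezone.utc), "v1"),
        (datetime(2025, 6, 1, tzinfo=timezone.utc), "v2"),
    ]
    incident = datetime(2025, 3, 10, 14, 30, tzinfo=timezone.utc)
    print(in_force_at(mappings, incident))
```

The same function selects the last snapshot taken before the incident; feeding both results into the evaluation reproduces the evidence as it stood at that moment.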

What to do this quarter

If you run a European fintech on Kubernetes, the engineering work to get DORA-ready is narrower than the regulation implies. In priority order:

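  1. Fix the audit policy. If your managed control plane's audit configuration is the default, it is unlikely to satisfy DORA. You want level: RequestResponse on authentication, RBAC, and network-policy mutations, forwarded to an append-only sink outside the cluster. On a self-managed control plane this is one YAML change; on managed offerings where the policy itself is not editable (EKS, for example), the equivalent is enabling audit log export and filtering downstream. Either way, it unblocks three DORA articles.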

  2. Enforce deny-by-default network policy. Every namespace. No exceptions for "legacy"; DORA does not accept legacy as a defense. A policy-enforcing CNI such as Cilium or Calico works, and Istio authorization policies can add a mesh-level layer; the tooling is less important than the coverage being 100%.

  3. Capture a verifiable cluster snapshot weekly. The snapshot is the input to every deterministic evaluation you will ever do. If you cannot export your cluster state reproducibly today, no downstream compliance posture is reproducible either.

  4. Stop treating operator installs as platform trivia. Every operator is a third-party ICT provider under Article 28. Pin digests. Record origin. Review on change. Your CI already knows how to do this for application images; extend it to the control plane.

  5. Map, don't narrate. When your compliance team asks for a DORA control narrative, resist the urge to write prose. Produce a control → evidence mapping that a pipeline can evaluate. The narrative is the output of the mapping, not the input.
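Several of these steps reduce to small checks that can run in CI. As one example, step 4's "pin digests" can be tested with a few lines. This sketch treats an image reference as pinned only if it ends in an `@sha256:` digest; the function name and the example registry host are hypothetical.

```python
import re
from typing import List

# A pinned reference ends in an immutable content digest, e.g. repo/name@sha256:<64 hex chars>.
DIGEST_SUFFIX = re.compile(r"@sha256:[0-9a-f]{64}$")


def unpinned_images(images: List[str]) -> List[str]:
    """Return image references that rely on a mutable tag or on no tag at all."""
    return [image for image in images if not DIGEST_SUFFIX.search(image)]


if __name__ == "__main__":
    refs = [
        "registry.example.com/operators/cert-manager@sha256:" + "a" * 64,
        "registry.example.com/operators/backup:1.4.2",
        "nginx:latest",
    ]
    print(unpinned_images(refs))
```

A version tag such as `1.4.2` is flagged on purpose: tags can be re-pointed at a different image, so only a digest identifies what was actually reviewed.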

None of this is speculative. DORA's technical standards, the RTS and ITS developed by the ESAs, are published, binding, and specific. The gap between what they require and what a Kubernetes cluster can ambiently produce is closable in a quarter, for a team that treats it as an engineering problem rather than a governance one.

Teams that don't will be audited anyway. They will just lose more time and money doing it, and produce reports that neither supervisors nor customers believe.