
What a Good Kubernetes Security Decision Record Looks Like

Security decisions made in meetings and forgotten in Slack threads become invisible technical debt. A decision record makes policy choices durable, explainable, and maintainable. Here is what one actually looks like.

Six months after a Kubernetes security policy is implemented, someone will ask why it was done that way.

Maybe it is a new engineer trying to understand the exception model. Maybe it is a developer whose workload is blocked by a control they think is wrong. Maybe it is a manager asking why the baseline allows X but not Y. Maybe it is the platform engineer who implemented it, who cannot quite remember all the reasoning that went into the decision.

In most engineering organizations, the answer to “why did we make this decision?” is somewhere in a meeting recording nobody has watched, a Slack thread from last year, or an email chain that the relevant people have mostly been removed from.

This is not a documentation problem. It is a governance problem. Security decisions that cannot be explained are security decisions that cannot be trusted, challenged, or improved. They accumulate as invisible technical debt — policies that exist for reasons nobody knows, controls that seem arbitrary because the rationale was never written down.

A decision record fixes this. Not by adding bureaucracy, but by making a small investment in writing down the reasoning at the time decisions are made — when the reasoning is available and the context is clear.


What a decision record is not

Before describing what a good Kubernetes security decision record looks like, it is worth being clear about what it is not.

It is not an Architecture Decision Record (ADR) in the traditional sense, though it borrows from that format. ADRs are often associated with heavyweight processes — formal templates, approval workflows, governance boards. The overhead puts teams off, and the result is either no records at all or records that are technically present but not maintained.

It is not a policy document. A policy document describes what the rules are. A decision record describes why the rules are what they are. These are different things, and conflating them produces documents that are long and read by nobody.

It is not a compliance artifact. A decision record is written for the engineering team, to serve the engineering team. If it also satisfies a compliance auditor, that is a side benefit. Optimizing it for compliance language will make it less useful for its primary purpose.

A good decision record is a short, readable document that answers the question “why did we make this choice?” in enough detail that someone who was not in the room can understand the reasoning, understand what alternatives were considered, and understand what would need to change for the decision to be revisited.


The fields that actually matter

A Kubernetes security decision record does not need to be long. The following six fields cover the reasoning that matters:

Decision. One or two sentences stating what was decided. Specific and concrete — not “we decided to improve our policy posture” but “we decided to enforce Pod Security Standards at the Baseline level across all application namespaces, with the exception of the infrastructure namespace.”
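A decision phrased that concretely maps directly to an artifact. Pod Security Standards are enforced per namespace via `pod-security.kubernetes.io` labels; a minimal sketch (the namespace name is illustrative):

```yaml
# Illustrative namespace manifest: PSA Baseline enforced at admission,
# with warn/audit at Restricted to surface what a stricter level would flag.
apiVersion: v1
kind: Namespace
metadata:
  name: payments   # hypothetical application namespace
  labels:
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
```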

Context. Why this decision was on the table at this moment. What triggered it — a compliance requirement, a security review finding, a specific incident, a team maturity milestone. Context matters because a decision driven by a compliance deadline reads differently from one made after a security incident, and each calls for different reasoning.

The options considered. What alternatives were evaluated. For a policy tool decision, this might be Kyverno, OPA Gatekeeper, and built-in ValidatingAdmissionPolicy. For a baseline level decision, this might be PSA Baseline versus PSA Restricted versus a custom policy set. The point is not to document every option that exists, but to document the options that were genuinely evaluated and why they were or were not chosen.

The decision rationale. Why the chosen option was selected over the alternatives. This is the part that most teams skip and most need. The rationale should be specific — not “Kyverno is easier to use” but “Kyverno’s YAML-based policy syntax fits our team’s existing skill set, and the PolicyException resource gives us the structured exception model we need without requiring Rego knowledge.”

Consequences. What the decision implies going forward. Developer changes required. Operational changes. Things that become easier. Things that become harder. Dependencies created. The consequences section is where you document the tradeoffs you accepted when making this choice.

Review conditions. What would cause this decision to be revisited. A specific Kubernetes version change, a change in team size, a compliance requirement, a significant shift in threat model. This is the field that transforms a decision record from a historical artifact into a living document — one that gets revisited when the conditions it was made under change.
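Assembled in order, the six fields make a template a team can copy. A minimal markdown sketch (the numbering scheme is an assumption, not a prescription):

```markdown
# DR-NNN: <short, concrete title>

**Decision:** <one or two sentences stating what was decided>
**Context:** <what put this decision on the table now>
**Options considered:** <the options genuinely evaluated, and why each was or was not chosen>
**Decision rationale:** <why the chosen option won over the alternatives>
**Consequences:** <tradeoffs accepted, work created, dependencies added>
**Review conditions:** <what would trigger a revisit>
```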


A concrete example

This is what a real decision record might look like for a team choosing its admission control approach.


Decision: Use Kyverno for admission control policy enforcement. Enforce PSA Baseline at the namespace level for initial rollout.

Context: Following the Q1 security review, we identified that we have no consistent enforcement of baseline security controls across application workloads. The GA release of ValidatingAdmissionPolicy in Kubernetes 1.30 prompted us to evaluate the current tooling landscape before committing to any webhook-based solution.

Options considered:

  • OPA Gatekeeper with Rego policies. Strong community adoption, expressive policy language. Rejected because Rego has a significant learning curve that our current platform team (two engineers) cannot support effectively. Operational overhead of managing Gatekeeper in production was also a concern.
  • Built-in ValidatingAdmissionPolicy (CEL). No external webhook dependency, lower operational overhead. Currently limited in expressiveness for mutation and complex multi-resource policies. Also lacks a structured exception model out of the box. Considered as a complement to Kyverno for simple controls.
  • Kyverno with PolicyException resources. YAML-based policies that fit existing team skills. PolicyException provides a structured exception model. Strong integration with PSA and compatibility with CEL in recent versions. Selected.

Decision rationale: Kyverno’s YAML syntax means the platform team and senior developers can read and write policies without learning a new language. The PolicyException resource directly addresses our need for a traceable exception model — one of the gaps identified in the Q1 review. The operational model (admission webhook plus background scanner) is well documented and manageable for a two-person platform team. We are not currently enforcing complex cross-resource policies that would require Rego’s expressiveness.

Consequences: Kyverno needs to be maintained and upgraded alongside Kubernetes. We accept the webhook dependency. Developer workflows need updating to handle PolicyException requests: exception documentation is required before an exception is approved. Platform team needs to monitor Kyverno compatibility with Kubernetes minor version upgrades.

Review conditions: Revisit if: team grows to a point where Rego expertise becomes viable and policy complexity justifies it; Kyverno fails a compatibility requirement with a required Kubernetes version; ValidatingAdmissionPolicy adds structured exception support and mutation capabilities that reduce the gap.


This record is readable in three minutes. It captures everything a new platform engineer, a developer challenging the exception model, or a manager asking about tooling choices needs to understand. It does not require context from the meeting where the decision was made.
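The exception model the record leans on can be sketched concretely. A hypothetical Kyverno PolicyException exempting the infrastructure namespace from a privileged-container policy — policy and rule names are illustrative, and the resource's API version has moved across Kyverno releases (v2alpha1, v2beta1, v2), so verify against your installed version:

```yaml
apiVersion: kyverno.io/v2   # check your Kyverno version's supported API
kind: PolicyException
metadata:
  name: infrastructure-privileged-exception
  namespace: kyverno
spec:
  exceptions:
    - policyName: disallow-privileged-containers   # hypothetical policy name
      ruleNames:
        - privileged-containers
  match:
    any:
      - resources:
          kinds:
            - Pod
          namespaces:
            - infrastructure
```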


One record per significant decision, not one record per policy

A common question is how granular decision records should be. The answer: one record per significant decision point, not one per policy rule.

A decision about which admission control tool to use is a significant decision. One record.

A decision about enforcing PSA Baseline versus PSA Restricted is a significant decision. One record.

A decision about whether a specific workload gets an exception is a significant decision. One record per exception.

A decision about the specific CEL expression in a ValidatingAdmissionPolicy is not a significant decision. That belongs in code comments, not a decision record.
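That division of labor might look like this: the choice to use ValidatingAdmissionPolicy at all belongs in a decision record, while the reasoning behind a specific CEL expression lives as a comment next to the expression. A sketch, with a hypothetical control (the tool-choice rationale in the example record above points at Kyverno; this illustrates the complementary built-in mechanism):

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: require-run-as-non-root   # hypothetical policy
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["pods"]
  validations:
    # has() guards the optional securityContext fields so the expression
    # does not error on pods that omit them -- that nuance belongs here,
    # in a comment, not in a decision record.
    - expression: >-
        object.spec.containers.all(c,
          has(c.securityContext) && has(c.securityContext.runAsNonRoot) &&
          c.securityContext.runAsNonRoot == true)
      message: "All containers must set runAsNonRoot: true."
```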

The test is whether the decision is likely to be questioned, revisited, or needed to explain a future choice. If yes, it deserves a record. If no, it belongs in documentation closer to the implementation.


Where decision records live

Decision records need to live somewhere accessible and durable. A few practical options:

The same repository as the policy files. Keeps decision records alongside the artifacts they explain. Engineers who are looking at a policy file can find the reasoning without context switching. Version control means the decision record’s history is tied to the policy’s history.

A dedicated decisions/ directory in the platform repository. Works well if policies live in multiple repositories. Centralizes the reasoning in one place. The tradeoff is a slightly higher barrier to discovery: someone who knows a policy file but not where decisions are stored has to go looking.
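As a sketch of that second option, the layout might look like this (file and date names are illustrative):

```
platform-repo/
├── policies/
│   ├── disallow-privileged.yaml
│   └── require-run-as-non-root.yaml
└── decisions/
    ├── 2025-03-admission-control-tooling.md
    └── 2025-04-psa-baseline-level.md
```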

A Notion or Confluence page per engagement or policy cycle. Works if your team actually uses your wiki. Breaks down if the wiki is inconsistent or if policies and decisions get out of sync. Requires discipline to maintain — and the realistic assessment is that most teams do not have that discipline consistently.

The format matters less than the consistency. A markdown file in a git repository that gets updated is better than a Notion template that gets filled in once and never touched again.


The discipline of writing decisions at decision time

The hardest part of decision records is not the format. It is the timing.

Writing a decision record six months after the decision is mostly archaeology. You are trying to reconstruct reasoning from memory, and memory is unreliable. The nuances of why you chose one option over another — the specific failure mode of the alternative you rejected, the constraint that made one approach more practical — fade quickly.

Writing the record at the time of the decision is fast and accurate. The context is present. The reasoning is clear. The consequences are visible. The whole record can be written in thirty minutes at the moment a decision is made, and that investment pays dividends for the lifetime of the system.

The practical way to make this happen is to build it into the decision process itself. When a significant security decision is made — in a meeting, in a design review, in a Slack thread — the output is not just the decision. It is the decision plus a brief record. The record does not need to be polished. It needs to be accurate and complete.

Teams that do this consistently find that their security posture becomes much more stable over time. Not because the decisions are better — the decisions are similar. But because the reasoning is accessible, decisions get challenged when conditions change rather than when they cause a visible problem. Exceptions get reviewed because there is a review date, not because someone complained. The baseline evolves through deliberate revision rather than through accumulation of invisible edge cases.

That is the real value of a decision record. Not the documentation for its own sake, but the organizational clarity that comes from knowing, at any point in time, why your security system is built the way it is.


ClarifyIntel builds Kubernetes security systems that are documented, explainable, and maintainable — including structured decision records as part of every engagement. If your team is trying to bring more clarity to your policy decisions, send us a note.

Not sure where your team actually stands? Start with a Baseline Review.

The Baseline Review gives you a clear picture of your current posture, what matters most, and which product should come next — before you commit to enforcement or new tooling. Delivered async in 5–7 business days.