Beyond OPA Gatekeeper: Enterprise-scale Admission Control for Kubernetes

TLDR

OPA Gatekeeper is the most popular solution for enforcing admission control policies on Kubernetes clusters. It was designed for policy management on a single cluster. Styra DAS (built by the creators of OPA) aims to provide the next step for enterprise companies with centralized policy management over tens or hundreds of clusters and policy use cases beyond Kubernetes. In this post, we explain how Styra DAS differs from OPA Gatekeeper and how our enterprise focus led to different design decisions.

What is OPA Gatekeeper?

Developed by the Open Policy Agent (OPA) community, OPA Gatekeeper is a great option for open source policy management on a single Kubernetes cluster, with features like:

  • Policy enforcement: Gatekeeper can enforce policies that ensure that your cluster is consistently and securely configured.
  • Pre-built policy library: Gatekeeper comes with a library of pre-built constraint templates that you can use out of the box.

An OPA Gatekeeper installation manages policies on Kubernetes using Rego code wrapped in custom Kubernetes resources (called ConstraintTemplates and Constraints).

What is Styra DAS?

Styra Declarative Authorization Service (DAS) is a SaaS policy authoring, management and monitoring platform developed by the creators of OPA. Styra DAS lets platform engineering and cloud infrastructure teams extend OPA policy as code across public cloud configurations, managed or native Kubernetes deployments, infrastructure-as-code tools like Terraform and the interconnected service mesh deployments that control how modern application services communicate. 

Enterprise scale for Kubernetes

As with other aspects of computing, the requirements for policy management change rapidly with growing complexity and scale. The design of OPA Gatekeeper makes it a great choice for small organizations with a handful of clusters to manage, but it requires more and more custom plumbing to function beyond that scale. We designed Styra DAS to be a single control plane for OPA policies for large organizations. As a comprehensive platform for managing OPA instances, it includes an authoring environment for creating and editing Rego policies, pre-built Policy Packs and policy libraries to help accelerate policy deployment, a monitoring system to ensure that correct policies are deployed to all running OPAs, an easy way to audit decisions and more. One of the key benefits of Styra DAS is its ability to support any number of clusters in which OPA is deployed as an admission controller. This makes it an ideal solution for organizations with large, complex Kubernetes deployments that need to enforce policies (and different sets of policies) across multiple clusters.

A new cluster can be added to Styra DAS by deploying OPA and the Styra Local Plane (SLP) to the cluster using the installation instructions provided by Styra DAS. The SLP is responsible for loading policies from Styra DAS, while OPA itself uses those policies to perform admission control.

Meanwhile, teams use the OPA Bundle API to pull Rego policies from Styra DAS. This method of deploying policies is different from Gatekeeper’s, which relies on Custom Resources to store Constraints and Constraint Templates on the cluster itself. Using the Bundle API makes it much easier to deploy policies from a single source, without needing to access the clusters’ API servers. The architecture below illustrates how OPA and Styra DAS integrate for Kubernetes admission control. To understand how this approach to deploying policy as code works with GitOps, you can read our blog post on the topic.
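To make the bundle idea concrete, here is a minimal sketch of what such a bundle looks like on disk. OPA's documentation describes a bundle as a gzipped tarball containing Rego files and data.json documents; everything else here (the helper names, the file path, the placeholder Rego rule and the registry name) is an assumption for illustration, not how Styra DAS builds bundles internally.

```python
import io
import json
import tarfile

# Placeholder policy text; the package and rule are illustrative only.
REGO = """package admission

deny[msg] {
    input.request.object.spec.containers[_].securityContext.privileged
    msg := "privileged containers are not allowed"
}
"""


def build_bundle(policies: dict, data: dict) -> bytes:
    """Pack Rego sources and one data document into a gzipped tarball."""
    files = {path: source.encode() for path, source in policies.items()}
    files["data.json"] = json.dumps(data).encode()
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for path in sorted(files):
            info = tarfile.TarInfo(name=path)
            info.size = len(files[path])
            archive.addfile(info, io.BytesIO(files[path]))
    return buffer.getvalue()


def list_bundle(bundle: bytes) -> list:
    """Return the file names stored in a bundle, in archive order."""
    with tarfile.open(fileobj=io.BytesIO(bundle), mode="r:gz") as archive:
        return archive.getnames()


bundle = build_bundle(
    {"admission/policy.rego": REGO},
    {"allowed_registries": ["registry.example.com"]},
)
print(list_bundle(bundle))  # ['admission/policy.rego', 'data.json']
```

Because the policy and its data travel together in one artifact that OPA pulls, nothing has to be written to the cluster's API server to roll out a change.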

Multi-cloud support

Users and platform teams often ask how different admission controllers support different cloud environments. In some cases, individual cloud providers have given OPA Gatekeeper native support; for instance, Gatekeeper forms the basis of Google Anthos Policy Controller, which includes some dashboard functionality. While this can benefit users of Google Kubernetes Engine (GKE), one drawback is that it can create vendor lock-in, since Kubernetes deployments often extend beyond one provider.

The Styra DAS model of managing policies for Kubernetes clusters makes it a natural fit for organizations needing multi-cloud support. Styra DAS is agnostic about what cloud a cluster is hosted on, or whether the clusters are on the same cloud. It pulls all logs and compliance data into a single control plane, providing singular visibility into policies applied across all clusters and cloud providers.

Multi-cluster policy management

Managing admission control policies across a large number of clusters requires a framework that helps handle the differences and similarities between clusters in an elegant way to minimize misconfigurations. 

Styra DAS represents each cluster as a System. Each System has its own set of policies but can also receive policies from Stacks. Stacks are collections of policies that can be enforced over multiple Systems (clusters). For example, a typical organization might have clusters for development and production environments and would create Stacks to apply the necessary policies to each.


In this configuration the Systems themselves might not even need any policies and would just deploy the ones inherited from the PROD, DEV and Base stacks.

The Base stack would include policies that are to be enforced on every cluster like:

  • Enforce a read-only file system
  • Disallow running containers in privileged mode

The DEV stack would be pretty permissive but would still add a few policies like:

  • Force each Deployment to specify an owner label on both the Pod and Deployment level
  • Force the source and source_sha annotations to be present on containers

The PROD stack would also define policies like:

  • Images can only be pulled from the organization’s private registry
  • Ingress hosts must be on an allow list
  • Only the on-call administrator has access to the cluster
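The inheritance described above can be sketched in a few lines. This is a simplified model, not the Styra DAS data model: the `Stack` and `System` classes, the label-based `selector` and the policy names are all hypothetical and only mirror the Base, DEV and PROD example.

```python
from dataclasses import dataclass, field


@dataclass
class Stack:
    name: str
    policies: list
    # Hypothetical: labels a System must carry to inherit this Stack.
    selector: dict = field(default_factory=dict)


@dataclass
class System:
    name: str
    labels: dict
    policies: list = field(default_factory=list)


def effective_policies(system: System, stacks: list) -> list:
    """Collect policies from every matching Stack, then the System's own."""
    result = []
    for stack in stacks:
        matches = all(system.labels.get(k) == v for k, v in stack.selector.items())
        if not matches:
            continue
        for policy in stack.policies:
            if policy not in result:
                result.append(policy)
    for policy in system.policies:
        if policy not in result:
            result.append(policy)
    return result


stacks = [
    Stack("Base", ["read-only-root-fs", "no-privileged-containers"]),
    Stack("DEV", ["owner-label", "source-annotations"], {"environment": "dev"}),
    Stack("PROD", ["private-registry-only", "ingress-allow-list",
                   "on-call-admin-only"], {"environment": "prod"}),
]
prod_cluster = System("prod-eu", {"environment": "prod"})
dev_cluster = System("dev-1", {"environment": "dev"})

print(effective_policies(prod_cluster, stacks))
print(effective_policies(dev_cluster, stacks))
```

Neither System defines a policy of its own here; each one ends up with the Base policies plus those of the Stack matching its environment.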

Impact analysis (What-if scenarios)

Changing policy on a single cluster can be risky. But what if we want to change one of the Stacks described above? It is quite difficult to foresee the full impact of a policy change on all the clusters the Stack affects. Styra DAS helps solve this problem by collecting decision logs from the OPAs performing admission control and using those logs to estimate how large an effect a policy change will have. Once draft policy changes are ready, the policy author can ask Styra DAS to replay old decision inputs against the draft policies and report on those that produce different results than before.
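The replay idea itself is simple enough to sketch. In the snippet below, policies are stand-in Python functions rather than Rego, and the logged inputs, image names and function names are invented for illustration; it shows the technique, not the Styra DAS implementation.

```python
def current_policy(admission_input: dict) -> str:
    """Stand-in for the policy currently enforced: everything is allowed."""
    return "allow"


def draft_policy(admission_input: dict) -> str:
    """Stand-in for the draft: only images from the internal registry pass."""
    if admission_input["image"].startswith("registry.example.com/"):
        return "allow"
    return "deny"


def replay(decision_log: list, before, after) -> list:
    """Re-evaluate logged inputs and keep those whose outcome would change."""
    changed = []
    for entry in decision_log:
        old, new = before(entry), after(entry)
        if old != new:
            changed.append({"input": entry, "before": old, "after": new})
    return changed


decision_log = [
    {"name": "web", "image": "registry.example.com/web:1.4"},
    {"name": "cache", "image": "docker.io/library/redis:7"},
    {"name": "api", "image": "registry.example.com/api:2.0"},
    {"name": "debug", "image": "docker.io/library/busybox:1.36"},
]

changed = replay(decision_log, current_policy, draft_policy)
print(f"{len(changed)} of {len(decision_log)} replayed decisions would change")
for item in changed:
    print(item["input"]["name"], item["before"], "->", item["after"])
```

Because the inputs come from real past requests, the report reflects how the draft would have behaved on actual traffic, without enforcing anything on a live cluster.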

Gatekeeper provides a dryrun enforcement action that lets users deploy policies (Constraints) on a single cluster and see which resources would be in violation without actually enforcing deny decisions. However, this is not based on historical data and requires live testing. Moreover, because dryrun Constraints are applied to individual clusters, there is no real way to aggregate and assess the impact of global policy changes across many clusters, as there is with Stacks. Gatekeeper’s dryrun is analogous to the Monitor feature in Styra DAS, but without a dashboard view or the ability to monitor policies across clusters.

In the example below within Styra DAS, a policy change would cause 61 out of 170 replayed admission control decisions to result in a denied outcome instead of being allowed. This shows the policy author that this change will have a major impact on the operation of the cluster and care should be taken before publishing the changes.

Shift left — CI/CD

Impact analysis combined with unit testing enables Styra DAS users to build powerful CI/CD pipelines for their policies. Styra DAS supports deploying policy from a Git repository (and pushing changes) and also exposes the testing functionality over an API. This means cluster administrators can build pipelines in their preferred tool to verify and approve policy changes.

Keeping clusters compliant

Kubernetes admission control by itself cannot make sure that only compliant workloads are running on the cluster. Non-compliant workloads may still be running because stricter policy rules were applied to the cluster after those resources were created, or because the cluster is only now being onboarded to admission control and its existing workloads have never been verified.

For example, the security team might decide to only allow images from the organization’s internal registry to be deployed. We can easily add the right policy rule, but that will not evict the already-running containers whose images are now non-compliant.
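A small sketch shows why the new rule alone is not enough. The registry prefix, the pod dictionaries and the helper names are assumptions for illustration; the point is that the same check has to run in two places, once at admission time and once over what is already running.

```python
ALLOWED_PREFIX = "registry.example.com/"  # hypothetical internal registry


def image_violations(pod: dict) -> list:
    """Return the images in a pod that do not come from the allowed registry."""
    return [
        container["image"]
        for container in pod["spec"]["containers"]
        if not container["image"].startswith(ALLOWED_PREFIX)
    ]


def admit(pod: dict) -> bool:
    """Admission-time check: only sees pods being created or updated."""
    return not image_violations(pod)


def audit(running_pods: list) -> dict:
    """Audit-time check: scans pods that were admitted before the rule existed."""
    report = {}
    for pod in running_pods:
        bad = image_violations(pod)
        if bad:
            report[pod["metadata"]["name"]] = bad
    return report


def make_pod(name: str, images: list) -> dict:
    return {
        "metadata": {"name": name},
        "spec": {"containers": [{"image": image} for image in images]},
    }


running = [
    make_pod("web", ["registry.example.com/web:1.4"]),
    make_pod("legacy", ["docker.io/library/nginx:1.25",
                        "registry.example.com/sidecar:0.3"]),
]

print(admit(make_pod("new", ["docker.io/library/redis:7"])))  # False
print(audit(running))  # {'legacy': ['docker.io/library/nginx:1.25']}
```

The new pod is rejected at the door, but the `legacy` pod keeps running until an audit surfaces it.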

Gatekeeper addresses this issue with its Audit feature. When the audit runs, it adds any rule violations to the status field of the particular Constraint. Users also have the option to see the total number of violations in a cluster with Prometheus metrics, or to export JSON-formatted audit logs. One challenge with Gatekeeper auditing using Constraint Status is that there is a limit to the number of violations that can be reported, given the cap on how large Kubernetes API objects can grow — exceed that limit and violations are not reported. Because of this, auditing a large number of resources on large clusters can pose a challenge. Moreover, this doesn’t provide an easy way for a cluster administrator to view all violations across different clusters in an environment.
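One practical consequence is that a script reading audit results from a Constraint should check whether the list it sees is complete. The sketch below assumes the Constraint status carries a `totalViolations` count alongside the `violations` list, which is our understanding of Gatekeeper's audit output; verify the field names against your Gatekeeper version. The Constraint shown is a hand-written example, not real cluster output.

```python
def audit_summary(constraint: dict) -> dict:
    """Summarize a Constraint's audit status and flag truncated results."""
    status = constraint.get("status", {})
    listed = status.get("violations", [])
    # Assumption: totalViolations reports the full count even when the
    # violations list itself has been capped.
    total = status.get("totalViolations", len(listed))
    return {
        "constraint": constraint["metadata"]["name"],
        "total": total,
        "listed": len(listed),
        "truncated": total > len(listed),
    }


# Hand-written example object for illustration only.
example_constraint = {
    "kind": "K8sAllowedRepos",
    "metadata": {"name": "internal-registry-only"},
    "status": {
        "totalViolations": 57,
        "violations": [
            {"kind": "Pod", "namespace": "shop", "name": "legacy",
             "message": "image not from allowed registry"},
            {"kind": "Pod", "namespace": "shop", "name": "cache",
             "message": "image not from allowed registry"},
        ],
    },
}

print(audit_summary(example_constraint))
```

When `truncated` is true, the status field alone cannot tell you which of the remaining resources are non-compliant, and the same check would have to be repeated per cluster.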

Because Styra DAS is a centralized service receiving data from all deployed OPAs, it has a holistic view over all clusters. It provides the Compliance View feature, which lists all non-compliant resources together with the reason for their policy violation.

Context-aware policies

Admission control policies will often require data about the business context or some configuration. Here are a few examples of such policies:

  • Only approved users or groups can apply applications to certain namespaces. That list of approved users or groups might be kept in a specialized internal application with a UI to manage them and an API to fetch them.
  • Changes to production clusters are only allowed for the current on-call administrator. This information would be tracked in PagerDuty.

Both Styra DAS and Gatekeeper support loading external data, but the approaches differ. OPA Gatekeeper adds a built-in function to Rego (external_data) that connects to an external data provider during policy evaluation. This mechanism is meant to be used with services running inside the cluster, as it requires mutual TLS authentication. This mechanism can also be used to validate image signatures using cosign, by running your own cosign provider on the cluster. Gatekeeper can then talk to the provider, which in turn can connect to the Sigstore service.

Styra DAS supports data-driven policies by importing data into DAS itself and bundling that data with the Rego policy files before sending the bundle to the OPA instances. Moreover, it supports transforming the data while loading it from the external system. This gather-first, distribute-later approach works better with corporate systems like Jira, GitHub or PagerDuty because it doesn’t require an on-cluster proxy/provider to be deployed.

This model also allows OPA to always be available in the face of connection issues to Styra DAS or the data provider system. All required policies and related data are cached locally and are therefore quickly accessible for evaluation. The downside of this approach is that the data has to make a roundtrip through Styra DAS, so it can take up to a few minutes to be updated in all OPA instances.
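The trade-off can be illustrated with a toy cache. The class, the `fetch` function and the `upstream` dictionary are hypothetical stand-ins for the data source and the locally cached bundle data; this is a sketch of the pattern, not of OPA or Styra DAS internals.

```python
class LocalDataCache:
    """Serves policy data from a local copy that is refreshed out of band."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable returning the latest data
        self._data = {}

    def refresh(self) -> bool:
        """Pull fresh data; on failure keep serving the last known copy."""
        try:
            self._data = dict(self._fetch())
            return True
        except ConnectionError:
            return False

    def allow_prod_change(self, user: str) -> bool:
        """Decide from local data only; no network call at decision time."""
        return user == self._data.get("on_call_admin")


# Hypothetical upstream system, e.g. an on-call schedule.
upstream = {"on_call_admin": "alice", "online": True}


def fetch() -> dict:
    if not upstream["online"]:
        raise ConnectionError("data source unreachable")
    return {"on_call_admin": upstream["on_call_admin"]}


cache = LocalDataCache(fetch)
cache.refresh()
print(cache.allow_prod_change("alice"))  # True

upstream["on_call_admin"] = "bob"        # rotation happens upstream
print(cache.allow_prod_change("bob"))    # False: local copy is stale

cache.refresh()
print(cache.allow_prod_change("bob"))    # True after the next refresh

upstream["online"] = False               # data source goes down
print(cache.refresh())                   # False
print(cache.allow_prod_change("bob"))    # True: last known data still served
```

The stale window between the rotation and the next refresh is the cost of this design; continued availability during the outage is the benefit.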

Policy libraries and Compliance Packs

Fortunately, no organization has to start building Kubernetes admission control policies from scratch. 

OPA Gatekeeper offers a basic library of about 40 policies which come in handy when getting started or working to replace PodSecurityPolicies with Rego rules. As Gatekeeper uses the ConstraintTemplate custom resource to deploy policies from both the built-in and your own custom library, users are advised to use a separate tool, Kustomize, to combine them during deployment.

Styra DAS strives to provide and maintain an extensive library of Kubernetes policies (~130 at the time of publishing) that are grouped into several Compliance Packs. This allows cluster administrators to focus on a particular set of policies their organization is required to implement. The current set of Compliance packs consists of:

  • CIS Benchmarks
  • NIST Container Security
  • MITRE ATT&CK
  • PCI DSS v3.2
  • Pod Security v2
  • Best Practices

This is a screenshot of Styra DAS with multiple compliance packs enabled:

OPA Gatekeeper-to-Styra migration

Let’s say your organization has been running with Gatekeeper on a few clusters but you foresee further growth and scaling issues in the future. What is the migration strategy from Gatekeeper to Styra DAS? Both Gatekeeper and Styra DAS use OPA under the hood and both work with the same Kubernetes admission control API. These similarities make a migration relatively straightforward; however, some key differences remain, especially in the structuring of Rego policies.

The rough outline of a migration path would be the following:

  • Create a Styra DAS system for each cluster.
  • Install the Styra DAS admission control package (OPA, SLP, Datasources agent) on each cluster.
  • Create a Stack in Styra DAS to hold policies which apply to each cluster.
  • If using Gatekeeper Policy Library Constraints, add corresponding built-in policies from the Styra DAS policy library to the Stack.
  • For any customer-built ConstraintTemplates and Constraints, write custom policies in Styra DAS. This code may require manual tweaking to get it working, but this should not be overly difficult for someone familiar with Rego.

ConstraintTemplates and Constraints

The most challenging part of migrating from Gatekeeper to Styra DAS is Gatekeeper’s separation of concerns between ConstraintTemplates and Constraints. While ConstraintTemplates contain the actual Rego code, Constraints are resources a cluster administrator can create to actually apply the code in the template with specific parameters. This way, cluster administrators can apply policies to the cluster without understanding Rego. 
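The split can be modeled in plain Python to show what has to be carried over during a migration. Here a function plays the role of the Rego inside a ConstraintTemplate, and a dictionary plays the role of a Constraint that supplies parameters and a match scope. The names and the required-labels rule are illustrative assumptions, not Gatekeeper's actual resource schema.

```python
def required_labels_template(obj: dict, parameters: dict) -> list:
    """Template logic: written once by someone who knows the policy language."""
    present = set(obj.get("metadata", {}).get("labels", {}))
    missing = sorted(set(parameters["labels"]) - present)
    return [f"missing required label: {label}" for label in missing]


# Constraint: created by a cluster administrator, no policy code involved.
namespace_owner_constraint = {
    "template": required_labels_template,
    "match_kinds": ["Namespace"],
    "parameters": {"labels": ["owner"]},
}


def evaluate(constraint: dict, obj: dict) -> list:
    """Apply a constraint to an object if the object's kind is in scope."""
    if obj["kind"] not in constraint["match_kinds"]:
        return []
    return constraint["template"](obj, constraint["parameters"])


unlabeled = {"kind": "Namespace", "metadata": {"name": "shop"}}
labeled = {"kind": "Namespace",
           "metadata": {"name": "pay", "labels": {"owner": "team-pay"}}}
pod = {"kind": "Pod", "metadata": {"name": "web"}}

print(evaluate(namespace_owner_constraint, unlabeled))
print(evaluate(namespace_owner_constraint, labeled))
print(evaluate(namespace_owner_constraint, pod))
```

When migrating, both halves matter: the template logic becomes policy code, while each Constraint's parameters and match scope become the configuration of that policy.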

Styra DAS achieves similar functionality by providing the Swimlane view shown in the Compliance Packs chapter. While built-in policies are manageable using a user-friendly UI, there is no way to define organization-specific rules in a similar way — those have to be added to the Rego code. This functionality is however coming soon in the form of Custom Snippets, which will allow the definition of organization-specific parameterized policies that show up in the Policy library the same way as built-in policies do now (stay tuned!).

Authorize everything

Finally, Styra DAS fully leverages OPA’s versatility when it comes to different authorization scenarios and use cases. For instance, Styra DAS supports the validation of Terraform configurations and CloudFormation stacks at resource creation, update and deletion (the latter recently announced in Beta).

On the API and application side, Styra DAS integrates with service mesh technologies like Istio, Kong Mesh or Kuma, or directly with the application with our Entitlements System Type. 

Support for both infrastructure and application authorization is a unique feature of Styra DAS and allows organizations to monitor and manage all policies from a single pane of glass. Even though the policies may vary widely across use cases, using a single, extensible technology stack for both is a significant advantage.

Wrap up and next steps

Gatekeeper and Styra DAS can both fulfill the role of an admission controller for Kubernetes, as both are based on OPA. While Gatekeeper is focused on managing policies on a single cluster, Styra DAS was designed with a wider scope. It acts as the central control plane for a large number of clusters and can handle application authorization as well.  

If you’d like to learn more about OPA Gatekeeper and Styra DAS, especially as it relates to your specific environment, feel free to book some time with a Styra team member today. 

FAQs

When to switch from OPA Gatekeeper to Styra DAS?

We recommend migrating from Gatekeeper to Styra DAS when:

  • You manage 10+ Kubernetes clusters.
  • You need central visibility into policy deployment and compliance state over all your clusters (and don’t want to build it yourself).
  • You want to use Rego policies for other use cases outside Kubernetes (Terraform, CloudFormation, Service mesh communication and more).
  • You use data from third-party systems in your policies.

What’s the relationship between Styra and OPA Gatekeeper?

Styra are the creators and maintainers of Open Policy Agent (OPA). While Styra maintains OPA itself, engineers from Microsoft and Google created (and maintain) OPA Gatekeeper to provide an open-source integration with Kubernetes. OPA Gatekeeper is considered to be part of the OPA project.
