With a history spanning more than a decade, AWS CloudFormation has been the tool of choice for many organizations moving their cloud deployments away from “point and click” configuration and towards managing infrastructure as code (IaC). As a mature technology, CloudFormation has spawned an ecosystem of tools, documentation and examples. Whatever one is trying to accomplish in this space, chances are good they’ll find relevant resources on the topic.
Infrastructure managed as code provides opportunities as well as risks. While most, if not all, of the traditional “-as-code” benefits apply here (version control, code review and static analysis tools), what particularly stands out with IaC is its ability to enable infrastructure deployments at scale. Indeed, the ability to provision, say, hundreds of servers with a single “Git push” allows workflows, and opportunities for automation, that previously would have been unthinkable. But this power also comes with a new set of challenges. Any cloud infrastructure comes with a cost (literally, an AWS bill), and at scale, a simple mistake in the code that provisions cloud resources could end up being prohibitively expensive. Additionally, misconfigured infrastructure resources could leave a cloud environment vulnerable to attacks, or allow accidental access to resources that would normally require a user to be authorized.
In order to mitigate these risks, cloud-native systems commonly allow policy to determine whether a resource fulfills the requirements needed for a secure deployment, prior to the resource being provisioned. One example that’s likely familiar to the Open Policy Agent (OPA) community is the Kubernetes API and its use of admission controllers, where external services like OPA may be consulted in order to determine the fate of a resource about to be created, modified or deleted. On the infrastructure side, Terraform allows platform teams to enforce policy on planned changes to infrastructure, before those changes are applied in production. While there’s been no shortage of tools to validate resources provisioned via AWS CloudFormation, these tools have commonly worked in two ways: either directly on the files describing the resources or on data describing already deployed infrastructure and configuration. They have not, however, been an integrated part of the provisioning process — leaving the enforcement of these policies ‘out of scope’ for AWS CloudFormation.
With the introduction of AWS CloudFormation Hooks, this has finally changed! Similar to how a Kubernetes admission controller reviews resources before they are persisted, CloudFormation Hooks allow custom code to intercept a resource on its way to deployment and verify its properties against policy. In other words, Hooks enable proactive compliance enforcement for IaC, and allow users to leverage OPA and AWS CloudFormation together. Hooks, fundamentally, are custom code running inside of an AWS Lambda function. Currently, Python and Java are the programming languages supported; we may use either to forward the resources presented to the hook to an OPA running outside of it, and have the hook enforce the decision OPA returns.
Before we dive into the details on how OPA may be used for CloudFormation policy enforcement, let’s take a look at the components involved in provisioning infrastructure with CloudFormation.
Templates and stacks
The first, and perhaps most important, component we’ll deal with is CloudFormation Templates. Templates are YAML or JSON files that declaratively describe the desired state of cloud infrastructure resources. A template may contain one or more resources, like S3 buckets, Identity and Access Management (IAM) policies, or EC2 instances. In addition to plain values like strings and integers, templates may also contain simple logic embedded in the document. This allows template authors to inject dynamic values through parameters, or even use helper functions in order to deal with data from the environment, or references to other parts of a template. For those who prefer not to edit YAML or JSON directly, tools like the AWS Cloud Development Kit (CDK) allow using familiar programming languages like Python or Java to build, test and eventually deploy the templates.
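To make the shape of a template concrete, here is a minimal sketch that builds one as a plain Python dictionary and renders it to JSON. The logical ID ExampleBucket and the parameter BucketName are made up for illustration; the "Ref" entry is an example of the embedded logic mentioned above, resolved by CloudFormation at deploy time rather than by this code.

```python
import json

# Hypothetical template: a single S3 bucket whose name is injected via a parameter.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Parameters": {
        "BucketName": {"Type": "String"},
    },
    "Resources": {
        "ExampleBucket": {
            "Type": "AWS::S3::Bucket",
            "Properties": {
                # "Ref" is an intrinsic function; CloudFormation substitutes
                # the parameter value when the stack is deployed.
                "BucketName": {"Ref": "BucketName"},
                "BucketEncryption": {
                    "ServerSideEncryptionConfiguration": [
                        {"ServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
                    ]
                },
            },
        },
    },
}


def render_template(doc: dict) -> str:
    """Serialize the template dictionary to the JSON form CloudFormation accepts."""
    return json.dumps(doc, indent=2)


if __name__ == "__main__":
    print(render_template(template))
```

Writing the template as data in a general-purpose language is, in spirit, what tools like the CDK do, although the CDK offers far richer abstractions than this sketch.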
Resources described in a template together constitute a stack. Stacks are managed as a single unit, and all resources in a stack are either successfully deployed, or none of them are.
Hooks allow us to intercept the stack creation process, and either allow the procedure or deny it, based on the policy enforced by the hook. We might for example wish to ensure that any S3 bucket provisioned in a stack has bucket encryption enabled, that EC2 instances are only reachable from the internal company network, or that any resource created is labeled with tags naming the team responsible for the resource, or the name of the department that should foot the bill for the cost of the cloud resource. If you have worked with OPA in the past, I’m sure this sounds familiar!
A hook is simply an AWS Lambda function executing some code on the input provided by CloudFormation. Worth noting here is that the input to the function is not the entire stack about to be deployed, but each resource in the stack provided in isolation. Deploying a stack may hence trigger any number of hooks, depending on the number of resources contained in the stack.
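The per-resource nature of hook input can be illustrated with a short sketch. The document shape used here (the keys "action", "resource", "id", "type" and "properties") is an assumption chosen for readability, not the exact payload CloudFormation sends; the point is only that each resource is presented on its own, without its siblings.

```python
from typing import Iterator


def per_resource_inputs(template: dict, action: str = "CREATE") -> Iterator[dict]:
    """Yield one hypothetical hook input document per resource in a template."""
    for logical_id, resource in template.get("Resources", {}).items():
        yield {
            "action": action,
            "resource": {
                "id": logical_id,
                "type": resource["Type"],
                "properties": resource.get("Properties", {}),
            },
        }


# Hypothetical stack with two resources.
stack = {
    "Resources": {
        "LogsBucket": {"Type": "AWS::S3::Bucket", "Properties": {"BucketName": "logs"}},
        "WebServer": {"Type": "AWS::EC2::Instance"},
    }
}

inputs = list(per_resource_inputs(stack))
```

A stack with two resources yields two separate inputs here, so a policy evaluating one of them cannot see the other. Rules that need to reason about several resources at once have to get that context from elsewhere.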
One small caveat though: hooks are only triggered for the resource types for which they have been configured! This allows developers to build and publish hooks for a particular action (i.e. create, update or delete) and resource type, and consumers of these hooks to install only the hooks applicable to the resources they expect to deploy in their accounts. Since we’ll want to use OPA for policy decisions, it should be up to the OPA policy, rather than some predefined configuration, to decide whether a resource type is of interest or not. Additionally, what’s considered interesting may of course change over time, and having to update the configuration of a hook every time a policy is updated would be arduous. Since there’s currently no “wildcard” option provided when specifying resource types in a hook configuration, our best option is simply to list them all explicitly.
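Listing every resource type by hand is tedious, so the list is better generated. The sketch below builds a handlers section from a list of type names. The key names ("preCreate", "preUpdate", "preDelete", "targetNames", "permissions") are our best recollection of the hook schema and should be treated as assumptions to verify against the AWS documentation; build_handlers is a hypothetical helper, not part of any AWS tooling.

```python
from typing import Iterable


def build_handlers(
    resource_types: Iterable[str],
    invocation_points=("preCreate", "preUpdate", "preDelete"),
) -> dict:
    """Build a handlers mapping that targets every given resource type."""
    # De-duplicate and sort so the generated configuration is stable across runs.
    targets = sorted(set(resource_types))
    return {
        point: {"targetNames": list(targets), "permissions": []}
        for point in invocation_points
    }


handlers = build_handlers(
    ["AWS::S3::Bucket", "AWS::EC2::Instance", "AWS::S3::Bucket"]
)
```

In practice the input list would hold every resource type CloudFormation supports, obtained from whatever source you trust for that inventory.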
With a hook configured to process any resource type, we’ll need some code to run inside of the Lambda function triggered by the hook. The responsibilities of this code would be rather simple — forward any incoming resource document to OPA, and have OPA return a decision based on the policy and data it has been provided. If the decision is to allow the action, the hook returns a response indicating success, and if the decision is to deny the action, the hook indicates failure, effectively stopping the stack creation process. Additionally, we might want to log the result of the decision to AWS CloudWatch, in case we need to investigate why a stack failed to deploy.
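The responsibilities described above can be sketched in a few lines of Python. This is not the code of any published hook, only an illustration under stated assumptions: OPA is reachable at a made-up local URL, the policy exposes a hypothetical rule cloudformation/deny that produces a collection of violation messages, and the "SUCCESS" and "FAILED" strings stand in for whatever status type the hook runtime expects. The request and response format ("input" in, "result" out) is that of OPA's Data API.

```python
import json
import logging
import urllib.request
from typing import Callable

logger = logging.getLogger("opa_hook_sketch")

# Hypothetical policy location: a rule producing a collection of violation messages.
OPA_URL = "http://localhost:8181/v1/data/cloudformation/deny"


def query_opa(url: str, payload: dict, timeout: float = 5.0) -> dict:
    """POST a JSON document to OPA's Data API and return the decoded response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def evaluate(
    resource_input: dict,
    url: str = OPA_URL,
    send: Callable[[str, dict], dict] = query_opa,
) -> dict:
    """Forward one resource to OPA and translate the decision into a hook result."""
    try:
        body = send(url, {"input": resource_input})
    except Exception as exc:  # network error, timeout, malformed response, ...
        logger.error("OPA query failed: %s", exc)
        return {"status": "FAILED", "message": f"policy engine unavailable: {exc}"}

    if "result" not in body:
        # OPA omits "result" when the queried rule is undefined; fail closed.
        logger.error("OPA returned an undefined decision")
        return {"status": "FAILED", "message": "undefined policy decision"}

    violations = sorted(body["result"])
    if violations:
        logger.warning("denied: %s", violations)
        return {"status": "FAILED", "message": "; ".join(violations)}

    logger.info("allowed")
    return {"status": "SUCCESS", "message": ""}
```

Two design choices are worth noting. The sketch fails closed: if OPA cannot be reached or the decision is undefined, the resource is rejected rather than waved through. And the transport is passed in as the send argument, which makes it possible to exercise the decision logic locally without a running OPA. Inside a Lambda function, messages written through the logging module typically end up in CloudWatch Logs, which covers the investigation use case mentioned above.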
The OPA AWS CloudFormation Hook
If adding almost 700 (!) resource types to a configuration file sounds like something you’d rather avoid, or you’d prefer to focus on policy authoring instead of writing plumbing code required for the hook, you’ve probably prioritized correctly. To allow you to think about policy rather than configuration or deployment, the OPA AWS CloudFormation Hook (or the “OPA hook”) was recently released!
The OPA hook includes all the configuration and code needed to quickly install a CloudFormation Hook that delegates infrastructure provisioning decisions to OPA. The only thing you currently will need to provide yourself is a URL pointing to the OPA responsible for making the decisions. The repository also provides examples of both templates and policy, along with tools for local development and testing that you may find useful.
To get started quickly, check out the OPA documentation, which now includes a tutorial showing how to install and use the hook in your environment, as well as examples of what data and policy in this environment might look like. You’ll also learn how useful patterns like dynamic policy composition may be used to group your policies by the resource type for which they apply. Since the OPA hook is a new addition to the ecosystem — and using a new feature in the AWS ecosystem — we’d love to know your experiences working with both. If you have ideas, feature requests, or run into issues while using the OPA hook, please help us make it better by submitting an issue, or make your voice heard in the OPA Slack or discussion board!
Being an early adopter is exciting, but it commonly comes with a “cost” as well. Given how recent the AWS CloudFormation Hook feature is, it’s not surprising that a few things in our integration proved a bit rough around the edges. As a user of the OPA hook, you won’t need to worry much about these; they are handled by the hook. Some notes from the experience of building this integration might still be useful, if only to learn more about the feature as a whole.
Some of the “lessons” we’ve had to learn along the way aren’t necessarily faults in the implementation of the hook feature at all, but rather design choices that might not have taken the OPA use case into account. As early adopters, we can ease the burden for those who follow, and potentially improve things in the long term, by pinpointing the problems we’ve identified and the workarounds we’ve employed to overcome them.
Some things that could improve the integration in the future include:
As was previously mentioned, the hook schema model — where each resource type a hook handler applies to must be provided explicitly beforehand — isn’t a great fit for an external policy engine, where you’ll want to make decisions like that in policy and not in static configuration. Having the option to use wildcards in resource type names, like “AWS::S3::*” or even just “AWS::*” would be a great improvement for this use case.
Run OPA inside of the hook. If Go is made a supported language for CloudFormation Hooks in the future, we could simply call OPA via the Go SDK directly from inside of the hook. While the latency overhead of calling an external service is a non-issue in this context, having to run another component outside of the Lambda function adds to the overall maintenance burden.
No special handling of boolean values. Boolean attributes are everywhere in CloudFormation templates, so it was particularly surprising to see policies written to deal with these fail in unexpected ways. This proved to be caused by the hook, which somewhere along the way from template to hook input converts boolean values to strings, i.e. “true” and “false”. Not just surprising, but arguably dangerous, as many languages, including Python and Rego, commonly check for “truthy” values rather than true and false explicitly, and the string “false”, just like any string, is considered to be true when converted to boolean. When even the AWS CloudFormation sample hooks fail to account for this, we have a problem which I think must be addressed at its root, i.e. the conversion from boolean values to strings being done in the first place.
In order to publish a hook to the AWS Marketplace, it needs to pass a series of “contract tests”, which given a number of input files assert an expected outcome. This is great, but it’s currently not clear how this should be achieved when the outcome of the hook depends not only on input data but also on an external component like OPA. Even if we may run OPA embedded in the Lambda function at a later point in time, we’ll likely still want to provide the policy for decisions from an external source like an S3 bucket, and would need to be able to provide this as part of the test configuration as well.
On the OPA side, some policies would benefit from being able to query the AWS API directly for the state of resources currently deployed — for example, to ensure there aren’t resources deployed with configuration that would conflict with existing deployments, or to limit the number of certain resources deployed in an environment. This could be done using the http.send built-in function, but we’d need to support the AWS Signature Version 4 for authenticating these requests. There’s an open issue in the OPA project to address that, and hopefully we’ll see this added soon!
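Of the items above, the boolean-to-string conversion is the easiest to reproduce, so here is a small Python sketch of the pitfall and one possible guard. The property name BlockPublicAcls is used only as an example, and parse_bool is a hypothetical helper rather than anything provided by AWS or OPA.

```python
def parse_bool(value):
    """Convert a real boolean or the strings "true"/"false" to a bool.

    Anything else raises ValueError instead of being silently treated as truthy.
    """
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        lowered = value.strip().lower()
        if lowered == "true":
            return True
        if lowered == "false":
            return False
    raise ValueError(f"not a boolean: {value!r}")


# What a hook might receive after booleans have been turned into strings.
properties = {"BlockPublicAcls": "false"}

# Naive truthiness check: any non-empty string counts as true.
naive = bool(properties["BlockPublicAcls"])

# Explicit conversion recovers the value the template author intended.
strict = parse_bool(properties["BlockPublicAcls"])
```

Here naive ends up True while strict ends up False, which is exactly the kind of mismatch that lets a misconfigured resource slip past a check. Raising on unexpected values is a deliberate choice in this sketch: a policy check that errors loudly is easier to notice than one that quietly passes.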
Getting to apply OPA to an entirely new domain or technology is almost always a rewarding experience, and while new integrations seem to be added almost weekly to the OPA ecosystem, it’s not every day you’ll see such an established platform or product like AWS CloudFormation opened up for policy enforcement. A great opportunity both for OPA and AWS CloudFormation users, and likely one that will see many interesting applications, tools and policy libraries crop up in the future.
While there are still a few rough edges here and there, the first impression of AWS CloudFormation Hooks for the purpose of integrating OPA is overwhelmingly positive. We’re excited to see OPA used for AWS CloudFormation Hooks, and can’t wait to see how people use it. A big part of the world of IaC just got better, and a big part of the world of policy as code just got more interesting!