How to Shape OPA Data for Policy Performance

Updated August 23, 2022
Published July 26, 2022

In his earlier blog post, The Three-Body Problem for Policy, Tim Hinrichs dives into the interconnected relationship between policy, data and software. He identifies a key consideration when using OPA: that “policies can only be evaluated when provided with the correct data.” The full post is well worth the read to better understand the role of data and its correctness in your policy implementation. In this post, I want to explore how the shape of your data affects policy performance, in addition to correctness.

OPA allows for JSON datasets to be included in the policy bundles (or directly in OPA if you are not using bundles). Oftentimes, this policy data comes from other systems, or even from outside sources, and is not optimized for use within OPA. We will explore how to identify and fix improperly shaped JSON policy datasets.

We will be using Styra DAS to manage OPA in the following examples. If you don’t already have a Styra DAS account, you can sign up for a free account here.

First let’s set up a test case. To start, we’ll write a policy that checks whether a user is allowed to access a resource based on zip code. For this, we will need a JSON file of zip codes. As a side note, I was surprised that I could not locate a public dataset for this. There are several paid versions (mostly XML or CSV); if you’re doing this type of validation in your organization, you are likely using a paid dataset. I generated a representative dataset instead: knowing that there are roughly 40,000 zip codes in the US, I distributed them across states by population, so don’t expect the dataset to be accurate. The file here is a representative set of zip codes organized in an array, with properties for zip code and state.

Let’s get this into DAS. First create a new system by clicking on the add system button (+).

Fill out the pop-up box with the following selections:

System Type: Custom
System Name: Data shape test (or really anything you like)
Description: <optional>
Launch Quick Start: No (Quick Starts are a great way to gain an understanding of how to set up systems, but for our example we won’t be using it)

Click Create System.

From the menu on the newly created system, click Add Data Source and select HTTPS. Fill in the following fields:

Data source name: zip_codes
URL: https://github.com/kroekle/zip_generator/raw/main/zips.json
Refresh interval: 24h
(all other fields can be left blank for now)

You can click the Refresh Data button under Preview to test your settings and see the data, then click Save.

The data source will be too large for DAS to view in its editor, but if you want to download it or see the settings, you can use the buttons on the top right of the screen (when you have the data source selected).

Now let’s add a simple policy. In the rule package, click on rule.rego and add the following:

The allow rule is fairly simple; most of the work is done in the state rule. This is where we look into the zip_code data and find the state the zip code is associated with. You can test this policy by pasting the following into input and hitting preview (you may have to hit the preview once to bring up the input window):

Preview is a great tool for testing quick iterations of your rules, but when we are testing performance, we generally want to run OPA the same way we will run it in our environment. Most of the time, that means building and deploying a bundle to a running OPA instance. Styra DAS allows us to do just that. First publish your policy, then click on the system and go to Settings > Install. There you will find commands for running an OPA instance locally; if you don’t already have OPA installed locally, you can install it by running one of the first two commands. The third command (“Download Styra configuration for OPA”) downloads a configuration file that points to this specific system in Styra DAS. This configuration sets up the bundle registry (from Styra DAS) and decision logging (to Styra DAS). The fourth command (“Run OPA with Styra configuration”) starts your local OPA instance (on some shells, you might have to prefix opa with “./”, and if you already have an OPA running locally, adding the “-a :8182” switch will change the port).

With your OPA running locally, you can now issue the following cURL command:
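The same request can also be issued from a script. The sketch below is a stand-in for the cURL command, not a copy of it: it posts an input document to OPA's Data API at the default address localhost:8181, using the path rules/main (inferred from the data.rules.main query used with the profiler later in this post). The input field names and values (user, action, resource, zip) are hypothetical placeholders, so substitute whatever your policy actually reads.

```python
import json
import urllib.request

# Assumed defaults: OPA's standard listen address and the rules/main path.
OPA_URL = "http://localhost:8181/v1/data/rules/main"


def build_request(input_doc, url=OPA_URL):
    """Wrap an input document in the {"input": ...} envelope the Data API expects."""
    body = json.dumps({"input": input_doc}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_opa(input_doc, url=OPA_URL):
    """Send the request and return the decoded JSON response (needs a running OPA)."""
    with urllib.request.urlopen(build_request(input_doc, url)) as resp:
        return json.load(resp)


# Hypothetical input; the field names are placeholders, not this post's exact input.
example_input = {
    "user": "alice",
    "action": "GET",
    "resource": "/accounts",
    "zip": "55401",
}
```

With OPA running locally, calling query_opa(example_input) should return a JSON object whose result key holds the decision. If you started OPA with “-a :8182”, pass a URL with that port instead.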

Feel free to play around with the properties and submit other allowed and denied requests. After giving these requests a few moments to transfer and index into DAS, you can view them in the Decisions tab on your system.

As a short aside, when you first go to the Decisions tab, you will see a message telling you that you have not set up decision mapping yet. To make the decisions a little easier to read in the list, you can follow the link on the message and update the decision mapping in the next menu the following way:

Path to decision: result.allowed

Columns:
action: input.action
resource: input.resource
user: input.user

Updating the decision mappings will only affect new decisions, so you won’t see any changes to your existing decisions, but all the new ones should be easier to decipher.

Looking back at the decisions, we see some interesting behaviors: the Allows are taking an unhealthy amount of time to execute, whereas the Denies are inconsistent, with some fast and others slow. (Note: most of the time when doing API authorizations, we want the decision times to be in the low single milliseconds, though your requirements might allow for higher timings).

Let’s dig into one of the slow decisions to see what is going on. We can use the OPA profiler for this. Take the following steps:

Download the policy bundle: In Styra DAS, navigate to Deployments and click the download icon on the “Currently deployed bundle”. Put this in your working folder (it might help to give it a shorter name like bundle.tar.gz).
Save decision input: Also in Styra DAS, go to Decisions, then click on the menu icon on the decision you want to test and select Copy input. In your working directory, use your favorite editor and create a file called input.json, pasting in the copied input document.
Execute the OPA profiler: In your shell, execute the following command:
- opa eval -b bundle.tar.gz 'data.rules.main' -i input.json --profile --format=pretty
- https://gist.github.com/kroekle/5567dedb49883927264de142373d3d21

You should see an output similar to the following:

The output here is pretty clear: our issue lies with line 21 in our policy, which is “data.zip_codes[i].zip_code == zip”. This line iterates through our data source to find the zip code. Not many languages are efficient at iterating through large arrays, and Rego is no different.
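To see why the array shape is costly, here is a rough Python analogue of that lookup (an illustration, not the Rego rule itself). It scans a list of records and counts how many comparisons it takes to find a state. The zip_code field name comes from the policy line above; the state field name and the sample records are assumptions.

```python
def find_state_by_scan(records, zip_code):
    """Return (state, comparisons) by scanning an array of {zip_code, state} records."""
    comparisons = 0
    for record in records:
        comparisons += 1
        if record["zip_code"] == zip_code:
            return record["state"], comparisons
    return None, comparisons


# Tiny made-up sample; the real file holds roughly 40,000 entries.
records = [
    {"zip_code": "10001", "state": "NY"},
    {"zip_code": "55401", "state": "MN"},
    {"zip_code": "94105", "state": "CA"},
]

print(find_state_by_scan(records, "10001"))  # first record: 1 comparison
print(find_state_by_scan(records, "94105"))  # last record: 3 comparisons
print(find_state_by_scan(records, "00000"))  # missing: all 3 compared, no match
```

In this sketch the cost grows with the position of the match, and a zip code that is absent forces a comparison against every record. With tens of thousands of entries, that is a lot of work for a single lookup.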

In order to make this rule more efficient, we need to transform the zip code data from an array into a structure that we can look up directly by property. In this case, it seems we have two options: configure the lookup property as either state or zip code. If we choose to key by state, the data and rule could look something like the following:
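As a Python analogy (one possible reading of the state-keyed layout, not the Rego and JSON from this post), keying by state could look like the sketch below. The shape, a state code mapped to a list of zip codes, and the sample values are assumptions.

```python
# Assumed shape: state code -> list of zip codes in that state (made-up sample values).
zips_by_state = {
    "NY": ["10001", "10002"],
    "MN": ["55401", "55402"],
    "CA": ["94105"],
}


def find_state(zips_by_state, zip_code):
    """Check each state's (much shorter) zip list until the zip code is found."""
    for state, zip_codes in zips_by_state.items():
        if zip_code in zip_codes:
            return state
    return None


print(find_state(zips_by_state, "55402"))
```

Each individual list is far shorter than the original array, and if the state is already known the lookup touches only that one list.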

This is likely to give you far better results, especially if you have a somewhat low number of elements in your arrays. In many, if not most, cases this may give you good enough rule performance. But we can do even better from a performance standpoint if we arrange our data to be keyed by zip code. Something like this (note that the tradeoff is that our data file will be larger, due to having to duplicate the state code):
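In Python terms (again an analogy, with made-up sample values), keying by zip code turns the search into a single dictionary lookup:

```python
# Assumed shape: zip code -> state code. The state code repeats for every zip,
# which is the size tradeoff mentioned above.
zip_to_state = {
    "10001": "NY",
    "10002": "NY",
    "55401": "MN",
    "94105": "CA",
}


def find_state(zip_to_state, zip_code):
    """Direct keyed lookup; returns None when the zip code is unknown."""
    return zip_to_state.get(zip_code)


print(find_state(zip_to_state, "55401"))
```

No iteration is needed here: the zip code itself is the key, so the lookup does not have to walk the dataset.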

So now that we know what our data structure should look like, how do we get it into that form? If we are working with an internal service that is used only for this one rule, then maybe we would just modify the source service. But in many instances that may not be possible, or even desirable. Styra DAS has the built-in ability to transform data sources when we call the source service. To do this, just add a Rego policy that performs the transformation, then update the data source to use the new policy.

First let’s add a new policy. In your system in Styra DAS use the menu and select Add Policy, then use the following properties:

Path: transform/zip
Module name: zip.rego

In that policy, add the following:

The input in this case is the source data from the service, so this rule will iterate through the entire input and create a set of zip:state objects. You can now publish this policy.
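For readers who think more easily in a general-purpose language, the reshaping step can be sketched in Python as follows. This is an analogy for what the transformation does, not the Rego policy itself; the state field name and the sample records are assumptions.

```python
def to_zip_map(records):
    """Reshape [{zip_code, state}, ...] into {zip_code: state}.

    If a zip code appears more than once, the last record wins in this sketch.
    """
    return {record["zip_code"]: record["state"] for record in records}


# Made-up sample standing in for the source service's response.
source = [
    {"zip_code": "10001", "state": "NY"},
    {"zip_code": "55401", "state": "MN"},
    {"zip_code": "94105", "state": "CA"},
]

print(to_zip_map(source))
```

The full pass over the input happens once, when the data source is refreshed, rather than on every policy decision.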

Now, let’s modify the data source to add this transformation. When you’re viewing the data source, click on the settings in the upper right. At the bottom of the setting panel, open Advanced and set the following:

Data transform: Custom
Policy: transform/zip/zip.rego
Rego query: data.transform.zip.zips (this represents the package.rule we want to execute)

You can hit the Refresh Data button on the right to validate your settings. After a few seconds, you should see two tabs, the first being the original source, the second being the transformed data. If the transformed data is empty, go back and check your settings or make sure you published the transformation policy. After everything looks right, go ahead and Save.

Now, let’s update the policy to work with this new data structure. All we need to do is to modify our state helper rule to look up by zip. That should look something like this:

Whenever you make a change to a policy, it’s good practice to validate the change before publishing it. On the top right, hit the Validate button. After a few seconds, the Decisions tab at the bottom should return results telling you that it was able to replay x decisions and that none have changed. If you have changed decisions here, then something is wrong with either the data transformation or the rule update. Find and fix that before you move on. When you have no changed decisions, Publish both the new policy and the newly transformed data source. You can now go back to your local environment and re-issue the cURL commands. You should now see something similar to this:

The decisions are now far more consistently performant. Just to be sure, running through the opa profiler should give us results like this:

Now the zip code lookup (line 20 in my policy) is the 5th slowest call at about 8 microseconds, which I suspect will be fast enough for most any use case.

In this blog, we set up an example of a rule that had undesirable performance characteristics due to how the data was shaped; diagnosed which part of the rule was causing the problem; and, finally, used Styra DAS transformations to reshape the data source, giving us much more desirable performance characteristics.

For more information on optimizing Styra DAS for your use case, visit our documentation, or spend time in the Styra Academy to hone your skills with Rego policy building and learn new use cases for OPA.
