Security tiers considered harmful
tl;dr:
- don’t define security tiers; use security cells instead
- each service should have its own security cell
One time at JOB_1, we had a problem: we were about to start serving a dataset that was significantly more sensitive than our typical dataset. Leaking it would likely produce scary consequences in the real world, with ripple effects beyond just the company and its shareholders. We hadn’t dealt with issues of that kind before, and as engineers on the project, my colleagues and I felt duty-bound to find the “right” solution. We started looking for other places where high-security data was stored, and we found one owned by another team in the org. Perfect! We would store our dataset alongside the existing dataset, and they’d both be safe together in the protective cocoon of the high-security environment.
This didn’t work. The folks who owned the existing dataset were rightly distrustful of sharing their meticulously constructed environment with our team. And for good reason: adding our application to their environment would dramatically change their audit requirements, potentially open up their app to traffic from our callers, and result in members of our team needing to do a bunch of annoying training and background checks that didn’t contribute to getting our jobs done. By the time the dust had settled, everyone was very convinced that glomming the two applications together into a “high security” tier would be a bad idea.
Another time at JOB_2, we had just acquired ACQUISITION and we wanted to start taking advantage of some wonderful and lucrative synergy between their services and our services. The only problem was that everyone agreed that ACQUISITION’s security was bonkers-awful. We didn’t feel comfortable opening up JOB_2’s network to allow them to call our services, which was a prerequisite if we wanted to get anything done. There was a lot of hemming and hawing within the Security team, and ultimately it was decided to invest a bunch of engineering effort into poking holes in our network perimeter to allow traffic to flow between them and us.
In both of these cases the problem was that—simply by being deployed in the same computing environment—the different services had the ability to harm each other. It was implicit in everyone’s mind that “same computing environment” meant that they shared resources and capabilities. We were operating in a model of security tiers. In both cases, there was an existing “High Security” tier, and a new entrant (my team in the case of JOB_1, and the acquisition in the case of JOB_2) arrived with a proposal to unleash horrors against the soft underbelly of our network.
The solution is: don’t have a soft underbelly. Define boundaries around individual services, rather than groups of services. I have heard proposals to define boundaries at larger scales, like around organizational subunits. This is dangerous: your carefully constructed org-chart-shaped infrastructure will look pear-shaped after the next reorg, and simply unrecognizable after three reorgs. (This is because it is significantly easier for managers and executives to drag boxes around an org chart than it is for infrastructure engineers to move cloud resources around without breaking everything).
Ultimately, you should do what any public cloud provider does: assume that your environment hosts wolves, goats, and cabbages, and that it is your job to host each of them while endangering none of them. I have heard security engineers use terms like “zero trust” and “assume breach” to describe these concepts, and those terms are great. I somewhat prefer terms like “cells” or “compartments” or “sandboxes” or “partitions”, but ultimately it’s best to use whatever gets the message across.
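At its core, a security cell is just a deny-by-default boundary around a single service: no service can reach any other unless someone explicitly grants that specific edge. Here is a minimal sketch of that policy model in Python; the `CellRouter` name and API are mine, purely for illustration, and a real implementation would live in your network layer or IAM system rather than application code:

```python
class CellRouter:
    """Toy model of per-service security cells: all cross-cell
    traffic is denied unless an explicit grant exists."""

    def __init__(self):
        # (caller, callee) pairs that have been explicitly allowed.
        self._allowed: set[tuple[str, str]] = set()

    def allow(self, caller: str, callee: str) -> None:
        # Grants are directional and per-pair: allowing A -> B says
        # nothing about B -> A, and nothing about any other service.
        self._allowed.add((caller, callee))

    def permits(self, caller: str, callee: str) -> bool:
        # Deny by default: a service can always reach itself, but
        # reaching anyone else requires an explicit grant.
        return caller == callee or (caller, callee) in self._allowed


router = CellRouter()
router.allow("checkout", "payments")

assert router.permits("checkout", "payments")          # explicitly granted
assert not router.permits("payments", "checkout")      # grants are one-way
assert not router.permits("acquired-app", "payments")  # new tenants start with nothing
```

Note what this buys you in the acquisition scenario above: `acquired-app` can be dropped into the environment on Day 1 with zero grants, and the blast radius of its bonkers-awful security is exactly itself.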
Beyond terminology, my preferred approach is to ask questions and see how much they make your colleagues squirm:
- If you acquire a new company, would you feel uncomfortable setting their engineers loose in your cloud on Day 1? What about bridging their network with your network?
- Do you need to do a security audit of third party code before deploying it on your infrastructure? If so, why? What might happen, and why can’t the infrastructure defend itself?
- If I wanted to deploy a new “high security” service today, how much more work would it be than deploying a “regular” service? If you were the lead engineer on a high security service, would you have the impulse to go build isolation from scratch rather than trusting what exists already?
- Would you be afraid to provision a brand-new service on your PaaS and hand the deploy keys over to an attacker? (Let’s assume that we don’t care about denial-of-wallet attacks: the attacker pays for whatever resources they provision themselves.)
All four of these are basically different phrasings of the same question, but I think they’re still useful to list separately because they trigger different brainwaves, at least for me.
(Note: JOB_1 and JOB_2 may or may not be the same company)